Progress bar & Score

Hi,

I have two queries regarding the Prodigy tool. Mind the text in the snapshot; it's just for reference purposes.

  1. What does the score below the text signify? (marked in red below the text in the image)

  2. How do I get the stats (Accept, Reject, Ignore) of the dataset on the interface, similar to the ones shown in the insults classifier video?

Thanks

This is the confidence of the prediction assigned by the model – for example, the category label predicted by the text classifier, or the entity label predicted by the entity recognizer. In your example, the score of 0.5 is essentially the "perfect" uncertain prediction – by default, Prodigy prioritises examples with a prediction closest to 0.5, i.e. the ones it's most uncertain about and which will give you the most relevant gradient for training. (Whether you click accept or reject, the model will always have a gradient of 0.5 to learn from.)
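To illustrate the idea of prioritising by uncertainty, here's a small standalone sketch. This is not Prodigy's internal implementation (the built-in sorters like prefer_uncertain are more sophisticated, e.g. they work on infinite streams), just the core intuition:

```python
def uncertainty(score):
    """Map a model score to an uncertainty value:
    1.0 for a score of 0.5, 0.0 for a score of 0.0 or 1.0."""
    return 1.0 - abs(score - 0.5) * 2.0

examples = [
    {"text": "a", "score": 0.97},
    {"text": "b", "score": 0.51},
    {"text": "c", "score": 0.12},
]
# Most uncertain predictions first – these are the ones most worth annotating
ranked = sorted(examples, key=lambda eg: uncertainty(eg["score"]), reverse=True)
print([eg["text"] for eg in ranked])  # → ['b', 'c', 'a']
```

The example with score 0.51 comes out on top, because answering it gives the model the most information either way.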

Just set "show_stats": true in your prodigy.json – see here for more details! :blush:

Btw, some more background on the progress bar (also in case others come across this issue later): Prodigy's active learning recipes will use the loss returned by the model's update method to calculate an estimated annotation progress, based on how the model is improving. It does a simple regression to predict how many more examples are needed until the loss reaches zero.
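As a rough illustration of that idea (a simplified sketch, not Prodigy's actual formula), you could fit a straight line to the recent per-batch losses and extrapolate where it hits zero:

```python
def estimate_progress(losses):
    """Estimate annotation progress from a list of per-batch losses.

    Fits a straight line to the losses and extrapolates the batch index
    at which the fitted loss reaches zero. Simplified sketch only –
    not Prodigy's actual implementation.
    """
    n = len(losses)
    if n < 2:
        return 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope >= 0:  # loss isn't decreasing, so no sensible estimate
        return 0.0
    intercept = mean_y - slope * mean_x
    zero_at = -intercept / slope  # batch index where the fitted loss hits 0
    return min(1.0, max(0.0, n / zero_at))

# Loss dropping linearly from 1.0: after 3 of a predicted 4 batches → 0.75
print(estimate_progress([1.0, 0.75, 0.5]))  # → 0.75
```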

Recipes that don't use a model in the loop will check whether the stream has a length and calculate the progress based on the total number of examples and the examples already annotated in this session. This is usually not possible if the stream is a generator, so the progress bar will show the infinity symbol (like in the screenshot above). In your custom recipes, you can also define your own progress function as the 'progress' component returned by the recipe, for example:

def get_progress(session=0, total=0, loss=0):
    # session: annotations in this session, total: total annotations so far,
    # loss: last loss reported by the model's update method.
    # Should return a float between 0 and 1.
    progress = compute_something_here()
    return progress

Hi, thanks for the answer.

I got the progress bar query correctly. Thanks for the detailed answer.

Just to clarify on the above part: if the score is 0.01 or quite low (between 0.0 and 0.1), does it mean that the model is certain about these sentences/entities? And vice versa for higher values (i.e. more than 0.6)?

P.S. I encountered these values in the real dataset.

Thanks

A score of 0.01 means that the model has assigned a very low probability to the suggested annotation (or, phrased differently, is very confident that it's wrong). Vice versa, higher scores signal higher probability.

That’s great.

Thanks a lot :smiley:


How can I update the progress for the "textcat.manual" recipe so that instead of the infinity symbol, the number of answered tasks out of the total will appear? Thank you.

@Yuri You could edit the recipe function (or wrap it – see the README for an example) and overwrite the stream with a list instead of a generator. So basically, stream = list(stream). This means that the stream has a __len__, and Prodigy will be able to calculate the progress based on the total number of examples.
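To see why this works: a generator yields lazily and has no __len__, so the total is unknown, whereas materialising it into a list makes the length available. A quick standalone illustration:

```python
def stream_gen():
    # Generators yield lazily, so the total number of tasks is unknown
    yield from ({"text": text} for text in ["one", "two", "three"])

stream = stream_gen()
print(hasattr(stream, "__len__"))  # False – progress bar shows infinity

stream = list(stream)              # materialise the stream
print(len(stream))                 # 3 – progress can now be calculated
```

The trade-off is that the whole stream is loaded into memory up front, so this only makes sense for datasets that fit comfortably in RAM.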

Hi, I have a doubt regarding the progress bar.
As you stated earlier, it updates the model and calculates the loss every time the "update" function is called. But sometimes, even if I only ignore some garbage sentences and mark them as ignore, the progress bar still updates its value. (In such a scenario I am not accepting or rejecting any sentences in between; I am ignoring all of them, and still the progress bar updates.)
I am not sure what happens in this case. My only concern is that the model should not learn anything from ignored sentences.

Hi! Are you sure the batch of updates sent back to the server only includes the ignored examples? The progress bar requires an update from the server, so it's refreshed whenever a batch of answers is sent to the server and the model is updated and reports a new loss. So it's possible that the progress updates after you ignore a bunch of texts, but that this progress is based on previously accepted/rejected examples.