I'm not sure if this is intended behavior: when I use a sorter in my model-in-the-loop annotation, I get the same data samples multiple times, but with different predictions. I tried to filter out duplicates, but then it seemed to be only predicting one class. Also, I'm using Prodigy nightly.
Hi! This is kind of expected, or rather, what the sorter selects depends on the data you feed in. Typically, that's `(score, example)` tuples of all possible analyses and predictions for the given example, so you may have multiple versions of the same example with different scored predictions. The sorter will then prioritise the predictions with the highest, lowest or most uncertain scores.
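To make that concrete, here's a minimal sketch of the kind of stream a sorter like `prefer_uncertain` consumes. The texts, labels and scores are made up, and the point is just the shape of the data: the same text shows up once per scored prediction, and the sorter decides which of those to send out first.

```python
from prodigy.components.sorters import prefer_uncertain

def scored_stream():
    # Made-up (score, example) tuples: the same text appears several
    # times, once for each prediction the model could make for it
    yield 0.91, {"text": "Apple is buying a startup", "label": "COMPANY"}
    yield 0.08, {"text": "Apple is buying a startup", "label": "FRUIT"}
    yield 0.52, {"text": "I had an apple for lunch", "label": "FRUIT"}

# prefer_uncertain prioritises examples whose scores are most uncertain
# (close to 0.5); prefer_high_scores / prefer_low_scores work the same
# way but favour the highest / lowest scores. Since streams can be
# infinite, the sorters filter as they go rather than fully sorting.
stream = prefer_uncertain(scored_stream())
for eg in stream:
    print(eg["text"], eg["label"])
```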
If you're collecting binary annotations, filtering duplicates and only allowing one version of an example doesn't make that much sense, because then you're only ever giving feedback on one single prediction out of many (e.g. the first label it predicts).
Thank you. I have five classes in my annotation. I only want to see one prediction (the most likely one) for each example. Is there any way to do that?
In that case, using workflows like `ner.teach` or `textcat.teach` is probably overkill. If you just want to see the predictions for named entities, you can run `ner.correct`. If you have a pretrained text classification model, you can use a recipe like the one sketched below and pre-populate the `"accept": []` list of the task (the accepted options) based on the predicted `doc.cats`. However, keep in mind that the text classifier will give you scores for all labels, and you have to decide which threshold counts as "most likely", for example `0.5` or `0.75`.
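Here's a rough sketch of what such a recipe could look like. The recipe name, the argument values and the default threshold are all placeholders, and it assumes your pipeline has a `textcat` component whose labels you want to present as multiple-choice options:

```python
import prodigy
from prodigy.components.loaders import JSONL
import spacy


@prodigy.recipe(
    "textcat.preselect",  # hypothetical recipe name
    dataset=("Dataset to save annotations to", "positional", None, str),
    spacy_model=("Pretrained pipeline with a textcat component", "positional", None, str),
    source=("Path to a JSONL file with {'text': ...} examples", "positional", None, str),
    threshold=("Score needed to pre-select a label", "option", "t", float),
)
def textcat_preselect(dataset, spacy_model, source, threshold=0.5):
    nlp = spacy.load(spacy_model)
    labels = nlp.get_pipe("textcat").labels

    def get_stream():
        for eg in JSONL(source):
            doc = nlp(eg["text"])
            # Show all labels as multiple-choice options and tick the
            # ones the model scores above the threshold
            eg["options"] = [{"id": label, "text": label} for label in labels]
            eg["accept"] = [
                label for label, score in doc.cats.items() if score >= threshold
            ]
            yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "choice",
        "config": {"choice_style": "multiple"},
    }
```

You could then run it with something like `prodigy textcat.preselect my_dataset ./my_textcat_model ./examples.jsonl --threshold 0.75 -F recipe.py` (again, the names here are placeholders). If you really only want the single most likely label pre-selected, you could replace the threshold check with `max(doc.cats, key=doc.cats.get)`.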
You do need some sort of pretrained model, though. If you're starting from scratch, your model may not be predicting anything useful yet, so even if you're updating it in the loop, it might take quite a while to get over the "cold start problem" if you're just going through your examples in order.