I am currently working on a text categorization task with Prodigy and built my own custom recipe based on the example on GitHub.
However, similar to the issue described in Resuming annotations after closing the terminal, saved annotations are not excluded from labelling after the server is closed and restarted. I tried to adjust the setting in prodigy.json, but it didn't help. Note that this is only an issue when using the active learning component (even with the original custom model dummy example linked above), not with textcat.manual: in the manual case everything works as expected. I also tried adding an exclude option to the recipe, roughly as in the snippet below, but even with the dataset explicitly excluded, the examples are requeued for labelling (even in a dummy one-row setup).
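For reference, this is a simplified sketch of what the exclude option in my recipe's return value looks like (stream and dataset setup omitted):

```python
return {
    "dataset": dataset,       # annotations are saved to this dataset
    "stream": stream,
    "view_id": "classification",
    # Explicitly exclude examples already saved in the same dataset
    "exclude": [dataset],
}
```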
Hi! Could you check whether the hashes, specifically the _input_hash and _task_hash, of the saved examples are the same as or different from those of the new incoming examples?
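One quick way to compare them is to load the saved annotations from the database and print their hashes, e.g. with a small script like this (a sketch, assuming your dataset is called my_dataset):

```python
from prodigy.components.db import connect

db = connect()  # uses the database settings from your prodigy.json
examples = db.get_dataset("my_dataset")  # replace with your dataset name
for eg in examples:
    # Compare these values against the hashes of the new incoming examples
    print(eg["_input_hash"], eg["_task_hash"], eg["text"][:50])
```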
By default, the active learning recipes that collect binary annotations will exclude based on the _task_hash, so you can annotate different questions about the same text (e.g. the same text with different labels). So it's possible that you'll see suggestions about the same text again, but with different labels. If you don't want this, you should be able to set "exclude_by": "input" in the config, as in the sketch below. However, this can also become problematic for future examples: you'd only ever see each example once, the model wouldn't be able to ask about different suggested labels for the same text, and your data would end up more sparse.
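In a custom recipe, that setting goes into the "config" of the returned components, roughly like this (a minimal sketch; the recipe name and arguments are placeholders):

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "textcat.custom-al",  # placeholder name for your custom recipe
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Path to the input JSONL file", "positional", None, str),
)
def textcat_custom_al(dataset, source):
    stream = JSONL(source)
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "classification",
        "config": {
            # Exclude previously annotated examples by their input hash,
            # so the same text isn't queued again with a different label
            "exclude_by": "input",
        },
    }
```

Alternatively, setting "exclude_by": "input" in your prodigy.json should apply it globally.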
Thanks for getting back to me. Yes, the input hashes are the same, and I only have a single label in the data. I also tried "exclude_by": "input" and it doesn't work: it still keeps queueing the same example to me, even with the example recipe from GitHub.
Thanks for the update, that's definitely interesting that it's only occurring in the active learning recipes and not in the other ones. We'll investigate!