I am currently working on a text categorization task with Prodigy and built my own custom recipe based on the example on GitHub.
However, similar to the issue described in Resuming annotations after closing the terminal, saved annotations are not excluded from labelling after the server is closed and restarted. I tried to adjust the setting in prodigy.json, but it didn't help. Note that this is only an issue when using the active learning component (even with the original custom model dummy example linked above), not with textcat.manual: in the manual case everything works as expected. I also tried adding an exclude option to the recipe, roughly as in the snippet below, but even with the dataset explicitly excluded, the examples are requeued for labelling (even in a dummy one-row setup).
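For reference, this is a simplified sketch of what the exclude option in my recipe's return value looks like (stream and dataset setup omitted):

```python
return {
    "dataset": dataset,       # annotations are saved to this dataset
    "stream": stream,
    "view_id": "classification",
    # Explicitly exclude examples already saved in the same dataset
    "exclude": [dataset],
}
```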
Hi! Could you check whether the hashes, specifically the _input_hash and _task_hash, of the saved examples are the same as or different from those of the new incoming examples?
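One quick way to compare them is to load the saved annotations from the database and print their hashes, e.g. with a small script like this (a sketch, assuming your dataset is called my_dataset):

```python
from prodigy.components.db import connect

db = connect()  # uses the database settings from your prodigy.json
examples = db.get_dataset("my_dataset")  # replace with your dataset name
for eg in examples:
    # Compare these values against the hashes of the new incoming examples
    print(eg["_input_hash"], eg["_task_hash"], eg["text"][:50])
```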
By default, the active learning recipes that collect binary annotations will exclude based on the _task_hash, so you can annotate different questions about the same text (e.g. the same text with different labels). So it's possible that you'll see suggestions about the same text again, but with different labels. If you don't want this, you should be able to set "exclude_by": "input" in the config, as in the sketch below. However, this can also become problematic for future examples: you'd only ever see each example once, the model wouldn't be able to ask about different suggested labels for the same text, and your data would end up more sparse.
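In a custom recipe, that setting goes into the "config" of the returned components, roughly like this (a minimal sketch; the recipe name and arguments are placeholders):

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "textcat.custom-al",  # placeholder name for your custom recipe
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Path to the input JSONL file", "positional", None, str),
)
def textcat_custom_al(dataset, source):
    stream = JSONL(source)
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "classification",
        "config": {
            # Exclude previously annotated examples by their input hash,
            # so the same text isn't queued again with a different label
            "exclude_by": "input",
        },
    }
```

Alternatively, setting "exclude_by": "input" in your prodigy.json should apply it globally.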
Thanks for getting back to me. Yes, the input hashes are the same, and I only have a single label in the data. I also tried "exclude_by": "input" and it doesn't work: it still keeps queueing the same example to me, even with the example recipe from GitHub.
Thanks for the update, that's definitely interesting that it's only occurring in the active learning recipes and not in the other ones. We'll investigate!