Text Classification, queue contains already labeled annotations when session is closed/reopened

paige · June 5, 2018, 3:10pm

Hi there,

Apologies in advance if this issue has already been dealt with (I wasn’t able to find a post with my exact case). When I close and reopen a session using the following command, the interface shows texts that I've already classified:

prodigy textcat.teach voyages fr_core_news_md EStatutEgo_sample_15000_5clean.jsonl --label voyage --seeds voyages.txt

Is this a bug? Is there a way to get fresh, unclassified texts to annotate without having to re-annotate the texts from my previous session(s)?

Thanks in advance!

ines · June 5, 2018, 3:17pm

By default, Prodigy tries to make as few assumptions as possible about your stream and the dataset you’re saving your annotations to. But you can tell it to explicitly exclude all questions you’ve already annotated in one or more datasets using the --exclude option:

prodigy textcat.teach voyages [other stuff here] --exclude voyages

You can define one or more comma-separated datasets to exclude – for example, the current dataset, but also any other sets you’ve created before. (If you’re interested in how this works under the hood and how Prodigy decides whether a task was annotated before, you can find more details in the set_hashes section in the docs.)
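To make the idea concrete, here's a minimal sketch of hash-based exclusion in plain Python. It is not Prodigy's implementation: the `task_hash` and `exclude_seen` helpers are hypothetical, the hash is simply an MD5 of each task's text and label, and the `datasets` dict is a stand-in for the annotations Prodigy keeps in its database. Prodigy's own `set_hashes` assigns each task an input hash and a task hash; check the docs for which one `--exclude` compares in your version.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, List

Task = Dict[str, object]


def task_hash(task: Task) -> str:
    """Hypothetical hash: MD5 of the task's text and label (not Prodigy's scheme)."""
    key = json.dumps({"text": task.get("text"), "label": task.get("label")}, sort_keys=True)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def exclude_seen(
    stream: Iterable[Task],
    exclude: str,
    datasets: Dict[str, List[Task]],
) -> Iterator[Task]:
    """Yield only tasks whose hash is absent from every dataset named in `exclude`.

    `exclude` mirrors a comma-separated --exclude value, e.g. "voyages,old_set".
    `datasets` is a hypothetical stand-in for the database: name -> saved tasks.
    """
    names = [name.strip() for name in exclude.split(",") if name.strip()]
    # Collect the hashes of everything already annotated in the named datasets.
    seen = {task_hash(t) for name in names for t in datasets.get(name, [])}
    for task in stream:
        # Skip anything that was already answered; pass new tasks through lazily.
        if task_hash(task) not in seen:
            yield task


if __name__ == "__main__":
    datasets = {
        "voyages": [{"text": "Trip to Lyon", "label": "voyage", "answer": "accept"}],
        "old_set": [{"text": "Back in Paris", "label": "voyage", "answer": "reject"}],
    }
    stream = [
        {"text": "Trip to Lyon", "label": "voyage"},
        {"text": "Back in Paris", "label": "voyage"},
        {"text": "Meeting tomorrow", "label": "voyage"},
    ]
    # Only the "Meeting tomorrow" task is left after excluding both datasets.
    print(list(exclude_seen(stream, "voyages,old_set", datasets)))
```

Because this toy hash covers only the text and label, the same sentence offered with a different label would count as a new question; whether that's desirable depends on which hash the exclusion compares.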

paige · June 6, 2018, 11:03am

Worked like a charm! Might poke you for other questions in the near future. Thanks!

ines · June 6, 2018, 11:05am

Yay, glad it worked! And sure – that’s what the forum is for.

Topic	Tags	Replies	Views	Activity
Resume Annotation Session with Prodigy - Text Classification	textcat	1	1645	June 14, 2018
Textcat - same data keeps appearing	usage, textcat	3	532	July 23, 2019
--exclude in textcat teach is not working as expected.	textcat, more-info-needed	2	420	December 15, 2020
Resuming annotation with a model in the loop	usage, solved	2	1323	March 6, 2018
Resuming annotations after closing the terminal	usage, done, streams	4	640	November 11, 2020
