Text classification: queue contains already-labeled examples when a session is closed and reopened

Hi there,

Apologies in advance if this issue has already been dealt with (I wasn’t able to find a post covering my exact case). When I close and reopen a session using the following command, the interface shows me texts that I have already classified:

prodigy textcat.teach voyages fr_core_news_md EStatutEgo_sample_15000_5clean.jsonl --label voyage --seeds voyages.txt

Is this a bug? Is there a way to get fresh, unclassified texts to annotate without having to re-annotate the texts from my previous session(s)?

Thanks in advance!

By default, Prodigy tries to make as few assumptions as possible about your stream and the dataset you’re saving your annotations to. But you can tell it to explicitly exclude all questions you’ve already annotated in one or more datasets using the --exclude option:

prodigy textcat.teach voyages [other stuff here] --exclude voyages

You can specify one or more comma-separated datasets to exclude: for example, the current dataset, but also any other datasets you’ve created before. (If you’re interested in how this works under the hood and how Prodigy decides whether a task was annotated before, you can find more details in the set_hashes section of the docs.)
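To illustrate the general idea behind this kind of exclusion, here is a minimal, self-contained sketch of hash-based filtering. It is not Prodigy’s actual implementation: the helper names task_hash and exclude_seen are hypothetical, and treating a question as the combination of its "text" and "label" fields is an assumption made for this example (see the set_hashes docs for which fields Prodigy really uses). Each incoming task is hashed, and any task whose hash already appears among the annotated examples is skipped.

```python
import hashlib
import json
from typing import Iterable, Iterator, Set, Tuple


def task_hash(task: dict, keys: Tuple[str, ...] = ("text", "label")) -> str:
    """Hash only the fields that identify a question (hypothetical scheme).

    Other fields, such as the recorded "answer", are ignored, so an
    annotated example and its unannotated version hash the same.
    """
    payload = {key: task.get(key) for key in keys}
    data = json.dumps(payload, sort_keys=True).encode("utf8")
    return hashlib.sha256(data).hexdigest()


def exclude_seen(stream: Iterable[dict], seen: Set[str]) -> Iterator[dict]:
    """Yield only tasks whose hash is not in the set of annotated hashes."""
    for task in stream:
        if task_hash(task) not in seen:
            yield task


# Example: one question was already answered in a previous session.
annotated = [{"text": "Je pars à Lyon.", "label": "voyage", "answer": "accept"}]
seen = {task_hash(task) for task in annotated}

stream = [
    {"text": "Je pars à Lyon.", "label": "voyage"},
    {"text": "Il pleut.", "label": "voyage"},
]
fresh = list(exclude_seen(stream, seen))  # only the "Il pleut." task remains
```

Because the filter is a generator, it works lazily on large streams (such as a big JSONL file) without loading everything into memory first.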

Worked like a charm! Might poke you for other questions in the near future. Thanks!

Yay, glad it worked! And sure – that’s what the forum is for :blush: