Presenting the same annotation task multiple times

wpm · January 4, 2018, 2:50pm

I do a session of ner.teach, save my results, and stop the session. Then I start another ner.teach session on the same dataset, and Prodigy presents me with annotation tasks that I already did in the first session. Is this expected behavior?

ines · January 4, 2018, 3:04pm

As the default behaviour, yes. Prodigy tries to make as few assumptions as possible about your input stream, the dataset you’re using and the annotations you’ve already collected. You won’t be asked the same question twice in the same session, but new sessions start fresh, with no pre-existing state by default.

However, once the fix to this bug is pushed, you’ll be able to specify the --exclude argument on the ner.teach command and others to exclude annotations that are already present in one or more datasets. The task hash (based on the input data and features you annotate, e.g. spans or labels) is used to determine if a question has been asked before. So you could set it to exclude the current dataset to not ask about things you’ve already annotated in that set. You can also use it to make sure that evaluation examples don’t end up in your training set, and vice versa.
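To make the idea of hash-based exclusion concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual implementation: the `task_hash` and `exclude_seen` helpers, the choice of SHA-1 over a JSON dump, and the example tasks are all assumptions made for illustration. It only shows the principle that two tasks count as "the same question" when both the input and the annotated features match.

```python
import hashlib
import json


def task_hash(task, input_keys=("text",), task_keys=("spans", "label")):
    """Hypothetical stand-in for a task hash: a digest of the input
    fields plus the fields being annotated. The answer is ignored."""
    payload = {key: task.get(key) for key in (*input_keys, *task_keys)}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


def exclude_seen(stream, annotated):
    """Yield only the tasks whose hash is not already in `annotated`."""
    seen = {task_hash(task) for task in annotated}
    for task in stream:
        if task_hash(task) not in seen:
            yield task


# Examples already saved in the dataset (made-up data).
annotated = [{"text": "Apple is hiring", "label": "ORG", "answer": "accept"}]

# Incoming questions for a new session (made-up data).
stream = [
    {"text": "Apple is hiring", "label": "ORG"},     # same question: skipped
    {"text": "Apple is hiring", "label": "PERSON"},  # same text, new label: kept
    {"text": "Berlin is cold", "label": "GPE"},      # new text: kept
]

remaining = list(exclude_seen(stream, annotated))
```

Note that the second task is kept even though its text was seen before, because the label differs. Excluding on the input alone would skip it as well.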

nix411 · April 11, 2020, 8:41pm

I just came across this one.

Isn't this different now? By default it'll exclude those with the same task hash if starting on the same dataset again, right?

ines · April 12, 2020, 10:38am

Yes, by default, the "auto_exclude_current" setting is true and tasks from the current dataset are excluded. This was introduced in v1.6.0, released in October 2018.

As of v1.9.0, recipes can also return an "exclude_by" setting as part of their "config" to specify whether to exclude by input hash or by task hash.
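As a rough sketch, the dictionary a custom recipe returns could carry that setting like this. The `build_components` helper is hypothetical and is written without the `prodigy` import so it runs standalone. The value spellings `"task"` and `"input"`, and the `"text"` view ID, are assumptions here, so check them against the docs for your version.

```python
def build_components(dataset, stream, exclude_by="task"):
    """Hypothetical helper that assembles the components a recipe returns.

    In a real recipe this dictionary would be the return value of the
    function registered as a Prodigy recipe.
    """
    # Assumed spellings for the two options mentioned above.
    if exclude_by not in ("task", "input"):
        raise ValueError(f"Unexpected exclude_by value: {exclude_by!r}")
    return {
        "dataset": dataset,  # dataset to save annotations to
        "stream": stream,    # iterable of task dictionaries
        "view_id": "text",   # assumed interface name, for illustration
        "config": {"exclude_by": exclude_by},
    }


# Exclude anything whose input was already annotated, whatever the label.
components = build_components(
    "my_dataset", [{"text": "Hello"}], exclude_by="input"
)
```

Excluding by input is the stricter choice: a text you have already seen is skipped even if the new question is about a different label or span.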
