I do a session of ner.teach, save my results, and stop the session. Then I start another ner.teach session on the same dataset, and Prodigy presents me with annotation tasks that I already did in the first session. Is this expected behavior?
As the default behaviour, yes. Prodigy tries to make as few assumptions as possible about your input stream, the dataset you’re using and the annotations you’ve already collected. You won’t be asked the same question twice in the same session, but new sessions start fresh, with no preexisting state, by default.
However, once the fix to this bug is pushed, you’ll be able to specify the --exclude argument on the ner.teach command and others to exclude annotations that are already present in one or more datasets. The task hash (based on the input data and the features you annotate, e.g. spans or labels) is used to determine whether a question has been asked before. So you could set it to exclude the current dataset, so that you’re not asked about things you’ve already annotated in that set. You can also use it to make sure that evaluation examples don’t end up in your training set, and vice versa.
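To make the hashing idea more concrete, here is a minimal sketch of hash-based exclusion in plain Python. It is not Prodigy's actual implementation: the helper names (input_hash, task_hash, exclude_seen) and the use of SHA-1 over JSON are assumptions made purely for illustration.

```python
import hashlib
import json


def input_hash(task):
    """Hash only the raw input (in this sketch, the text)."""
    return hashlib.sha1(task["text"].encode("utf8")).hexdigest()


def task_hash(task):
    """Hash the input plus the features being annotated (spans, label)."""
    payload = {
        "input": input_hash(task),
        "spans": task.get("spans", []),
        "label": task.get("label"),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf8")
    return hashlib.sha1(encoded).hexdigest()


def exclude_seen(stream, annotated, hash_fn=task_hash):
    """Yield only tasks whose hash is not among the annotated examples."""
    seen = {hash_fn(eg) for eg in annotated}
    for eg in stream:
        if hash_fn(eg) not in seen:
            yield eg


# Hypothetical data: one example already annotated in an earlier session.
annotated = [
    {"text": "Apple is a company", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
]
stream = [
    # Identical question: same text, same span.
    {"text": "Apple is a company", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    # Same text, but a different question about it.
    {"text": "Apple is a company", "spans": [{"start": 0, "end": 5, "label": "PRODUCT"}]},
    # Entirely new text.
    {"text": "Berlin is a city", "spans": [{"start": 0, "end": 6, "label": "GPE"}]},
]

by_task = list(exclude_seen(stream, annotated, task_hash))
by_input = list(exclude_seen(stream, annotated, input_hash))
```

Excluding by task hash only drops the identical question, so a different suggestion on the same text can still be asked. Excluding by input hash drops every task on a text that has been seen before.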
I just came across this one.
Isn't this different now? By default it'll exclude those with the same task hash if starting on the same dataset again, right?
Yes, by default, the "auto_exclude_current" setting is true and tasks from the current dataset are excluded. This was introduced in v1.6.0, released in October 2018.
As of v1.9.0, recipes can also return an "exclude_by" setting as part of their "config" to specify whether to exclude by input hash or by task hash.
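As a rough sketch, the components returned by a custom recipe could carry these settings as shown below. The function name, dataset name and example stream are hypothetical, and the @prodigy.recipe decorator that a real recipe would use is left out so the snippet runs without Prodigy installed. The "input" and "task" values for "exclude_by" are my understanding of the accepted options, so check the docs for your version.

```python
def my_ner_recipe(dataset, examples):
    """Hypothetical recipe body returning Prodigy-style components."""
    stream = (eg for eg in examples)  # any iterable of task dicts
    return {
        "dataset": dataset,   # dataset to save annotations to
        "stream": stream,     # incoming tasks
        "view_id": "ner",     # annotation interface
        "config": {
            # Skip tasks already present in the current dataset.
            "auto_exclude_current": True,
            # Compare by input hash ("input") or task hash ("task").
            "exclude_by": "input",
        },
    }


components = my_ner_recipe("my_dataset", [{"text": "Berlin is a city"}])
```

With "exclude_by" set to "input", a text that has already been annotated would not come up again even if the suggested span or label differs. With "task", only the exact same question is skipped.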