Exclude not functioning / duplicate tasks

ines · July 2, 2020, 2:36pm

Hey, thanks a lot for the super detailed report!

Before looking into this in more detail, I can definitely confirm what the intended behaviour is: ner.teach excludes by task hash, so two examples with the same task hash are considered duplicates and you should never be asked about them twice. The task hash is based on the text and span/label so you may be asked about different suggestions on the same text, but never about the same text + span + label combination. If an incoming example has the same task hash as an example in one of the excluded sets (via --exclude or the current dataset) and it's presented to you, that's a bug.

Other recipes, mostly the manual ones like ner.manual, exclude by input (via the "exclude_by": "input" config setting) because the assumption here is that you want to create one gold-standard annotation for each text and don't want to see the same text again, even if it comes in later with different pre-highlighted suggestions.
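To make the difference between the two modes concrete, here's a minimal sketch in plain Python. It's only an illustration: `stable_hash`, `add_hashes` and `filter_seen` are hypothetical helpers, and the hashing is a stand-in rather than what Prodigy does internally. The only things taken from the explanation above are the idea of an input hash (text only) versus a task hash (text + span + label) and the "task" / "input" values for exclude_by.

```python
import hashlib
import json


def stable_hash(obj):
    """Stand-in hash for illustration only, NOT Prodigy's real hashing."""
    payload = json.dumps(obj, sort_keys=True).encode("utf8")
    return hashlib.md5(payload).hexdigest()


def add_hashes(eg):
    """Attach an input hash (text only) and a task hash (text + spans/label)."""
    eg = dict(eg)
    eg["_input_hash"] = stable_hash({"text": eg["text"]})
    eg["_task_hash"] = stable_hash(
        {
            "input": eg["_input_hash"],
            "spans": eg.get("spans", []),
            "label": eg.get("label"),
        }
    )
    return eg


def filter_seen(stream, seen, exclude_by="task"):
    """Drop incoming examples whose hash already occurs in the excluded set."""
    key = "_task_hash" if exclude_by == "task" else "_input_hash"
    seen_hashes = {eg[key] for eg in seen}
    return [eg for eg in stream if eg[key] not in seen_hashes]


text = "Apple hired Tim Cook"

# What has already been annotated (e.g. the current dataset or --exclude).
annotated = [
    add_hashes({"text": text, "spans": [{"start": 0, "end": 5, "label": "ORG"}]}),
]

# What comes in next: one exact repeat, one new suggestion on the same text.
incoming = [
    add_hashes({"text": text, "spans": [{"start": 0, "end": 5, "label": "ORG"}]}),
    add_hashes({"text": text, "spans": [{"start": 12, "end": 20, "label": "PERSON"}]}),
]

by_task = filter_seen(incoming, annotated, exclude_by="task")
by_input = filter_seen(incoming, annotated, exclude_by="input")

print(len(by_task), len(by_input))  # prints: 1 0
```

With exclude_by set to "task", only the exact text + span + label repeat is filtered out and the new PERSON suggestion still comes through. With "input", both are dropped because the text itself has been seen before.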

This thread made me a little suspicious about the --exclude option with a separate dataset. Nothing really changed around this, though, so I'm not entirely sure where the problem would be. But it's probably the first thing we should double-check.

db-merge currently only appends and doesn't do any hashing, filtering or combining. That only happens during training and when you run data-to-spacy. So if you merge 4 examples with 4 duplicates of them, you'll end up with one set of 8 examples.
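Roughly, the append-only behaviour looks like the sketch below. `merge_append` and `dedupe_by_task_hash` are hypothetical helpers for illustration, not Prodigy functions, the hash values are made up, and the dedupe step is only a simplified stand-in for the filtering that happens later.

```python
def merge_append(*datasets):
    """Concatenate datasets as-is, without any hashing or filtering."""
    merged = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged


def dedupe_by_task_hash(examples):
    """Keep the first example for each task hash (simplified stand-in)."""
    seen = set()
    unique = []
    for eg in examples:
        if eg["_task_hash"] not in seen:
            seen.add(eg["_task_hash"])
            unique.append(eg)
    return unique


# Four examples, and a second dataset holding four duplicates of them.
# The hash values are made up for this example.
dataset_a = [{"text": f"text {i}", "_task_hash": i} for i in range(4)]
dataset_b = [dict(eg) for eg in dataset_a]

merged = merge_append(dataset_a, dataset_b)
unique = dedupe_by_task_hash(merged)

print(len(merged), len(unique))  # prints: 8 4
```

So the merged set really does contain all 8 rows, and the duplicates only go away once something filters by hash afterwards.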
