Hello! I am annotating data for a training set with ner.correct, review, and ner.teach, and I typically use "skip/ignore" for samples that I would like to return to later or that would be too time-consuming to label.
I use "reject" for samples that break our annotation guide/labeling scheme. Would it have the same effect on the model to untag all tokens and click "accept" instead of pressing "reject"? I'd like to understand how the model learns differently from each of these actions.
No, the two actions have different implications, so it's an important distinction to make. If you hit accept for an example, that example will always be included in your training data by default (e.g. when you run data-to-spacy or train). So if you remove all entities and accept, your model will be updated with this example and the information that it contains no entities. This is different from excluding the example entirely and not updating your model with it at all. If an example might include entities but you don't really know (e.g. because you can't decide on the label or boundaries), you wouldn't want to update your model with the information "no entities".
(In general, it's important to include examples of what's not an entity, as well as examples with no entities at all if those occur in your data. Otherwise, you'll end up with a mismatch in entity distributions between the training and runtime data, and potentially much worse runtime results, like hallucinated entities. But the examples you use here should be ones where you know that there are no entities.)
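To make the distinction concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual implementation: the helper names `select_training_examples` and `to_training_tuples` are hypothetical, and the example texts are made up. It only assumes that each annotated task is a dict with "text", "spans", and "answer" keys, where "answer" is "accept", "reject", or "ignore".

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]


def select_training_examples(examples: List[Dict]) -> List[Dict]:
    """Keep only accepted examples (hypothetical helper, not a Prodigy API).

    Rejected and ignored examples are dropped entirely, so the model is
    never updated with them.
    """
    return [eg for eg in examples if eg.get("answer") == "accept"]


def to_training_tuples(examples: List[Dict]) -> List[Tuple[str, List[Span]]]:
    """Convert accepted examples to (text, entity offsets) pairs."""
    data = []
    for eg in select_training_examples(examples):
        ents = [(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
        data.append((eg["text"], ents))
    return data


annotations = [
    # Accepted with entities: the model learns these spans.
    {
        "text": "Ines lives in Berlin.",
        "spans": [
            {"start": 0, "end": 4, "label": "PERSON"},
            {"start": 14, "end": 20, "label": "GPE"},
        ],
        "answer": "accept",
    },
    # Accepted with no spans: the model learns "this text has no entities".
    {"text": "It rained all day.", "spans": [], "answer": "accept"},
    # Rejected: excluded, so nothing is learned from it.
    {"text": "Call Jordan about the Jordan trip.", "spans": [], "answer": "reject"},
    # Ignored: also excluded.
    {"text": "A very long and messy sample.", "spans": [], "answer": "ignore"},
]

training_data = to_training_tuples(annotations)
```

In this sketch, the second example reaches the training data with an empty entity list, which is a positive statement that the text contains no entities. The rejected and ignored examples never reach the training data at all, which is why untagging everything and accepting is not equivalent to rejecting.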
P.S. A quick note for future reference: in the upcoming version of Prodigy (and specifically, in spaCy v3.1), we'll have better support for "negative" examples, i.e. annotations of spans you know are not a given entity, even if you don't know the full correct answer. Prodigy will then treat "reject" answers as negative examples. So if an example annotates a span as PERSON and you reject it, the feedback the model gets is "we don't know the full answer for this example, but this particular span is not a PERSON".
Hi Ines! Thank you so much for your thorough reply and for all your support in these forums. This is helpful and makes sense. Great to hear about further support for "negative" samples.
I've also taken a look at the following thread, which has been helpful about when to reject samples:
Could you please elaborate on the difference between IGNORE and REJECT in the current version?
Specifically, if we are using the Review recipe, what is the difference between reject and ignore? And if we are using the Teach recipe, is there a difference between reject and ignore?