How to treat entity-free text in manual/match modes.

edward · April 16, 2019, 1:48pm

If I see text with no entities at all in ner.manual or match modes, should I:

A. Accept, leaving no text highlighted.
B. Reject, leaving no text highlighted.
C. Ignore, leaving no text highlighted.

Plus, what’s the difference between doing A, B or C? Generally, the controls (while great for things like ner.teach) don’t seem intuitive/well-suited to these workflows - unless I’m missing something!

ines · April 16, 2019, 3:42pm

This is a good question and actually a very important one. TL;DR answer: You should pretty much always accept examples with no entities if the text doesn’t actually contain any entities.

Training your model with examples of entities and examples of what’s not an entity / texts without any entities is very important – you don’t want it to overfit and “hallucinate” entities because it’s never seen a single example without entity annotations during training.

Accepting an example will include it in the training data. Ignoring an example will always exclude it from the training and evaluation data, so you should only really do that for examples that are too difficult to answer, weird, broken etc.

How rejected examples are handled depends on how you’re training the model later on.

If you’re training it from binary accept/reject examples, the accepted and rejected examples will help construct the best possible analysis given your feedback. You can see some examples of that in my slides here. This means that the model can be updated accordingly, even if you haven’t collected annotations about every single token.

If you’re training from manual annotations and set the --no-missing flag, spaCy will assume that the data is “gold standard”: the annotated entities are the only entities present in the data, and all other tokens are O (outside an entity). So only the accepted answers will be used, and they’ll be treated as the perfect and “final” analysis. If an accepted text has no entities highlighted, this will be interpreted as “this text has no entities”.
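To make the gold-standard case concrete, here is a minimal sketch of how accepted, rejected and ignored answers could be filtered before training from manual annotations. It is only an illustration, not Prodigy’s actual implementation: the `examples` list and the `gold_training_data` helper are made up for this post, and the records assume the usual task shape with `"text"`, `"spans"` and `"answer"` keys.

```python
# Hypothetical annotation records, shaped like exported tasks
# ("text", "spans" with character offsets, and an "answer").
examples = [
    {
        "text": "Apple is opening an office in Berlin.",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 30, "end": 36, "label": "GPE"},
        ],
        "answer": "accept",
    },
    # Entity-free text, accepted: a valid "no entities here" example.
    {"text": "Thanks, that works for me.", "spans": [], "answer": "accept"},
    # Broken text, ignored: excluded from training and evaluation.
    {"text": "asdf ### 0x3f", "spans": [], "answer": "ignore"},
    # Rejected: not used when training from gold-standard manual annotations.
    {"text": "See you tomorrow.", "spans": [], "answer": "reject"},
]


def gold_training_data(records):
    """Keep only accepted records and treat their spans as complete.

    Hypothetical helper: returns (text, {"entities": [...]}) tuples, where an
    empty entity list means "this text has no entities".
    """
    data = []
    for record in records:
        if record.get("answer") != "accept":
            continue  # skip rejected and ignored answers
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        data.append((record["text"], {"entities": entities}))
    return data


train_data = gold_training_data(examples)
for text, annotations in train_data:
    print(text, annotations)
```

The accepted text without any highlighted spans stays in the data with an empty entity list, which is exactly what tells the model that text without entities exists. The rejected and ignored texts drop out entirely.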

