How to score incompletely highlighted entities?

If you’re using an active learning-powered recipe like ner.teach, both of those entities are suggestions you should reject. By rejecting incorrect boundaries, you’re essentially telling the model “Nope, try again!” and nudging it towards the correct boundaries. Each token can only be part of one entity, so if you accepted a partial match like “Hong”, the feedback the model would get is: “Yep, in contexts like this, ‘Hong’ is a single-token GPE entity and wins over all other possible analyses containing this token!” That’s obviously not what you want.
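To make the “one token, one entity” constraint concrete, here’s a minimal spaCy sketch (the example sentence and token indices are mine, purely for illustration): overlapping entity spans are rejected outright, so accepting just “Hong” claims that token and rules out the full “Hong Kong” analysis.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("I just got back from Hong Kong.")

# The correct analysis: "Hong Kong" as one two-token GPE entity
doc.ents = [Span(doc, 5, 7, label="GPE")]

# A token can only belong to one entity, so the partial match "Hong"
# and the full match "Hong Kong" can never coexist in one analysis:
try:
    doc.ents = [Span(doc, 5, 6, label="GPE"), Span(doc, 5, 7, label="GPE")]
except ValueError as err:
    print(err)  # spaCy raises an error for overlapping entity spans
```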

If you’re labeling manually (e.g. using ner.manual or ner.make-gold), your focus is slightly different: the dataset you produce and later train from should reflect the gold-standard analysis, with all required labels and no missing or unknown values. It’s totally fine to do this in several steps, by the way – in fact, we usually recommend focusing on a smaller label set when you label manually and making several passes over the data if necessary, as in the sketch below.
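For example, a multi-pass manual workflow could look like this (the dataset names, base model and source file are placeholders I made up):

```bash
# First pass: only label people and organisations
prodigy ner.manual news_people en_core_web_sm ./news.jsonl --label PERSON,ORG
# Second pass over the same text: only label places
prodigy ner.manual news_places en_core_web_sm ./news.jsonl --label GPE,LOC
```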

Prodigy can train from both types of annotations: binary accept/reject feedback on single entities where the entity labels for the rest of the text are unknown, and gold-standard annotations that describe the complete text and all entities in it (or the fact that the text contains no entities). The --no_missing flag on ner.batch-train lets you tell Prodigy that no entities are missing and that your data should be treated as gold standard.
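A training run on a gold-standard dataset might then look like this (dataset name, base model and output path are placeholders, and you can check prodigy ner.batch-train --help for the exact arguments in your version):

```bash
prodigy ner.batch-train news_gold en_core_web_sm --output ./ner-model --no_missing
```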
