Annotation strategy for gold-standard data

Hi Frederik,

The --label k1,k2 argument tells Prodigy to only suggest entities that have been assigned those labels. These labels are not in the de_core_news_sm pre-trained model you're using, and the make-gold recipe doesn't update the model. This means no entities will ever be suggested by the model.

We should add a warning (or possibly an error) if you specify labels that aren't in the model during make-gold. We have similar warnings for most other recipes, since it's an easy mistake to make, especially by mistyping a label name.
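For illustration, the check could look something like this. This is a hypothetical sketch, not Prodigy's actual code; the function name, message wording, and label sets are made up for the example:

```python
def check_labels(requested, model_labels):
    """Warn about labels the loaded model can't predict.

    Hypothetical helper: `requested` would come from --label,
    `model_labels` from the model's NER component.
    """
    missing = sorted(set(requested) - set(model_labels))
    if missing:
        print(
            "Warning: the model doesn't produce these labels, "
            "so it will never suggest them: " + ", ".join(missing)
        )
    return missing

# Example: labels from --label k1,k2 checked against a German
# model's label set (LOC, MISC, ORG, PER)
check_labels(["k1", "k2"], ["LOC", "MISC", "ORG", "PER"])
```

Both `k1` and `k2` would trigger the warning here, which is exactly the situation in your session: the model's suggestions are filtered down to labels it can never produce.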

The simple answer for Case 3: Mark accept.

The workflow in ner.make-gold should be that you mark all and only the correct entities, then hit ACCEPT once the example is correct. You can use REJECT to flag deeper problems for you to resolve later. For instance:

  • Sometimes the tokenization is incorrect, preventing you from marking the entity boundaries correctly;

  • Sometimes you don't have a correct category to put the entity in, so you'd like to revisit the example once you've updated your label scheme;

  • Sometimes the entity contains other entities within it, and you'd like to note that in your downstream evaluation.

If there are no problems like this, it's often the case that the correct analysis simply has no entities. These examples are important for the model to learn from, so you want them in your training data.
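To make that concrete, here's a minimal sketch of what an accepted no-entity example looks like in Prodigy's JSONL task format (the text itself is made up):

```python
import json

# An accepted example where the correct analysis has no entities:
# the spans list is simply empty, and the answer is "accept".
task = {
    "text": "Das Wetter war gestern sehr schön.",
    "spans": [],  # no entities is a valid, useful annotation
    "answer": "accept",
}
print(json.dumps(task, ensure_ascii=False))
```

Accepting such examples teaches the model that "no entities here" is a legitimate answer, which helps reduce false positives.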
