make-gold workflow


I am a bit confused by the ner.make-gold workflow. If the model gives me a suggestion where the labels are incorrect, am I supposed to correct the suggestions and then click accept or should I just reject it?

The goal of the ner.make-gold workflow is to produce gold-standard data – i.e. annotations that are complete and “perfect”. In ner.teach, you just give the model binary feedback on different analyses of the text – but in ner.make-gold, the idea is that you correct the entities until the example is complete and all entities are labelled, and then accept it. So if the model's suggestion has incorrect labels, fix them first and then click accept, rather than rejecting the example. If you come across a sentence that includes no entities, you would simply accept the unlabelled sentence.

I normally use the “reject” action to explicitly mark examples that are wrong for other reasons – for example, if the tokenization is bad or if the text includes bad markup etc.
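To make the accept/reject convention above a bit more concrete, here's a minimal sketch of how the annotations could be filtered afterwards. It assumes each record is a dict with "text", "spans" (each with "start", "end" and "label") and "answer" keys, which is how I'd expect the exported annotations to look. The `to_gold` helper and the sample records are made up for illustration and aren't part of Prodigy.

```python
# Hypothetical sample of annotated records. The key names are an assumption
# about the exported annotation format, not something confirmed in this thread.
records = [
    {   # corrected until complete, then accepted
        "text": "Apple hired Tim Cook.",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 12, "end": 20, "label": "PERSON"},
        ],
        "answer": "accept",
    },
    {   # no entities at all: still accepted, with an empty list of spans
        "text": "It rained all day.",
        "spans": [],
        "answer": "accept",
    },
    {   # broken markup: rejected, so it should not count as gold data
        "text": "<div>Apple</div> hired",
        "spans": [{"start": 5, "end": 10, "label": "ORG"}],
        "answer": "reject",
    },
]


def to_gold(records):
    """Keep only accepted examples and return (text, {"entities": [...]}) pairs."""
    gold = []
    for rec in records:
        if rec.get("answer") != "accept":
            continue  # rejected or ignored examples are not gold-standard
        entities = [(s["start"], s["end"], s["label"]) for s in rec.get("spans", [])]
        gold.append((rec["text"], {"entities": entities}))
    return gold


gold = to_gold(records)
for text, annotations in gold:
    print(text, annotations)
```

Note that the accepted sentence without entities stays in the output with an empty entity list. That's intentional: it tells the model that nothing in that text should be labelled.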

(Btw, you could also create gold-standard data by hand using ner.manual, but correcting the model’s predictions is often faster, because there’s always a chance that the model gets at least some of the entities right.)