Annotation strategy for gold-standard data

I also went through this and noticed that my model was getting better when I accepted texts without entities instead of rejecting them. Happy to read this; it kind of makes it official rather than just experimental.
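For reference, a minimal sketch of what that looks like in spaCy's `(text, annotations)` training data format, where texts with no entities are accepted as correct rather than discarded. The texts and the `ORG` label are made up for illustration:

```python
TRAIN_DATA = [
    # Positive example: a span labelled with an entity type
    ("Send the invoice to ACME Corp by Friday.",
     {"entities": [(20, 29, "ORG")]}),
    # Negative example: accepted as correct with no entities at all,
    # which teaches the model that not every text contains one
    ("Thanks for the quick reply, see you next week.",
     {"entities": []}),
]
```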

What I did to make annotating my NER gold data faster was to first do a normal training run for the new entities with some examples, then do a batch train and export the model. I then used that model to pre-annotate the gold set.
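Roughly, the first step could look like the sketch below, assuming spaCy v2.x. The seed examples, the `PRODUCT` label, and the output directory are placeholders, not the actual data from my project:

```python
import random
import spacy
from spacy.util import minibatch, compounding

# A handful of hand-made seed examples for the new entity type
TRAIN_DATA = [
    ("We just shipped the Foobar 3000 to the warehouse.",
     {"entities": [(20, 31, "PRODUCT")]}),
]

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("PRODUCT")  # register the new entity type

# Update only the NER component, leaving the rest of the pipeline untouched
other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.resume_training()
    for _ in range(10):  # a few passes over the seed examples
        random.shuffle(TRAIN_DATA)
        losses = {}
        for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)

# Export the model so it can be used to pre-annotate the gold set
nlp.to_disk("./model-iter0")
```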

I annotated a few hundred gold examples, exported them, and used them to train a model with the spaCy batch train example, loading the model I had used to create the gold annotations. This improved the model, and I then annotated another hundred or so gold examples with the newly trained model. The suggestions kept getting better and better, so fewer and fewer corrections were needed. I repeated this a few times.
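One retraining round in that loop could be sketched like this, again assuming spaCy v2.x. The directory names, the `gold_examples` list (in the same `(text, {"entities": [...]})` format as above), and the hyperparameters are illustrative:

```python
import random
import spacy
from spacy.util import minibatch

def retrain_round(prev_model_dir, gold_examples, out_dir, n_iter=10):
    """Start from the previous iteration's model, update it on the newly
    corrected gold examples, and save the next iteration to disk."""
    nlp = spacy.load(prev_model_dir)
    other_pipes = [p for p in nlp.pipe_names if p != "ner"]
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.resume_training()
        for _ in range(n_iter):
            random.shuffle(gold_examples)
            losses = {}
            for batch in minibatch(gold_examples, size=8):
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
    nlp.to_disk(out_dir)
    return nlp

# Each round starts from the last exported model, e.g.:
# retrain_round("./model-iter0", corrected_gold, "./model-iter1")
# retrain_round("./model-iter1", more_corrected_gold, "./model-iter2")
```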

At some point the iterated model gets pretty good, and it becomes quick to annotate 500 or more examples.