Named Entity Recognition Workflow

Hi all,

I have been using Prodigy for a few days now and want to start off by saying how amazing Prodigy is. It is one of the coolest tools I have used in a long time.

At the moment I am using Prodigy for Named Entity Recognition. I wanted to check that my workflow is correct. I started by annotating manually with ner.manual. After I had 200 or so annotations, I used train to train the model. Then I used ner.teach and repeated the cycle a few times until I had about 500+ annotations. Now that I have a base model built, I am using ner.correct. Is there any reason at this point to go back to ner.manual / ner.teach, or is using ner.correct going to be the most useful?
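
For context, the commands I've been running look roughly like this (dataset, model and label names are just placeholders, and the train syntax may differ slightly depending on the Prodigy version):

```
# 1) Fully manual annotation to bootstrap (blank:en = tokenization only)
prodigy ner.manual ner_manual blank:en ./data.jsonl --label PERSON,ORG

# 2) Train an initial model from the manual annotations
#    (Prodigy v1.11+ syntax; older versions use `prodigy train ner ner_manual ... --output ...`)
prodigy train ./model_output --ner ner_manual

# 3) Binary accept/reject annotation with the trained model in the loop
prodigy ner.teach ner_binary ./model_output/model-best ./data.jsonl --label PERSON,ORG

# 4) Review and correct the model's predictions to create gold-standard data
prodigy ner.correct ner_gold ./model_output/model-best ./data.jsonl --label PERSON,ORG
```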

Additionally what is the use case for ner.silver-to-gold?

Thank you!

Hi, thanks so much – that's nice to hear :smiley:

ner.teach and ner.correct are essentially two different approaches for using an existing model and its predictions to create training data. ner.correct does this in a pretty straightforward way by letting you edit/correct existing predictions, while ner.teach goes a bit deeper and tries to find the most relevant examples to improve a model, based on different possible analyses. You'd probably want to use ner.teach at a later stage, when you already have a decent model and want to improve its predictions.
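
To make the difference a bit more concrete, here's roughly what the saved annotations look like – simplified, since real tasks also include things like tokens, hashes and the model's score in the meta. A ner.teach example is one accept/reject decision about a single suggested span:

```
{"text": "Apple is opening a store in Berlin.",
 "spans": [{"start": 0, "end": 5, "label": "ORG"}],
 "answer": "accept"}
```

Whereas a ner.correct example contains all entities in the text, edited by hand where needed:

```
{"text": "Apple is opening a store in Berlin.",
 "spans": [{"start": 0, "end": 5, "label": "ORG"},
           {"start": 28, "end": 34, "label": "GPE"}],
 "answer": "accept"}
```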

What works best depends on your specific problem – if you're happy using ner.correct, it's not too tedious, and the results look good, I'd say stick with that :slightly_smiling_face: And at the end of it, you get complete gold-standard annotations, which are always good to have.

Once you have a model and a label scheme that you're happy with and you're mostly interested in improving it, there's not really a good reason why you'd want to go back to fully manual annotation. (Unless you want to run unbiased experiments to determine whether humans and the model make similar and consistent decisions – but this would be a completely separate experiment.)

ner.silver-to-gold lets you convert "silver-standard annotations", i.e. binary yes/no annotations created with ner.teach, to "gold-standard annotations" – so basically, one (more or less complete) annotated example per text, instead of multiple accepted or rejected decisions about individual spans.
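
If it helps, the invocation looks roughly like this – the names here are placeholders (ner_gold is a new dataset for the merged gold annotations, ner_binary is your existing ner.teach dataset, and the model is whatever trained pipeline you've been using), and you can check `prodigy ner.silver-to-gold --help` for the exact arguments in your version:

```
prodigy ner.silver-to-gold ner_gold ner_binary ./your_model --label PERSON,ORG
```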

@ines Thank you for the response. Really appreciate the time you take to answer each question in such depth.
