Only one entity per example in evaluation dataset

Hi, I've got a question.

I collected annotations with ner.teach and updated the en_core_web_lg model with them. When I then evaluated the dataset using the scorer, I got really bad precision and recall. I think I know the problem, but maybe you can confirm whether I'm missing something:

When ner.teach asks me its binary questions, only one entity is ever annotated per example. So every example in the evaluation.jsonl file contains just that one entity, and is missing the other entities that are probably also in the text but were never asked about. Right?
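For example, a line in my evaluation.jsonl looks roughly like this (text and offsets made up here, but this is the shape ner.teach saves, with a single span per task):

```json
{"text": "Apple opened a new office in Berlin.", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}
```

So "Berlin" never appears as a span anywhere, even though it's presumably also an entity.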

How do I get an evaluation dataset that contains all the entities? Do I have to create ner.make-gold data anyway, even if I work with ner.teach, in order to evaluate my dataset? If so, does the scorer know which data to use (the binary annotations from ner.teach or the data from ner.make-gold), or do I have to create a new Prodigy dataset and run the evaluation on that one?

Thank you for your help!

Hi,

I think you've hit on exactly the right issue here. The ner.teach recipe gives you binary questions, so the model has to guess about the entities that aren't annotated: it doesn't know whether an unannotated span is correct or incorrect.

Now, the binary questions still provide enough information to correct some errors, if the model is already quite good at the entity types you're working on. So it's a way to go from, say, 85% to 90% accuracy with less annotation effort. But if you're starting at low accuracy (because you're working on a new entity type), the ner.teach recipe isn't so helpful.

For a new entity type, you should probably start with the ner.manual recipe, to just start annotating in a fairly simple way. Then, once you have enough data, you can use ner.batch-train with the --no-missing flag. That flag tells the model that the annotations are complete, i.e. there are no missing entities. That way, if the model predicts an incorrect entity during training, the loss can be calculated to penalise it.
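In practice the workflow could look something like this (dataset and file names here are made up, and you should check `prodigy ner.batch-train --help` for the exact options in your version; if I remember right, `--eval-id` lets you point the evaluation at a separate gold dataset, which also answers your question about which data the scorer uses):

```shell
# Create a complete, gold-standard annotation set by hand
prodigy ner.manual gold_eval_set en_core_web_lg my_texts.jsonl --label "PERSON,ORG,GPE"

# Train on your annotations, treating them as complete,
# and evaluate against the separate gold dataset
prodigy ner.batch-train my_annotations en_core_web_lg --no-missing --eval-id gold_eval_set
```

Keeping the gold evaluation data in its own dataset is the important part, so your binary ner.teach answers never get mixed into the evaluation.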

Let me know if it's still not clear, but I think from the sound of it your thinking is definitely on the right track :slight_smile: