Only one entity per example in evaluation dataset

Hi,

I think you've hit on the correct issue here. The `ner.teach` recipe gives you binary questions, so the model has to guess about the entities that aren't annotated. It doesn't know whether an unannotated span is correct or incorrect.

Now, the binary questions still carry enough information to correct some errors, if the model is already quite good at the entity types you're working on. So it's a way to go from, say, 85% to 90% accuracy with less annotation. But if you're starting from low accuracy (because you're working on a new entity type), the `ner.teach` recipe isn't so helpful.
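For reference, a typical `ner.teach` session looks something like this (the dataset name, model, source file and label here are just placeholders for your own setup):

```
prodigy ner.teach my_dataset en_core_web_sm ./texts.jsonl --label PERSON
```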

For a new entity type, you should probably start with the `ner.manual` recipe, to just start annotating in a fairly simple way. Then, once you have enough data, you can use `ner.batch-train` with the `--no-missing` flag. That flag tells the model that the annotations are complete, i.e. there are no missing entities. This way, if the model predicts an incorrect entity during training, the loss can be calculated to penalise it.
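Putting that together, the workflow might look roughly like this (again, the dataset, model, source file and label names are placeholders):

```
# Create complete, manual annotations for the new entity type
prodigy ner.manual fruit_dataset en_core_web_sm ./texts.jsonl --label FRUIT

# Train, telling the model the annotations have no missing entities
prodigy ner.batch-train fruit_dataset en_core_web_sm --label FRUIT --no-missing
```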

Let me know if it's still not clear, but I think from the sound of it your thinking is definitely on the right track :slight_smile: