ner.teach one label at a time

Onyoursix · August 25, 2021, 7:40pm

This might be a really dumb question. I'm going to be working on a really large data set with quite a few labels. It's easier to focus on one label at a time vs 20 and potentially missing something. Is there any downside to only annotating one label at a time in a data set vs all at once? I can't think of anything off the top of my head that doing it this way would cause an issue but just wanted to double check my bases before I get too far down.

ines · August 29, 2021, 2:53am

Hi! This is a totally reasonable question. When you train a model from binary annotations (or any annotations, really), Prodigy will merge all annotations on the same input, so you only end up with one training example per text. This means you can collect your annotations in a more fine-grained way (one dataset per label) and merge them automatically at the end.
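To make the merging idea concrete, here is a minimal sketch of what "one training example per text" can look like. It is not Prodigy's actual merge logic, just an illustration under a few assumptions: each annotation is a dict with `text`, `spans` (each with `start`, `end`, `label`), `answer` and an `_input_hash` identifying the text, and only accepted examples are kept. The helper name `merge_by_input` is hypothetical.

```python
def merge_by_input(examples):
    """Combine accepted spans from several per-label datasets
    into a single example per input text (illustrative only)."""
    merged = {}
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # assumption: this sketch drops rejected/ignored answers
        # Group by input hash, falling back to the raw text if it is missing
        key = eg.get("_input_hash", eg["text"])
        target = merged.setdefault(key, {"text": eg["text"], "spans": []})
        seen = {(s["start"], s["end"], s["label"]) for s in target["spans"]}
        for span in eg.get("spans", []):
            ident = (span["start"], span["end"], span["label"])
            if ident not in seen:  # skip exact duplicates
                seen.add(ident)
                target["spans"].append(
                    {"start": span["start"], "end": span["end"], "label": span["label"]}
                )
    for eg in merged.values():
        eg["spans"].sort(key=lambda s: (s["start"], s["end"]))
    return list(merged.values())


# Two hypothetical single-label datasets annotated over the same text
text = "Apple hired Tim Cook."
org_dataset = [
    {"text": text, "_input_hash": 1, "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
]
person_dataset = [
    {"text": text, "_input_hash": 1, "answer": "accept",
     "spans": [{"start": 12, "end": 20, "label": "PERSON"}]},
]

combined = merge_by_input(org_dataset + person_dataset)
print(combined)
```

With real binary ner.teach data, rejected answers also carry information (the span is known not to have that label), which this simplified sketch throws away.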

We often recommend focusing on a smaller subset of labels, or even one label at a time. This makes it easier to focus because you don't have to keep jumping between different labels and only have to think about one concept at a time. It also helps you counteract imbalanced label distributions during example selection: if you annotate all labels together and let the model select what to annotate based on the scores, you might only get to see one specific label once or twice, which isn't ideal.

Btw, more generally, I'd also recommend collecting at least some gold-standard annotations using a workflow like ner.correct or ner.manual, especially if your goal is to train a model (more or less) from scratch.

Onyoursix · August 30, 2021, 5:49pm

Thanks for the reply, Ines. I'm essentially just using ner.teach to create a base model that I will use to assist in creating a new gold-standard data set using ner.correct.

That's a really good point about imbalanced distributions, that didn't even cross my mind.

Thanks again!
