Working with tags

Hi,

I purchased Prodigy the day before yesterday and I'm already a big fan. Thank you so much for the product, and kudos to the team.

I am trying to come up with an optimal strategy for my NER annotation and training. I need 4-6 named entities to be recognized in the text being parsed. But a couple of these entities are conceptually very similar, and I am unsure whether the model will be able to tease out the difference. So my questions are:

  1. Is it better to do all the tags in one go and train? What I am worried about with this strategy is that, since the evaluation is based on accuracy, a low accuracy score could be caused by the model misidentifying the similar tags. Is there any provision in Prodigy to merge two tags in the annotated data? (e.g. merge the tags "black dogs" and "white dogs" into a single tag "dogs")

  2. If I am doing multiple tags in one go, is there any way to check the accuracy per tag while training? (e.g. what is the accuracy in identifying the tag "animals" vs. the tag "plants"?)

Thank you very much.
DU

Thanks, that's nice to hear :smiley: And those are both good questions. I think your reasoning here makes sense, and if you're worried that some of your categories may be too specific, a good solution is to just try it: train with both label schemes and see what works best.

That should just be a simple data transformation – nothing about the underlying data will change, just the label. For each span you annotate, Prodigy will add an entry to the "spans" list that looks like this: {"start": 10, "end": 20, "label": "BLACK_DOGS"}. So you can just export your data as a JSON file (or load it in Python, whichever you prefer) and replace all dog-related labels with just DOG. Then you can either import it to a new dataset, or use it in a different process.
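For example, here's a minimal sketch of that transformation, assuming the dataset was exported to a JSONL file with prodigy db-out, and that the original labels are BLACK_DOGS and WHITE_DOGS (the file names and label names are hypothetical, so adjust them to your data):

```python
import json

# Hypothetical input file, exported with:
#   prodigy db-out my_dataset > annotations.jsonl
MERGE = {"BLACK_DOGS": "DOG", "WHITE_DOGS": "DOG"}  # assumed label names

with open("annotations.jsonl", encoding="utf8") as f_in, \
     open("annotations_merged.jsonl", "w", encoding="utf8") as f_out:
    for line in f_in:
        example = json.loads(line)
        for span in example.get("spans", []):
            # Map all dog-related labels to the single DOG label,
            # leaving all other labels untouched
            span["label"] = MERGE.get(span["label"], span["label"])
        f_out.write(json.dumps(example) + "\n")
```

You can then import the converted file to a new dataset with prodigy db-in new_dataset annotations_merged.jsonl and train from that.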

If you run the train recipe, Prodigy will report the overall accuracy, as well as the accuracy per label. You can see an example of this here. In spaCy, the per-label accuracy is available in the ents_per_type attribute of the Scorer, which is returned by nlp.evaluate.
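For instance, here's a minimal sketch of that evaluation in spaCy v2, assuming a trained model directory and a small evaluation set (the model path, texts and labels are made up for illustration):

```python
import spacy

# Hypothetical model path and evaluation data in spaCy v2's
# (text, annotations) tuple format
nlp = spacy.load("/path/to/trained-model")
eval_data = [
    ("Rex is a black labrador.", {"entities": [(9, 23, "DOG")]}),
    ("A fern grew by the window.", {"entities": [(2, 6, "PLANT")]}),
]

scorer = nlp.evaluate(eval_data)  # returns a Scorer object in spaCy v2
print(scorer.ents_f)              # overall entity F-score
print(scorer.ents_per_type)       # per-label precision, recall and F-score
```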
