Training multiple entities at the same time?


(Abhinandan Srivastava) #1

Can we train multiple entities at the same time, so that the model can catch different entities?

(Ines Montani) #2

Sure! During annotation, you can always specify one or more labels via the --label argument – for example, --label PERSON,ORG. We usually recommend focusing on a smaller label set per session and running smaller experiments during development – but once you’re ready to train your final model for production, you should ideally merge your data and include all the labels you’ve annotated, so the model can learn them all at the same time.

(Abhinandan Srivastava) #3


Can you help me with how to merge all our data into a single dataset?
Is there an efficient way?


(Ines Montani) #4

If you want to stay in Prodigy and train from within Prodigy, you could export all the data you want to use using db-out and then import it all into a new dataset using db-in. Or, more efficiently, you could just write a script. This also lets you add data from other sources, if needed.

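from prodigy.components.db import connect

# Connect to the database configured in your Prodigy settings
db = connect()
# Load all annotated examples from each existing dataset
examples1 = db.get_dataset('dataset_one')
examples2 = db.get_dataset('dataset_two')
# Store the combined examples under a new dataset name
db.add_examples(examples1 + examples2, datasets=['new_dataset'])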

If you want to train with a different library or just with spaCy directly, you can do something similar – get all the examples from the datasets you want to use, format them however you need and write them to a file. Each example also has an _input_hash that describes the original input text the annotation was collected on, so examples with the same input hash are annotations on the same text. If you want to merge the annotations manually, you can find examples with the same input and then merge their "spans" (for NER).
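As a rough illustration of merging by input hash, here is a minimal sketch in plain Python. The function name merge_spans_by_input and the sample examples are made up for this post, and it assumes you have already filtered the data down to accepted annotations. It only drops exact duplicate spans and does not resolve conflicting or overlapping spans, so treat it as a starting point rather than the way Prodigy itself merges data.

```python
def merge_spans_by_input(examples):
    """Combine the "spans" of all examples that share the same _input_hash.

    Hypothetical helper: assumes each example is a dict with "_input_hash"
    and an optional "spans" list of {"start", "end", "label"} dicts.
    """
    merged = {}
    for eg in examples:
        key = eg["_input_hash"]
        if key not in merged:
            # Copy the first example seen for this text, starting with no spans.
            # Note: any _task_hash copied along no longer matches the new spans.
            merged[key] = {**eg, "spans": []}
        target = merged[key]["spans"]
        seen = {(s["start"], s["end"], s["label"]) for s in target}
        for span in eg.get("spans", []):
            ident = (span["start"], span["end"], span["label"])
            if ident not in seen:  # skip exact duplicates only
                target.append(dict(span))
                seen.add(ident)
    for eg in merged.values():
        # Keep spans in document order.
        eg["spans"].sort(key=lambda s: (s["start"], s["end"]))
    return list(merged.values())


# Made-up sample data: the same text annotated in two separate sessions.
sample = [
    {"text": "Apple hired Tim Cook", "_input_hash": 1,
     "spans": [{"start": 12, "end": 20, "label": "PERSON"}]},
    {"text": "Apple hired Tim Cook", "_input_hash": 1,
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
]
merged_sample = merge_spans_by_input(sample)
```

The merged list could then be written to a file or added to a new dataset, depending on where you train.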

(Abhinandan Srivastava) #5


We added 7 entities to a dataset the way you suggested, but while doing review label tagging, only a few entities are coming up. The entities we tagged initially are not showing up. What could be the reason for this?

(Ines Montani) #6

Did you include examples of those entities in your training data as well? If you update an existing model with new categories, it’s important to also “remind” the model of what it previously got right. You can do this by processing text with the existing model you want to update, selecting the entity spans you want to “keep” and including those in your training data when you update the model.
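To make the "remind the model" idea a bit more concrete, here is a small sketch. The function name keep_existing_entities is hypothetical, and nlp is assumed to be a loaded spaCy pipeline (or anything that returns a doc with .ents). The model's predictions can be wrong, so you would probably want to review the resulting spans before training on them.

```python
def keep_existing_entities(nlp, texts, keep_labels):
    """Turn the existing model's predictions into span annotations.

    Hypothetical helper: nlp is assumed to behave like a spaCy pipeline,
    i.e. nlp(text) returns a doc whose .ents have start_char, end_char
    and label_ attributes.
    """
    examples = []
    for text in texts:
        doc = nlp(text)
        # Keep only the predicted entities the model should not forget.
        spans = [
            {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents
            if ent.label_ in keep_labels
        ]
        examples.append({"text": text, "spans": spans})
    return examples
```

You could call this with the model you are updating and the labels it already predicts well, then mix the returned examples into the training data that contains the new labels.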

You can find more details and solutions if you search for “catastrophic forgetting”.

(Abhinandan Srivastava) #7

Thanks for the speedy reply.

We are still at the review-level tagging stage in Prodigy.

Initially we only annotated the skincare entity on a review dataset; now we have added 5 more entities (merged entity dataset). The skincare entity is not showing up on the review dataset. Does this fall under the same problem?