Data annotation : Query Regarding Data Annotation and Merging in Prodigy

Manoj · January 10, 2025, 10:38am

Hi Team,

I am Manoj Goyal.

I have a query related to data annotation in Prodigy.
I have three datasets named ner_resume_person, ner_resume_org, and ner_resume_course. I combined them into a single dataset with the db-merge command and then trained a spaCy model on the merged dataset. However, the output is not correct: the model does not capture the course data from the dataset.

I raised a similar query earlier but did not find a solution. The previous thread is: Data annotation : Error in merge datasets

magdaaniol · January 10, 2025, 1:05pm

Hi @Manoj,

What exactly do you mean by:

"the model does not capture the course data from the dataset."

Do you mean that the scores for this category on the evaluation dataset are lower than expected? Or have you tried your model on the training dataset and found that it did not recognize any COURSE entities?
Could you share your evaluation results?

Some common issues to look out for:

Imbalanced entity distribution: significantly fewer COURSE entities than PERSON or ORG, or an evaluation dataset that contains few or no COURSE entities.
Inconsistent annotation patterns (e.g., "Introduction to Python" vs. just "Python" for courses).
Token boundary mismatches, especially around special characters or whitespace. If a span does not align with token boundaries, spaCy cannot use it as a training example and will discard it.
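
As a rough illustration of the first and third checks, here is a minimal Python sketch that counts span labels and flags spans that do not start and end on token boundaries. It assumes records shaped like Prodigy's JSONL export (with "text", "tokens", "spans" and "answer" keys). The helper names and the sample sentence are hypothetical, and the whitespace tokenizer is only a stand-in for spaCy's real tokenization.

```python
import re
from collections import Counter


def whitespace_tokens(text):
    """Stand-in tokenizer (assumption): split on whitespace, keep character offsets."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


def label_counts(examples):
    """Count span labels across accepted annotation records."""
    counts = Counter()
    for eg in examples:
        if eg.get("answer", "accept") != "accept":
            continue  # skip rejected or ignored examples
        for span in eg.get("spans", []):
            counts[span["label"]] += 1
    return counts


def misaligned_spans(eg):
    """Return spans whose offsets do not coincide with token starts and ends."""
    starts = {tok["start"] for tok in eg.get("tokens", [])}
    ends = {tok["end"] for tok in eg.get("tokens", [])}
    return [
        span
        for span in eg.get("spans", [])
        if span["start"] not in starts or span["end"] not in ends
    ]


# Hypothetical records for illustration only.
text = "Completed Introduction to Python at MIT"
good = {
    "text": text,
    "tokens": whitespace_tokens(text),
    "spans": [
        {"start": 10, "end": 32, "label": "COURSE"},
        {"start": 36, "end": 39, "label": "ORG"},
    ],
    "answer": "accept",
}
bad = {
    "text": text,
    "tokens": whitespace_tokens(text),
    # The span end includes a trailing space, so it does not match a token end.
    "spans": [{"start": 10, "end": 33, "label": "COURSE"}],
    "answer": "accept",
}

print(label_counts([good, bad]))
print(misaligned_spans(bad))
```

A very low COURSE count relative to PERSON and ORG, or a non-empty list of misaligned spans, would point to the first or third issue above.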

I recommend you export your data to spaCy using data-to-spacy and then run spaCy data debugging tools such as spacy debug data to see if there are any structural issues in the dataset.
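
The two steps could look roughly like the commands printed by the sketch below. It assumes a recent Prodigy version (v1.11 or later) with spaCy v3, where data-to-spacy writes train.spacy, dev.spacy and config.cfg into the output directory. The ./corpus path, the 0.2 evaluation split and the build_commands helper are illustrative assumptions, so please check `prodigy data-to-spacy --help` for your installed version.

```python
import shlex


def build_commands(datasets, out_dir="./corpus", eval_split=0.2):
    """Build the export and debug commands as argument lists (nothing is executed)."""
    export = [
        "prodigy", "data-to-spacy", out_dir,
        "--ner", ",".join(datasets),      # comma-separated NER datasets
        "--eval-split", str(eval_split),  # hold out part of the data for evaluation
    ]
    debug = [
        "python", "-m", "spacy", "debug", "data", f"{out_dir}/config.cfg",
        "--paths.train", f"{out_dir}/train.spacy",
        "--paths.dev", f"{out_dir}/dev.spacy",
    ]
    return export, debug


# Dataset names taken from the question above.
commands = build_commands(
    ["ner_resume_person", "ner_resume_org", "ner_resume_course"]
)
for cmd in commands:
    print(shlex.join(cmd))  # copy the printed lines into a terminal
```

In the debug output, look at the per-label entity counts and any warnings about misaligned or dropped spans for COURSE.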
