Combining multiple models and exporting training data to spaCy

I have successfully trained a couple of NER models for new classes using Prodigy on top of the default spaCy 2 models. Each of these needed around 2,000 to 5,000 annotations to reach good accuracy (above 90%). Now I want to combine these mutually exclusive classes into a single model.

The naive approach was to merge all datasets used for training each individual model and run a batch train for all labels. But even after experimenting with hyperparameters and adding thousands of additional annotations to the dataset, the combined model still does not perform anywhere close to the individual models.

Are there some common pitfalls when combining multiple labels that I might not have considered?

One possible explanation is that the data is somehow being merged incorrectly. I now want to try merging all spans myself and then use spaCy directly instead of the Prodigy commands.

While investigating how the spans are merged from Prodigy datasets, I wanted to convert the annotations from Prodigy to spaCy’s format. I found this post (Mixing in gold data to avoid catastrophic forgetting) in the support forums.

From the code in this post, it looks like only “accept” answers are used and all other answers are dropped? This seems very weird to me, since I had the impression that “reject” annotations were helpful while training the individual NER models.

How does Prodigy handle “reject” annotations internally, and how can I transfer reject annotations to a format that can be used by spaCy?

Hi Marc,

Is it possible to email the combined dataset to me? I’d like to take a closer look at this, to make sure there’s not some problem.

If you can’t send the data, could you look at the combined dataset with the ner.print-dataset recipe, to see if you can spot any problems yourself?

Some additional questions:

  • Did you annotate the data with ner.manual, or with ner.teach?
  • Did you use the --no-missing flag during training?
  • Did you annotate the same text with the different labels, or different text?

I’ll have to double check but I can probably send you an email with the separate datasets as well as the combined one.

Did you annotate the data with ner.manual, or with ner.teach?

ner.teach and ner.make-gold

Did you use the --no-missing flag during training?


Did you annotate the same text with the different labels or different text?

I have a lot of documents for this task. There is probably some overlap between the datasets, but not necessarily.

I think I understand what the problem might be. Unfortunately this is one of the workflows we’re least satisfied with.

Let’s say you annotate all PERSON entities in document 1, and all ORG entities in document 2. Then you make a new dataset, with both documents.

The problem is, there’s no way to tell the model to expect that each document is only annotated for one entity type. This means the model has to assume that any text where no entities are marked might actually contain missing entities. If you only had one entity type, you’d be able to use the --no-missing flag.
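To make the problem concrete, here is a rough sketch that checks which labels show up on each text in a merged dataset. It assumes the examples are plain dicts in Prodigy's task format (`"text"`, `"spans"`, `"answer"`); the function name `labels_per_text` and the sample texts are made up for illustration.

```python
from collections import defaultdict


def labels_per_text(examples):
    """Map each text to the set of labels found in its accepted spans."""
    seen = defaultdict(set)
    for eg in examples:
        # Only accepted examples are counted here (an assumption of this sketch).
        if eg.get("answer", "accept") != "accept":
            continue
        for span in eg.get("spans", []):
            seen[eg["text"]].add(span["label"])
    return dict(seen)


# Hypothetical merged dataset: each text came from a single-label dataset.
examples = [
    {"text": "Alice joined Acme.", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "PERSON"}]},
    {"text": "Bob left Initech.", "answer": "accept",
     "spans": [{"start": 9, "end": 16, "label": "ORG"}]},
]

all_labels = {"PERSON", "ORG"}
coverage = labels_per_text(examples)
# Texts where at least one label never appears.
partial = [text for text, labels in coverage.items() if labels != all_labels]
print(partial)
```

Note that a text in `partial` might simply contain no entities of the other type. The data alone can't tell you whether a label is absent or was never annotated, which is exactly the ambiguity the model faces.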

The best solution is probably to get the same text annotated with all of your entities. You can probably train with your current annotations, and use the resulting models to bootstrap. You’ll want to get predictions from each of your models on the texts, and then merge those predictions into one dataset. Finally, you can run ner.manual on the result to clean up any errors.

You’ll probably find it useful to do the prediction and merging in a separate script, as it’s pretty much a once-off task that doesn’t need Prodigy.
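A minimal sketch of such a script might look like the following. It is only an outline under some assumptions: each model is a loaded spaCy pipeline you can call on a text, the output is JSONL with `"text"` and `"spans"` keys, and overlapping predictions are resolved by keeping the longest span (that rule is my choice here, not something Prodigy does for you). The helper names `doc_to_spans`, `merge_spans` and `build_task` are hypothetical.

```python
import json


def doc_to_spans(doc):
    """Convert the entities of a spaCy Doc into Prodigy-style span dicts."""
    return [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents]


def merge_spans(span_lists):
    """Merge spans from several models, keeping the longest on overlap."""
    candidates = [span for spans in span_lists for span in spans]
    # Longest spans first, ties broken by position in the text.
    candidates.sort(key=lambda s: (-(s["end"] - s["start"]), s["start"]))
    kept = []
    for span in candidates:
        # Keep the span only if it does not overlap anything already kept.
        if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s["start"])


def build_task(text, models):
    """Run every model on the text and merge the predictions into one task."""
    return {"text": text,
            "spans": merge_spans([doc_to_spans(nlp(text)) for nlp in models])}


# Demo with hard-coded predictions standing in for real model output.
text = "Alice joined Acme Corp in Berlin."
person_preds = [{"start": 0, "end": 5, "label": "PERSON"}]
org_preds = [{"start": 13, "end": 22, "label": "ORG"}]
product_preds = [{"start": 13, "end": 17, "label": "PRODUCT"}]  # overlaps the ORG span

merged = merge_spans([person_preds, org_preds, product_preds])
line = json.dumps({"text": text, "spans": merged})  # one JSONL line per text
print(line)
```

With real models you would call `build_task(text, [nlp_person, nlp_org])` for each text and write one line per task to a file. Since your classes are mutually exclusive, overlaps should be rare, so the tie-breaking rule mostly matters for catching model disagreements that you then fix by hand.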
