Combining and validating spaCy labels and in-house NER output

I am comparing our in-house NER model's output with spaCy's NER output. Is there a way to combine the output from the two models into spans (since some might overlap), so that the annotator sees just one labeled utterance in `ner.correct`?

Comparison example using the `review` recipe, where I manually corrected the utterance (screenshot omitted):

I'm curious to understand the use-case for merging. I typically find it more useful to see where the models disagree, but would I be correct in assuming that your use-case involves two pipelines with different labels?

It's certainly possible, but it would involve writing some custom code. In theory, assuming both models use the same tokeniser, you could have both models make a prediction and merge the resulting "spans" keys. You could use the `combine_models` function for this. Would that work?
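To make that first idea concrete, here's a minimal sketch that runs both pipelines and merges their entities into one Prodigy-style "spans" list. The model names are placeholders, and it assumes both pipelines tokenise the text the same way:

```python
import spacy

# Placeholder pipeline names -- swap in your in-house model and whichever
# stock spaCy pipeline you're comparing against.
nlp_spacy = spacy.load("en_core_web_sm")
nlp_inhouse = spacy.load("./my_inhouse_ner")

def merged_spans(text):
    """Run both pipelines and merge their entities into a single
    Prodigy-style "spans" list (character offsets plus labels)."""
    spans = []
    for nlp in (nlp_spacy, nlp_inhouse):
        doc = nlp(text)
        for ent in doc.ents:
            spans.append(
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            )
    # Sort by position; overlapping spans are kept as-is here.
    return sorted(spans, key=lambda s: (s["start"], s["end"]))

print(merged_spans("Apple is opening a new office in Berlin."))
```

One caveat: as far as I know, the `ner_manual` interface that `ner.correct` uses doesn't render overlapping spans, so any overlaps this merge produces would still need to be resolved before annotation.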

I can help write the code for this, but I just want to make sure that I understand the problem first.


Yes, I agree -- I would also prefer to see the output from both models, but a colleague is asking for a merge, so I'm trying to figure out what might work. I actually tested a spans-based solution last night, since the labels from the two models can overlap and have different names, but I still need a way to combine the datasets (I've only created a small test set so far), so I'll try the `combine_models` function.

I just realized that I gave some bad advice.

If you have two spaCy models, it's probably much easier to use the `filter_spans` utility from `spacy.util` instead.

The `combine_models` approach will interleave the predictions from the two models rather than merging them into a single example, which isn't what you want here.
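For reference, here's a sketch of the `filter_spans` route, again with placeholder model names and a shared tokeniser assumed:

```python
import spacy
from spacy.util import filter_spans

# Placeholder pipeline names again.
nlp_spacy = spacy.load("en_core_web_sm")
nlp_inhouse = spacy.load("./my_inhouse_ner")

text = "Send the report to Alice at Acme Corp by Friday."
doc = nlp_spacy(text)
doc_inhouse = nlp_inhouse(text)

# Re-create the in-house entities as spans on the spaCy doc (this only
# works if both pipelines tokenise the text the same way), then let
# filter_spans drop overlaps.
extra = [
    doc.char_span(ent.start_char, ent.end_char, label=ent.label_)
    for ent in doc_inhouse.ents
]
doc.ents = filter_spans(list(doc.ents) + [s for s in extra if s is not None])

for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```

`filter_spans` resolves overlaps by preferring the longest span (and, for ties, the one that starts first), so the merged `doc.ents` stays valid and the annotator sees a single, non-overlapping set of labels.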