Prodigy Support for Spacy_DBpedia_Spotlight Pipeline

Hi Prodigy Team,

I am using spaCy's DBpedia Spotlight pipeline for NER and would like to run a Prodigy ner.correct session based on that model. I know you can specify the component within the ner.correct recipe, but since the DBpedia Spotlight pipeline isn't included by default, it isn't recognized as an option even when installed. Is there a simpler way to call this pipeline within the standard ner.correct recipe, or is it necessary to write a custom recipe for this?

Thanks for your help!

Hi @danalynn, welcome to Prodigy!

It should be possible by loading the model, saving it to disk, and pointing the ner.correct recipe to that path. Something like this:

import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("dbpedia_spotlight")  # needs the spacy-dbpedia-spotlight package installed
print(nlp.pipe_names)  # ['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'dbpedia_spotlight']

nlp.to_disk("pipe_with_dbpedia")  # c.f.

Afterwards, you can pass that path as the spacy_model positional argument of ner.correct. Something like this:

prodigy ner.correct my-dataset pipe_with_dbpedia ...

As long as the spacy-dbpedia-spotlight component sets doc.ents, it should work out of the box.
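
If you want to confirm the doc.ents assumption before starting the annotation session, a quick check along these lines might help. This is only a minimal sketch: the helper names preview_entities and load_and_preview are made up for illustration, the path is the one saved above, and it assumes the DBpedia Spotlight component can reach its backing service when the pipeline runs.

```python
def preview_entities(nlp, text):
    """Run the pipeline on one text and return (span text, label) pairs from doc.ents."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def load_and_preview(path, text):
    """Load a pipeline saved with nlp.to_disk(path) and preview its entities.

    spaCy is imported here so that preview_entities stays usable with any
    callable that returns an object exposing .ents.
    """
    import spacy

    nlp = spacy.load(path)
    return preview_entities(nlp, text)


# Example call (assumes the pipeline was saved to "pipe_with_dbpedia" as above):
# print(load_and_preview("pipe_with_dbpedia", "Google was founded in California."))
```

If the returned list is empty for text that clearly contains entities, the component is probably not writing to doc.ents (or is storing its results elsewhere), and ner.correct would have nothing to pre-highlight.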