Add custom NER model from Prodigy to spaCy pipeline

I created a custom NER model using Prodi.gy. Once I had finished all of the processing and validation, I saved the model to disk. I can load the model from disk using spacy.load and it seems to work well. My question now is: how do I add that custom NER model to a spaCy pipeline? I want to make sure I have the tagger, parser, etc. in the pipeline plus my custom NER model.

It seems like I should initialize a base nlp from one of the existing models (en_core_web_sm), remove the existing NER, and replace it with my custom NER. This is no doubt user error; I just can't seem to figure out from the documentation and trial and error what I am doing wrong (or need to do).

Maybe my operations are wrong? Maybe I should try to add the tagger and parser to my custom model instantiation?

I was able to get it to work by adding the "tagger" and "parser" from one of the en models and then modifying the meta.json file. That doesn't seem like the right approach.

I tried this, which is obviously not right:

nlp = spacy.load("en_core_web_sm")
#remove existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)

nlp_entity = spacy.load("custom_ner_model")

nlp.add_pipe(nlp_entity)
print("Pipeline", nlp.pipe_names)

Pipeline ['tagger', 'parser']
Pipeline ['tagger', 'parser', 'English']

I then tried building the NER component from the custom model and adding it, which is also not right:

nlp = spacy.load("en_core_web_sm")
#remove existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)

nlp_entity = spacy.load("custom_ner_model")
ner = nlp_entity.create_pipe("ner")

nlp.add_pipe(ner,last=True)
print("Pipeline", nlp.pipe_names)

Error if I try to run with ner in pipeline:

text = "This is a test"
doc = nlp(text)
displacy.render(doc, style="ent")

ValueError: [E109] Model for component 'ner' not initialized. Did you forget to load a model, or forget to call begin_training()?

I also got this error, which is what drove me to try adding the tagger and parser from the base en models:

ValueError: [E155] The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA. Try using nlp() instead of nlp.make_doc() or list(nlp.pipe()) instead of list(nlp.tokenizer.pipe()).

Hi! Your instinct wasn't actually wrong. The problem is that in your first example, you added the whole nlp object to the pipeline rather than a single component, and in your second example, create_pipe gave you a blank, uninitialized entity recognizer with no weights instead of the one you previously trained.
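
To make the second problem more concrete, here is a toy sketch in plain Python. The ToyNER and ToyPipeline classes are made up for illustration and are not spaCy's API; they only mimic the one behaviour that matters here, assuming that creating a component always builds a fresh, untrained object while getting a component returns the instance that is already in the pipeline.

```python
# Toy stand-ins (hypothetical, NOT spaCy classes) that mimic the difference
# between creating a new component and fetching an existing one.


class ToyNER:
    """A component that only has 'weights' after it has been trained."""

    def __init__(self):
        self.weights = None  # a freshly created component is untrained

    def train(self):
        # Pretend training: memorise a single entity label.
        self.weights = {"Prodigy": "PRODUCT"}

    def __call__(self, tokens):
        if self.weights is None:
            raise ValueError("Model for component 'ner' not initialized")
        return [(tok, self.weights[tok]) for tok in tokens if tok in self.weights]


class ToyPipeline:
    def __init__(self):
        self.components = {}

    def create_pipe(self, name):
        # Always builds a brand-new component and ignores what is in the pipeline.
        return ToyNER()

    def get_pipe(self, name):
        # Returns the component instance that was added earlier.
        return self.components[name]

    def add_pipe(self, component, name="ner"):
        self.components[name] = component


# A "trained" pipeline, standing in for the custom model loaded from disk.
trained = ToyPipeline()
ner = ToyNER()
ner.train()
trained.add_pipe(ner)

# Second example from the question: a new, empty component.
fresh = trained.create_pipe("ner")
try:
    fresh(["Prodigy"])
except ValueError as err:
    print(err)  # Model for component 'ner' not initialized

# What you actually want: the component that carries the trained weights.
reused = trained.get_pipe("ner")
print(reused(["Prodigy", "rocks"]))  # [('Prodigy', 'PRODUCT')]
```

The same distinction applies to the real pipeline: the trained weights live on the component object inside the loaded model, so that object is the one that has to be moved across.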

The cleanest and most straightforward solution would be to save out a version of the en_core_web_sm model with a tagger and parser, but no entity recognizer, and then use that as the base model when you train your entity recognizer with Prodigy:

nlp = spacy.load("en_core_web_sm")
nlp.remove_pipe("ner")
print(nlp.pipe_names)  # ['tagger', 'parser']
nlp.to_disk("./en_tagger_parser_sm")  # use that path for training

Alternatively, if you already have your entity recognizer trained, you can also just take the ner pipe and add it to your base model:

nlp = spacy.load("en_core_web_sm")
nlp.remove_pipe("ner")
print(nlp.pipe_names)  # ['tagger', 'parser']

nlp_entity = spacy.load("custom_ner_model")
# Get the ner pipe from this model and add it to base model
ner = nlp_entity.get_pipe("ner")
nlp.add_pipe(ner)
print(nlp.pipe_names)  # ['tagger', 'parser', 'ner']

nlp.to_disk("./custom_model")

Thank you, Ines! In another example, I was removing the NER from the original/base model, but I wasn't properly adding the NER pipe from my custom model. This is great, thank you! I figured it was user error.

Hi @ines, I posted a related question here, caused by the change from spaCy v2 to v3. Looking forward to your reply.