I created a custom NER model using Prodi.gy. Once all of the processing and validation was done, I saved the model to disk. I can load the model from disk using spacy.load and it seems to work well. My question is: how do I add that custom NER model to a spaCy pipeline? I want the pipeline to include the tagger, parser, etc. plus my custom NER model.
It seems like I should load a base nlp object from one of the existing models (en_core_web_sm), remove its existing NER, and replace it with my custom NER. This is no doubt user error; I just can't figure out from the documentation and trial and error what I am doing wrong (or what I need to do).
Maybe my operations are wrong? Maybe I should try to add the tagger and parser to my custom model instantiation?
I was able to get it to work by adding the "tagger" and "parser" from one of the en models and then modifying the meta.json file, but that doesn't seem like the right approach.
I first tried this, which is obviously not right:
import spacy

nlp = spacy.load("en_core_web_sm")
# remove existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)
nlp_entity = spacy.load("custom_ner_model")
nlp.add_pipe(nlp_entity)
print("Pipeline", nlp.pipe_names)
Pipeline ['tagger', 'parser']
Pipeline ['tagger', 'parser', 'English']
I then tried building the NER component from the custom model and adding that, which is also not right:
nlp = spacy.load("en_core_web_sm")
#remove existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)
nlp_entity = spacy.load("custom_ner_model")
ner = nlp_entity.create_pipe("ner")
nlp.add_pipe(ner,last=True)
print("Pipeline", nlp.pipe_names)
I get this error if I try to run text through the pipeline with ner in it:
from spacy import displacy

text = "This is a test"
doc = nlp(text)
displacy.render(doc, style="ent")
ValueError: [E109] Model for component 'ner' not initialized. Did you forget to load a model, or forget to call begin_training()?
I also got this error, which is what drove me to try adding the tagger and parser from the base en models:
ValueError: [E155] The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA. Try using nlp() instead of nlp.make_doc() or list(nlp.pipe()) instead of list(nlp.tokenizer.pipe()).
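For reference, a minimal sketch of the swap being attempted above, assuming the spaCy v2.x API in which add_pipe accepts a component object and a name keyword. The helper swap_in_custom_ner and the component name "custom_ner" are hypothetical, not part of spaCy. The difference from the second attempt is that get_pipe("ner") returns the component that was loaded with the custom model, whereas create_pipe("ner") builds a new, uninitialized one, which is consistent with the E109 error. The sketch does not deal with any vocab differences between the two models, so that part remains an assumption to verify.

```python
def swap_in_custom_ner(base_nlp, custom_nlp, name="custom_ner"):
    """Replace base_nlp's NER with the trained NER taken from custom_nlp.

    Hypothetical helper; assumes both arguments behave like spaCy v2.x
    Language objects (pipe_names, remove_pipe, get_pipe, add_pipe).
    """
    # Drop the stock NER if the base pipeline has one.
    if "ner" in base_nlp.pipe_names:
        base_nlp.remove_pipe("ner")
    # get_pipe returns the component that was loaded with the custom model,
    # whereas create_pipe("ner") would build a new, uninitialized one.
    trained_ner = custom_nlp.get_pipe("ner")
    # Assumption: v2.x add_pipe accepts the component object plus a name.
    base_nlp.add_pipe(trained_ner, name=name, last=True)
    return base_nlp


# Intended usage (requires spaCy v2.x and both models on disk):
#   import spacy
#   nlp = swap_in_custom_ner(spacy.load("en_core_web_sm"),
#                            spacy.load("custom_ner_model"))
#   print("Pipeline", nlp.pipe_names)
```

Giving the component an explicit name avoids the 'English' entry seen in the first attempt, where the whole nlp object was added and its class name was used as the component name.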