I have a well-performing NER model (self-trained) as well as a well-performing spancat model. However, they do not share the same Tok2Vec / Vocab. Therefore, just adding one to the other with nlp.add_pipe does not bring the same results.
Can I apply them to the same sentence independently and then just add the span information from the spancat doc to the NER doc, so that I have one final doc that includes both entities and spans?
Hi @DerDiego13,
Thanks for your question.
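If the two pipelines use different vectors (and therefore different vocabs), you can run the NER pipeline first, re-create the doc under the spancat pipeline's vocab, and then run the spancat pipeline on it:
import spacy
from spacy.tokens import Doc
text = "Some text to process."  # placeholder input
nlp1 = spacy.load("ner_model")
nlp2 = spacy.load("spancat_model")
# Run the NER pipeline first.
doc1 = nlp1(text)
# Re-create the doc under nlp2's vocab; the entities from doc1 are preserved.
doc2 = Doc(nlp2.vocab).from_bytes(doc1.to_bytes())
# Run the spancat pipeline on the existing doc to add the spans.
doc2 = nlp2(doc2)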
Just for completeness, if both pipelines share the same vectors, it would be:
nlp_ner = spacy.load("ner_model")
nlp_spancat = spacy.load("spancat_model", vocab=nlp_ner.vocab)
doc = nlp_ner(text)
doc = nlp_spancat(doc)
Or use nlp.add_pipe, but replace the listeners first:
nlp_spancat.replace_listeners("tok2vec", "spancat", ["model.tok2vec"])
nlp_ner.add_pipe("spancat", source=nlp_spancat)
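If you would rather run the two pipelines fully independently on the same text and merge afterwards, as described in the question, one option is to copy the span groups over by character offsets. The following is only a minimal sketch: `copy_span_groups` is a hypothetical helper (not part of spaCy), and the two blank pipelines with hand-set annotations stand in for your trained `ner_model` and `spancat_model` so that the example runs without them. The span key `"sc"` and the labels are assumptions for illustration.

```python
import spacy
from spacy.tokens import Doc


def copy_span_groups(src_doc: Doc, dst_doc: Doc) -> Doc:
    """Copy every span group from src_doc into dst_doc via character offsets."""
    if src_doc.text != dst_doc.text:
        raise ValueError("Both docs must be created from the same text")
    for key, group in src_doc.spans.items():
        copied = []
        for span in group:
            # Re-create the span on dst_doc. "expand" snaps to token
            # boundaries in case the two tokenizers split the text differently.
            new_span = dst_doc.char_span(
                span.start_char,
                span.end_char,
                label=span.label_,  # pass the string so it is added to dst vocab
                alignment_mode="expand",
            )
            if new_span is not None:
                copied.append(new_span)
        dst_doc.spans[key] = copied
    return dst_doc


# Stand-ins for the two trained pipelines (assumption: blank pipelines with
# manually set annotations instead of real model predictions).
nlp_ner = spacy.blank("en")
nlp_spancat = spacy.blank("en")
text = "Apple is opening an office in Berlin"

doc_ner = nlp_ner(text)
doc_ner.ents = [doc_ner.char_span(0, 5, label="ORG")]

doc_spancat = nlp_spancat(text)
doc_spancat.spans["sc"] = [doc_spancat.char_span(17, 36, label="LOCATION_PHRASE")]

# The NER doc now carries both its entities and the copied span group.
merged = copy_span_groups(doc_spancat, doc_ner)
```

Note that this sketch only copies the spans and their labels. The scores that spancat stores in the span group's `attrs` are not carried over, so copy those separately if you need them.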
FYI, for questions like this that are spaCy-specific, make sure to check out the spaCy GitHub discussions first and consider posting there. This forum is for Prodigy-specific questions and while there can sometimes be overlap, you'll likely get a faster response by posting spaCy questions there.
Hope this helps!