It looks to me like you just need to get your labels into the string store. I think where things are going wrong is that, if you load two models nlp1 and nlp2, the pipeline components in the two models will have different StringStore and Vocab instances. I think somewhere in spaCy assumes that the component's string store is the same as the Doc object's, as this is normally the case.
You could do something like:
for label in labels:
    nlp.vocab.strings.add(label)
But actually I think you might find the easiest solution is to just merge the directories. You can copy the model files for the pipeline components into one model directory, and then just edit the meta.json. This should give you an easy and reliable way to combine your components.
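If you'd rather script the merge than do it by hand, here is a rough sketch using only the standard library. It assumes the saved model layout where each pipeline component lives in a subdirectory named after the component and meta.json has a "pipeline" list of component names; check your own model directory before relying on that. The helper name merge_component and the paths in the usage note are hypothetical, not part of spaCy.

```python
import json
import shutil
from pathlib import Path


def merge_component(src_model: Path, dst_model: Path, name: str) -> None:
    """Copy one component directory into dst_model and list it in meta.json.

    Assumption: the component is stored in a subdirectory called `name`,
    and meta.json keeps component names in a "pipeline" list.
    """
    src_dir = src_model / name
    dst_dir = dst_model / name
    if dst_dir.exists():
        # Refuse to overwrite a component that is already in the target model.
        raise FileExistsError(f"{dst_dir} already exists")
    shutil.copytree(src_dir, dst_dir)

    # Register the copied component so it is included when the model loads.
    meta_path = dst_model / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    pipeline = meta.setdefault("pipeline", [])
    if name not in pipeline:
        pipeline.append(name)
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")
```

You would call it with something like merge_component(Path("model_a"), Path("model_b"), "ner") and then load model_b as usual. Work on a copy of the target directory, since the function edits meta.json in place.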