Relation between Tagger, Parser, and NER pipeline in spaCy

Hi! In spaCy v2.x, the pipeline components are separate, so your observation is correct: the entity recognizer doesn't use part-of-speech tags or any other features set by the other components. The components only share the embeddings and word vectors, if available. This also means that if you train your model with one set of word vectors and then remove or replace them, the model will likely perform really badly.

(Btw, if you're interested in more details on the feature and haven't seen it yet, you might find this video on the NER model helpful.)

No, there's no direct interaction and an update to one component would never update the weights of another.

However, it is possible that updates to one component can change the output of another. For example, by default, the parser will assign the sentence boundaries. The named entity recognizer is constrained by the sentence boundaries, so it'll never predict entities that cross a sentence boundary (which makes sense). So if you update the parser and it ends up predicting different sentence boundaries, you could theoretically end up with different entity predictions. But that's only because you redefined the constraints for the predictions – the NER weights themselves didn't actually change.
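
To make the sentence-boundary point concrete, here's a toy sketch in plain Python. It is not spaCy's actual implementation (spaCy applies the constraint while the entity recognizer is decoding, rather than filtering spans afterwards), and the names `sentence_ids`, `constrain_entities` and the candidate spans are made up for illustration. The candidates stay fixed, standing in for unchanged NER weights, and only the sentence starts differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def sentence_ids(n_tokens: int, sent_starts: List[int]) -> List[int]:
    """Map each token index to the index of the sentence it belongs to."""
    starts = set(sent_starts)
    ids, current = [], -1
    for i in range(n_tokens):
        # Token 0 always opens the first sentence.
        if i == 0 or i in starts:
            current += 1
        ids.append(current)
    return ids


def constrain_entities(
    candidates: List[Span], n_tokens: int, sent_starts: List[int]
) -> List[Span]:
    """Keep only candidate spans that lie within a single sentence."""
    ids = sentence_ids(n_tokens, sent_starts)
    return [(s, e) for s, e in candidates if ids[s] == ids[e - 1]]


tokens = ["I", "flew", "to", "New", "York", "City", "is", "big"]

# Hypothetical candidate spans: a stand-in for what a frozen NER model proposes.
candidates = [(3, 5), (3, 6)]  # "New York" and "New York City"

# "Parser A" starts a new sentence at token 5 ("City").
ents_a = constrain_entities(candidates, len(tokens), [0, 5])

# "Parser B" (after an update) sees a single sentence.
ents_b = constrain_entities(candidates, len(tokens), [0])

print(ents_a)
print(ents_b)
```

The candidate list never changes between the two calls. Only the boundaries do, and that alone is enough to change which entities survive.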

To make it run fast and keep the model size small, spaCy's implementation, "Language Modelling with Approximate Outputs" (LMAO), uses a CNN to predict the vector of each word given its context. So we're not predicting the actual word, just its rough meaning, which is easier and lets us leverage existing pre-trained word embeddings. For a quick overview and some results and examples, see my slides here:

In spaCy 2.1+, you can use the pretrain command to create a tok2vec (token to vector) artifact that you can initialise a model with. Those pre-trained representations will then be shared by all components in the pipeline. The next update of Prodigy will introduce support for spaCy 2.1 and also for training with tok2vec artifacts. (See this thread for details and progress on the update. Since it's a breaking change that'll require all Prodigy users to retrain their models, we want to make sure to fix some outstanding spaCy bugs and test everything before we publish the update.)
