Prodigy model details

Hi,

I saw the reply in the thread "The model details behind Prodigy" and just want to verify that the latest Prodigy, 1.10.8 (which I think uses spaCy 2.3.7), uses the spaCy default model for train, and that this is the CNN model.

Additionally, is this the same architecture we get in spaCy v3 if we use the Quickstart config and choose English, ner, efficiency, CPU?

Hi!

Prodigy 1.10 is indeed compatible with spaCy v2. If you run prodigy train without a base_model, you'll get a component architecture equivalent to the following spaCy v2 code:

```python
component = nlp.create_pipe(name)
nlp.add_pipe(component)
```

where name is one of ["ner", "textcat", "tagger", "parser"]. You can find more details on these default spaCy v2 architectures here: https://v2.spacy.io/models#architecture
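If you want to reproduce this outside Prodigy, here's a minimal sketch, assuming spaCy is installed. The helper name add_default_component is hypothetical; its v2 branch is the create_pipe/add_pipe snippet above, and its v3 branch uses nlp.add_pipe(name), which in v3 adds the component with its default config.

```python
import spacy


def add_default_component(nlp, name):
    """Hypothetical helper: add the default, untrained `name` component.

    `name` is one of "ner", "textcat", "tagger", "parser".
    """
    major = int(spacy.__version__.split(".")[0])
    if major < 3:
        # spaCy v2: build the component from its factory, then add it
        component = nlp.create_pipe(name)
        nlp.add_pipe(component)
    else:
        # spaCy v3: add_pipe takes the factory name and uses the default config
        nlp.add_pipe(name)
    return nlp


nlp = spacy.blank("en")
for name in ["ner", "textcat", "tagger", "parser"]:
    add_default_component(nlp, name)
print(nlp.pipe_names)
```

Either way you end up with freshly initialized default components, which is what training "from scratch" starts from.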

And yes, these architectures are largely the same as those you get when using spaCy v3 with the CPU settings, though some of the hyperparameters might be slightly different. Additionally, spaCy v2 always used "inline" tok2vec layers for each component, while v3 allows sharing the same embedding layer across components. This may also result in slightly different performance if you're training several pipeline components with the same embedding layer. More information is here: https://spacy.io/usage/embeddings-transformers#embedding-layers
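To see the inline vs. shared difference in spaCy v3, you can compare the tok2vec sub-layer of a default ner component with the one in a Quickstart-style config (English, ner, efficiency, CPU). This is a sketch assuming spaCy v3 is installed; it calls spacy.cli.init_config.init_config, the function behind spacy init config. The architecture version suffixes (.v1, .v2) differ between v3 releases, so only the name prefixes matter here.

```python
import spacy
from spacy.cli.init_config import init_config  # function behind `spacy init config` (v3)

# 1) A default v3 "ner" added on its own: its tok2vec layer is inline,
#    i.e. part of the NER model itself, much like in spaCy v2.
nlp = spacy.blank("en")
nlp.add_pipe("ner")
inline_arch = nlp.get_pipe_config("ner")["model"]["tok2vec"]["@architectures"]

# 2) Quickstart-style config (English, ner, efficiency, CPU): a separate
#    "tok2vec" component, with "ner" listening to its output.
config = init_config(lang="en", pipeline=["ner"], optimize="efficiency", gpu=False)
shared_pipeline = list(config["nlp"]["pipeline"])
shared_arch = config["components"]["ner"]["model"]["tok2vec"]["@architectures"]

print(inline_arch)      # a spacy.HashEmbedCNN.* architecture
print(shared_pipeline)  # pipeline names from the generated config
print(shared_arch)      # a spacy.Tok2VecListener.* architecture
```

With only a single ner component the shared layer has just one listener, so the practical difference mostly shows up once several components listen to the same tok2vec.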

FYI: if you do provide a base_model for prodigy train, the architecture of the component depends on that model, and the component will be further tuned instead of trained from scratch.
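Conceptually, the base_model case boils down to "reuse the component if the pipeline already has it". The helper below is a hypothetical sketch, not Prodigy's implementation; falling back to a fresh default component when the pipeline lacks name is this sketch's own assumption. A blank pipeline stands in for a real base_model so the example runs without downloading a model.

```python
import spacy


def component_for_training(nlp, name):
    """Hypothetical helper, not Prodigy's code.

    If `nlp` (e.g. a loaded base model) already has `name`, reuse it, so its
    existing architecture and weights are what gets tuned further. Otherwise
    add a default component (an assumption of this sketch).
    """
    if name in nlp.pipe_names:
        return nlp.get_pipe(name)
    if int(spacy.__version__.split(".")[0]) < 3:
        component = nlp.create_pipe(name)  # spaCy v2 factory
        nlp.add_pipe(component)
        return component
    return nlp.add_pipe(name)  # spaCy v3 add_pipe returns the new component


base_nlp = spacy.blank("en")
first = component_for_training(base_nlp, "ner")   # no "ner" yet, so one is added
second = component_for_training(base_nlp, "ner")  # now present, so it is reused
print(second is first, base_nlp.pipe_names)
```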

Hope that helps!
