spaCy NER models: architecture details

I'm working on a custom NER project. I managed to train a blank spaCy model and a CamemBERT transformer model and compare them, and now I have to write some documentation about both models. I did some research and found out that the blank model is based on a CNN and an LSTM, but there are no details about the layers and parameters used in that architecture, and the same goes for the transformer.
So can anyone help me with some resources?

If you're looking for details on the spaCy v2 NER model, this video by @honnibal explains how it works:

For spaCy v3, the built-in architectures are all documented in the API reference. See here for details:
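If you need the concrete architecture names and hyperparameters for your write-up, one option is to inspect the resolved config of a pipeline directly. Here's a minimal sketch, assuming spaCy v3 is installed (the exact defaults printed depend on your installed version):

```python
import spacy

# Build a blank French pipeline with a default NER component
nlp = spacy.blank("fr")
nlp.add_pipe("ner")

# The resolved config spells out the architecture (e.g.
# "spacy.TransitionBasedParser.v2") and its hyperparameters,
# including the embedded tok2vec (CNN) layer
print(nlp.config["components"]["ner"]["model"])
```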

Not sure how relevant this is for your project, but training a blank spaCy model without initialising it with any pretrained embeddings and then comparing that to a model with pretrained embeddings doesn't sound that useful – unless that comparison is explicitly the point? The only real takeaway you'll get from it is "initialising with embeddings is usually better than initialising without embeddings", which was fairly obvious to begin with. So if you're comparing different architectures (e.g. spaCy's transition-based approach vs. something else), you probably want to train both models using the same embeddings, e.g. CamemBERT: Embeddings, Transformers and Transfer Learning · spaCy Usage Documentation
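For reference, here's a rough sketch of wiring CamemBERT embeddings into a French NER pipeline in code. This assumes spacy-transformers is installed and that "camembert-base" is fetched from the Hugging Face hub; in practice you'd usually generate the equivalent training config with `spacy init config` instead:

```python
import spacy

nlp = spacy.blank("fr")

# The transformer component runs CamemBERT over each batch of docs
nlp.add_pipe("transformer", config={"model": {"name": "camembert-base"}})

# The NER component listens to the transformer's output instead of
# training its own CNN tok2vec, so both share the same embeddings
nlp.add_pipe(
    "ner",
    config={
        "model": {
            "tok2vec": {
                "@architectures": "spacy-transformers.TransformerListener.v1",
                "grad_factor": 1.0,
                "pooling": {"@layers": "reduce_mean.v1"},
                "upstream": "*",
            }
        }
    },
)
```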


Thank you, @ines, for your response!

Yeah, it's just a comparison to show that using a pretrained model gives better results than a non-pretrained one. But that's not the goal of the project, just an observation; the goal is to create an NER model with good accuracy so I can use it in an application.

Ah okay, in that case, you definitely want to be using spaCy v3, because it lets you train two models with the same architecture and settings – one with pretrained embeddings (e.g. CamemBERT) and one without, and maybe another one with just word vectors as features. This way you get a meaningful comparison, because the only variable between the experiments is the pretrained embeddings.
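For the comparison itself, a hedged sketch: score both trained pipelines on the same dev set, so the embeddings really are the only thing that differs. The model and corpus paths below are hypothetical placeholders:

```python
import spacy
from spacy.tokens import DocBin
from spacy.training import Example

for model_dir in ("output_cnn/model-best", "output_trf/model-best"):
    nlp = spacy.load(model_dir)
    # Load the shared dev set and pair each gold doc with a fresh prediction
    docs = list(DocBin().from_disk("corpus/dev.spacy").get_docs(nlp.vocab))
    examples = [Example(nlp.make_doc(doc.text), doc) for doc in docs]
    scores = nlp.evaluate(examples)
    print(model_dir, scores["ents_p"], scores["ents_r"], scores["ents_f"])
```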
