Do models output by textcat.batch-train make use of word vectors?

Hi guys

Quick question about the models output by Prodigy: do they use word vectors?
The model I trained and exported doesn’t seem to have any knowledge of word vectors, and it's quite small (~10 MB).

If not, how can I use the word vectors I have for pretraining?


If word vectors are present in the model you're updating, then yes, spaCy will use those representations during training. This can sometimes give you a nice boost in accuracy.

If your model is only 10 MB, it's likely that you started off with a small (sm) model that doesn't include word vectors. Instead, try starting from a model like en_core_web_md or en_core_web_lg, which ship with word vectors.
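To check whether a pipeline actually has word vectors before you train on top of it, you can look at `nlp.vocab.vectors_length`, which is 0 when no vectors are loaded. Below is a minimal sketch, assuming spaCy and NumPy are installed. The `vector_width` helper is hypothetical, and the 3-dimensional "cat" vector is made up purely to show the before/after difference on a blank pipeline.

```python
import numpy
import spacy


def vector_width(nlp):
    """Hypothetical helper: width of the pipeline's word vectors (0 means none)."""
    return nlp.vocab.vectors_length


# A blank English pipeline has no word vectors, much like an "sm" model.
nlp = spacy.blank("en")
print("width before:", vector_width(nlp))

# Add one made-up 3-dimensional vector by hand, for illustration only.
nlp.vocab.set_vector("cat", numpy.asarray([0.1, 0.2, 0.3], dtype="float32"))
print("width after:", vector_width(nlp))

# Only tokens that have an entry in the vectors table report has_vector=True.
doc = nlp("cat dog")
print([(token.text, token.has_vector) for token in doc])
```

With a downloaded model you could call e.g. `vector_width(spacy.load("en_core_web_md"))`. A non-zero result means the vectors are there for training to use.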

Actual pre-training is different – that's something we just introduced in spaCy v2.1 (via the spacy pretrain command). Here, you pre-train the model's weights on lots of raw, unlabelled text, using the word vectors as the training signal. Also see this blog post for examples. Once you have the resulting weights file, you can pass it in when you train your model. In the next version of Prodigy, which will introduce support for spaCy v2.1, you'll also be able to pass those pre-trained weights files into the textcat.teach and ner.teach recipes.
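To make the idea of "pre-training weights using raw text and word vectors" more concrete, here is a toy NumPy illustration of the general objective: learn weights that predict each word's pretrained vector from its neighbouring words. This is only a sketch under loud assumptions. The corpus, the random "pretrained" target vectors, the one-word context window and the mean squared error loss are all made up for simplicity, and spaCy's own `spacy pretrain` command trains its token-to-vector layer with a different architecture that is not reproduced here.

```python
import numpy as np

# Toy stand-ins (all made up): a tiny "raw text" corpus and random
# "pretrained" target vectors for each word in its vocabulary.
rng = np.random.default_rng(0)
corpus = [["the", "cat", "sat", "on", "the", "mat"]]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}
targets = rng.normal(size=(len(vocab), 4))

# Weights being pre-trained: an embedding table plus a projection into vector space.
emb = rng.normal(scale=0.1, size=(len(vocab), 8))
proj = rng.normal(scale=0.1, size=(8, 4))

# Each example pairs the ids of the neighbouring words with the id of the word to predict.
examples = []
for sent in corpus:
    ids = [index[w] for w in sent]
    for i, wid in enumerate(ids):
        context = ids[max(0, i - 1):i] + ids[i + 1:i + 2]
        examples.append((context, wid))


def loss_and_grads(emb, proj):
    """Mean squared error between predicted and target vectors, plus gradients."""
    loss = 0.0
    d_emb = np.zeros_like(emb)
    d_proj = np.zeros_like(proj)
    for context, wid in examples:
        h = emb[context].mean(axis=0)   # encode the context words
        pred = h @ proj                 # predict the centre word's vector
        diff = pred - targets[wid]
        loss += np.mean(diff ** 2)
        g = 2 * diff / diff.size        # d(loss)/d(pred) for this example
        d_proj += np.outer(h, g)
        d_h = proj @ g
        for c in context:
            d_emb[c] += d_h / len(context)
    n = len(examples)
    return loss / n, d_emb / n, d_proj / n


initial_loss, _, _ = loss_and_grads(emb, proj)
for _ in range(500):
    _, d_emb, d_proj = loss_and_grads(emb, proj)
    emb -= 0.2 * d_emb      # plain full-batch gradient descent
    proj -= 0.2 * d_proj
final_loss, _, _ = loss_and_grads(emb, proj)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The point is that no labels are needed: the targets come from the word vectors, so any amount of raw text can be used, and the learned weights (here `emb`) are what you would later reuse to initialise a model trained on labelled data.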

ah got it! makes sense

Thanks Ines