sense2vec ner

My apologies if I missed this in the documentation. I am working on an NER project modeled after your YouTube video on training the ingredient NER model. In the video, you used the following command to run the train recipe:

prodigy train ner food_data en_vectors_web_lg --init-tok2vec cd8_model289.bin --output ./tmp_model --eval-split 0.2

My understanding is that --init-tok2vec is now replaced by the config. I am unsure how to specify the .bin file, and whether I should be adapting the config file, since that is specified using word2vec.

Or, if this doesn't make sense, I just want to replicate the video and am not sure how to specify the ner train script.

Thanks in advance.


Hi! Your analysis is correct, yes – in Prodigy v1.11 and spaCy v3, we were able to standardise a lot of this stuff because it can now be specified explicitly in the config, so we don't need all of these Prodigy-specific arguments on the train command anymore.

Also, just to be clear, the pretrained tok2vec weights used to initialize the model are generated using spaCy's pretrain command, so they're not just word vectors. You can specify the path to it as the init_tok2vec setting in the [initialize] block:
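
As a minimal sketch, the relevant part of the config could look something like this. The file path is only a placeholder based on the name in your command, so adjust it to wherever your pretrained weights actually live. Routing the value through a [paths] variable is optional; you could also write the path directly in the [initialize] block.

```ini
[paths]
# Placeholder path: point this at your own pretrained tok2vec weights
init_tok2vec = "./cd8_model289.bin"

[initialize]
# Load the pretrained tok2vec weights when the pipeline is initialized
init_tok2vec = ${paths.init_tok2vec}
```

You can then pass the config file to the train recipe via --config. Keep in mind that the weights will only load if the tok2vec settings in your config match the ones that were used when the weights were created with spaCy's pretrain command.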