Hi,
I am working through the "Training a Named Entity Recognition Model with Prodigy and Transfer Learning" tutorial.
I am working on French. I ran ner.manual with my own labels without any issue.
Now I want to train the NER model on my dataset, initializing it with pretrained French tok2vec weights.
What I am missing is how to do this tok2vec pretraining myself.
In the tutorial, Ines did the pretraining (it took ~8 hours on a GPU), and the resulting file is available as tok2vec_cd8_model289.bin.
I would like to learn how to do the same thing for French.
Thanks in advance
Hi! The pretrained tok2vec weights were created using the spacy pretrain
command with a lot of raw text. You can find the details and documentation here:
- spaCy v2: https://v2.spacy.io/api/cli/#pretrain
- spaCy v3: https://spacy.io/api/cli/#pretrain
The pretraining uses a language modelling objective, similar to how embeddings like BERT are trained. If you have a lot of raw text, this can be a good way to boost your accuracy.
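For a concrete picture, here's a minimal sketch of what a spaCy v3 pretraining run could look like. The file names (raw_text.jsonl, the output directory) and the GPU ID are placeholder assumptions, not values from the tutorial:

```
# Generate a French NER config that includes a [pretraining] section
python -m spacy init config config.cfg --lang fr --pipeline ner --pretraining

# Run the pretraining on your raw text; raw_text.jsonl is a placeholder
# name for a file with one {"text": "..."} object per line
python -m spacy pretrain config.cfg ./pretrain_output --paths.raw_text raw_text.jsonl --gpu-id 0
```

The pretraining should write numbered model checkpoints (model0.bin, model1.bin, ...) to the output directory, and you can later point spacy train's init_tok2vec setting at one of those files. In spaCy v2, the rough equivalent was python -m spacy pretrain texts.jsonl fr_core_news_md ./output (see the v2 link above).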
If you're using spaCy v3, you can also initialize your model with existing transformer embeddings, which will have a similar effect. You can use the quickstart widget to generate a transformer-based config for French here: https://spacy.io/usage/training#quickstart
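As a quick sketch, assuming you saved the widget's output as base_config.cfg, you'd fill in the remaining defaults like this:

```
# Turn the partial quickstart config into a complete training config
python -m spacy init fill-config base_config.cfg config.cfg
```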
To export your annotations for use with spaCy, you can use the data-to-spacy command. If you're using transformers, you should have a GPU available for training.
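For illustration, here's a hedged sketch of the export-and-train flow. The dataset name my_french_ner and all paths are made-up placeholders, and the init_tok2vec override assumes your config defines that path (the default generated configs do):

```
# Export Prodigy annotations to spaCy's binary format (recent Prodigy
# versions also write a config and labels into the output directory)
prodigy data-to-spacy ./corpus --ner my_french_ner --lang fr --eval-split 0.2

# Train, optionally initializing the tok2vec layer with pretrained weights;
# model349.bin stands in for whichever pretraining checkpoint you pick
python -m spacy train ./corpus/config.cfg --output ./training \
    --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy \
    --paths.init_tok2vec ./pretrain_output/model349.bin --gpu-id 0
```

If you go the transformer route instead, you'd skip init_tok2vec and just train the transformer-based config on GPU.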
Hi Ines,
Thanks for your reply! I am using spaCy v3, so I will look into the quickstart you pointed me to.
Have a good day, and keep providing us with such a great library and useful tools!