Help updating spaCy v2 model

raquelmsmith · December 13, 2021, 10:33pm

Hi all - I am very new to spaCy and Prodigy. A former colleague of mine set up a model using Prodigy that apparently uses spaCy v2. I would like to further train the model, but need to update to spaCy v3.

I've read the migration instructions here: https://spacy.io/usage/v3/. I don't think any code changes need to be made.

My issue: I'm not sure how to run prodigy train to update the model to spaCy v3. I see the train docs here: https://prodi.gy/docs/recipes#train

I'm not sure what dataset I'm supposed to use. Here is the file structure for my model:

Any help would be so, so appreciated!!!

ines · December 14, 2021, 9:49am

Hi! If you still have the original data, this should hopefully be very straightforward, especially if the original annotations were created with Prodigy as well. In that case, you can import the existing annotations into Prodigy using db-in, and then run prodigy train to train an updated model: https://prodi.gy/docs/recipes#train

As you collect more annotations, you can then keep running prodigy train with all of the data (your previous dataset and your new dataset) to train better and better versions.

raquelmsmith · December 14, 2021, 9:06pm

I'm not sure if I still have the original data. How would it be stored? What am I looking for?

These are the files I have:

ljvmiranda921 · December 15, 2021, 12:31am

Hi @raquelmsmith,

It shouldn't be in the food_model folder because it only contains the model itself. Perhaps it should be a JSONL, CSV, or even .txt file?
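As a rough sketch of what to look for: Prodigy exports annotations as JSONL, meaning one JSON object per line, usually with a "text" key plus annotation keys such as "spans" or "answer". The snippet below walks a folder and lists the .jsonl files whose first record looks like that. The function names and the exact set of keys checked are assumptions for illustration, not part of Prodigy, so a file it misses may still be worth opening by hand.

```python
import json
from pathlib import Path

# Keys that commonly appear in annotated records (an assumption, not an exhaustive list).
ANNOTATION_KEYS = ("spans", "answer", "label", "accept")


def looks_like_annotations(path):
    """Return True if the first non-empty line is a JSON object with a
    "text" key and at least one annotation-style key."""
    try:
        with open(path, encoding="utf8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                return (
                    isinstance(record, dict)
                    and "text" in record
                    and any(key in record for key in ANNOTATION_KEYS)
                )
    except (OSError, ValueError):
        # Unreadable file, wrong encoding, or not valid JSON.
        return False
    return False  # file was empty


def find_annotation_files(root):
    """Recursively collect .jsonl files under root that look like annotations."""
    return sorted(p for p in Path(root).rglob("*.jsonl") if looks_like_annotations(p))
```

Calling find_annotation_files(".") from the project folder would list any candidates to inspect further.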

raquelmsmith · December 15, 2021, 3:16am

The engineer who originally made the model said he doesn't have the original annotations. Am I out of luck?

ines · December 15, 2021, 12:06pm

Aw, damn, if the engineer lost the original training data, that's definitely very unfortunate. A model without the data will be immediately stale, and it'll make it difficult to improve it further, try out new ideas, etc. Even when you improve a model with more data, you typically want to retrain on all the annotations for the best results.

If the data was annotated with Prodigy, it might still be in the database? Maybe ask the engineer to run prodigy stats -l to view all datasets and see if the data is in there. They can then export it using prodigy db-out.

If you still have the original raw data, one way to recover at least some of the annotations could be to run your model over the raw data and save its predictions. This may not be as accurate as the original training data, but you can always correct it later using Prodigy and it'll be faster than doing everything from scratch.
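A minimal sketch of that idea, assuming the model is a named entity recognizer (the thread doesn't say which component it has) and that the raw texts sit one per line in a plain text file. It writes each prediction as a JSON line with "text" and "spans", the same shape Prodigy uses for span annotations. The function names and the file names in the usage note are hypothetical. Since the existing model was trained with spaCy v2, it has to be loaded in an environment with spaCy v2 installed.

```python
import json


def doc_to_task(doc):
    """Convert a processed spaCy Doc into a task dict with character-offset spans."""
    spans = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ]
    return {"text": doc.text, "spans": spans}


def save_predictions(nlp, texts, out_path):
    """Run the pipeline over texts and write one JSON task per line.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf8") as f:
        for doc in nlp.pipe(texts):
            f.write(json.dumps(doc_to_task(doc)) + "\n")
            count += 1
    return count


def main(model_path, raw_path, out_path):
    """Load the model from disk and save its predictions for every line of raw_path."""
    import spacy  # needs a spaCy v2 install to load a v2 model

    nlp = spacy.load(model_path)
    with open(raw_path, encoding="utf8") as f:
        texts = [line.strip() for line in f if line.strip()]
    return save_predictions(nlp, texts, out_path)
```

For example, main("./food_model", "raw_texts.txt", "predictions.jsonl") would write the model's predictions for each line of a (hypothetical) raw_texts.txt. The predictions are only as good as the old model, so they are a starting point to correct rather than gold data.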
