Help updating spaCy v2 model

Hi all, I'm very new to spaCy and Prodigy. A former colleague of mine set up a model with Prodigy that apparently uses spaCy v2. I'd like to train the model further, but I need to update it to spaCy v3 first.

I've read the migration instructions, and I don't think any code changes need to be made.

My issue: I'm not sure how to run prodigy train to update the model to spaCy v3. I've looked at the train docs, but I don't know which dataset I'm supposed to use. Here is the file structure for my model:

Any help would be so, so appreciated!!!

Hi! If you still have the original data, this should hopefully be very straightforward, especially if the original annotations were created with Prodigy as well :blush: In that case, you can import the existing annotations into Prodigy using db-in, and then run prodigy train to train an updated model.
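As a rough sketch, here are those two steps scripted from Python. It assumes the old annotations were exported to a JSONL file (annotations.jsonl is a hypothetical name), that the model is a named entity recognizer (hence --ner; use --textcat or another component flag if it does something else), and that food_old and ./food_model_v3 are hypothetical dataset and output names.

```python
import subprocess


def import_command(dataset, annotations_file):
    # Load existing annotations (e.g. a JSONL export) into a Prodigy dataset
    return ["prodigy", "db-in", dataset, annotations_file]


def train_command(output_dir, datasets, component="ner"):
    # Train a spaCy v3 pipeline; several datasets are passed comma-separated
    return ["prodigy", "train", output_dir, f"--{component}", ",".join(datasets)]


def run_all(commands):
    for cmd in commands:
        # check=True raises CalledProcessError if a command fails
        subprocess.run(cmd, check=True)
```

A typical call would be run_all([import_command("food_old", "annotations.jsonl"), train_command("./food_model_v3", ["food_old"])]). Once new annotations live in a second dataset, you would pass both names, e.g. ["food_old", "food_new"], so every run trains on all of the data.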

As you collect more annotations, you can then keep running prodigy train with all of the data (your previous dataset and your new dataset) to train better and better versions.

I'm not sure if I still have the original data. How would it be stored? What am I looking for?

These are the files I have:

Hi @raquelmsmith,

The data wouldn't be in the food_model folder, because that folder only contains the model itself. Look for a separate file instead: perhaps a JSONL, CSV, or even .txt file.
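If you'd rather search for it programmatically, here is a minimal sketch that walks a project directory and lists files with those extensions, skipping the food_model folder (a saved spaCy model also contains .json files such as meta.json, which are part of the model, not annotations). The directory layout and the extension set are assumptions; adjust them to your setup.

```python
from pathlib import Path

# Assumed extensions for exported annotations or raw data
DATA_SUFFIXES = {".jsonl", ".json", ".csv", ".txt"}


def find_candidate_data(root, skip_dirs=("food_model",)):
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in DATA_SUFFIXES:
            continue
        # Skip files that sit anywhere inside a skipped folder (the model itself)
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in skip_dirs for part in parent_parts):
            continue
        found.append(path)
    return found
```

For example, find_candidate_data(".") run from the project folder returns the matching paths in sorted order; a JSONL file whose lines contain a "text" key is a good sign you've found annotations.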

The engineer who originally made the model said he doesn't have the original annotations :weary: Am I out of luck?

Aw, damn, if the engineer lost the original training data, that's definitely very unfortunate :disappointed: A model without the data will be immediately stale, and it'll be difficult to improve it further, try out new ideas, etc. Even when you improve a model with more data, you typically want to retrain on all the annotations for the best results.

If the data was annotated with Prodigy, it might still be in the database? Maybe ask the engineer to run prodigy stats -l to view all datasets and see if the data is in there. They can then export it using prodigy db-out.
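A sketch of those two steps scripted from Python. It assumes Prodigy is installed on the machine that holds the database (by default a local SQLite file in the Prodigy home directory, so this likely means the engineer's machine), and food_data is a hypothetical dataset name; use whatever name prodigy stats -l actually reports.

```python
import subprocess
from pathlib import Path


def list_datasets_command():
    # Prints database stats, including the names of all datasets
    return ["prodigy", "stats", "-l"]


def export_command(dataset):
    # Without an output directory, db-out writes the annotations to stdout as JSONL
    return ["prodigy", "db-out", dataset]


def export_dataset(dataset, out_file):
    out_path = Path(out_file)
    with out_path.open("w", encoding="utf8") as f:
        # Redirect db-out's stdout into the file; raise if the command fails
        subprocess.run(export_command(dataset), stdout=f, check=True)
    return out_path
```

You'd first run subprocess.run(list_datasets_command(), check=True) to see the dataset names, then export_dataset("food_data", "food_data.jsonl"). The resulting JSONL file is the kind you can load back in with db-in.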

If you still have the original raw data, one way to recover at least some of the annotations could be to run your model over the raw data and save its predictions. This may not be as accurate as the original training data, but you can always correct it later using Prodigy, and it'll be faster than doing everything from scratch.
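Here's a minimal sketch of that idea, assuming the model is a named entity recognizer and the raw texts are available as a list of strings. Because the old model was saved with spaCy v2, this part would need to run in an environment with spaCy v2 installed, since spaCy v3 can't load v2 models. The output is JSONL with a "text" and character-offset "spans" per line, which you could then load into a manual NER recipe (e.g. ner.manual) to correct. The function and file names are hypothetical.

```python
import json


def doc_to_task(text, ents):
    # ents: iterable of (start_char, end_char, label) tuples
    spans = [{"start": start, "end": end, "label": label} for start, end, label in ents]
    return {"text": text, "spans": spans}


def write_jsonl(tasks, out_file):
    # One JSON object per line
    with open(out_file, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


def predict_tasks(model_path, texts):
    import spacy  # imported here so the helpers above work without spaCy installed

    nlp = spacy.load(model_path)
    for doc in nlp.pipe(texts):
        # Keep the model's predicted entities as character offsets
        ents = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
        yield doc_to_task(doc.text, ents)
```

Usage would look like write_jsonl(predict_tasks("./food_model", raw_texts), "food_predictions.jsonl"). Storing character offsets rather than token indices keeps the file independent of the spaCy version's tokenizer details.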