Hi all - I am very new to spaCy and Prodigy. A former colleague of mine set up a model using Prodigy that apparently uses spaCy v2. I would like to further train the model, but need to update to spaCy v3.
I've read the migration instructions here: https://spacy.io/usage/v3/. I don't think any code changes need to be made.
My issue: I'm not sure how to run prodigy train to update the model to spaCy v3. I see the train docs here: https://prodi.gy/docs/recipes#train
I'm not sure what dataset I'm supposed to use. Here is the file structure for my model:
Hi! If you still have the original data, this should hopefully be very straightforward, especially if the original annotations were created with Prodigy as well. In that case, you can import the existing annotations into Prodigy using db-in, and then run prodigy train to train an updated model: https://prodi.gy/docs/recipes#train
As you collect more annotations, you can then keep running prodigy train with all of the data (your previous dataset and your new dataset) to train better and better versions.
Aw, damn, if the engineer lost the original training data, that's definitely very unfortunate. A model without the data will be immediately stale, and not having the data makes it difficult to improve the model further, try out new ideas, etc. Even when you improve a model with more data, you typically want to retrain on all the annotations for the best results.
If the data was annotated with Prodigy, it might still be in the database? Maybe ask the engineer to run prodigy stats -l to view all datasets and see if the data is in there. They can then export it using prodigy db-out.
If you still have the original raw data, one way to recover at least some of the annotations could be to run your model over the raw data and save its predictions. This may not be as accurate as the original training data, but you can always correct it later using Prodigy and it'll be faster than doing everything from scratch.
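
Here's a rough sketch of what "run your model over the raw data and save its predictions" could look like. It makes a few assumptions that may not match your setup: that the model is a named entity recognizer, that the raw data is a plain text file with one example per line, and that you run it in an environment with the spaCy version the model was trained with (a v2 model won't load in v3). The function names and file paths are made up for illustration.

```python
import json
from pathlib import Path


def doc_to_task(doc):
    """Turn a processed spaCy Doc into a dict with the text and its
    predicted entity spans (character offsets plus label)."""
    spans = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ]
    return {"text": doc.text, "spans": spans}


def save_predictions(model_path, raw_path, out_path):
    """Run the model over every non-empty line of raw_path and write
    one JSON object per line (JSONL) to out_path."""
    # Imported here so the helper above works without spaCy installed.
    # Assumption: this is the same spaCy major version the model was trained with.
    import spacy

    nlp = spacy.load(model_path)
    lines = Path(raw_path).read_text(encoding="utf8").splitlines()
    texts = [line.strip() for line in lines if line.strip()]
    with Path(out_path).open("w", encoding="utf8") as f:
        for doc in nlp.pipe(texts):
            f.write(json.dumps(doc_to_task(doc)) + "\n")


# Hypothetical usage (paths are placeholders):
# save_predictions("./my_v2_model", "raw_texts.txt", "predictions.jsonl")
```

The resulting JSONL file can then be imported with db-in and corrected in Prodigy before you train on it.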