Hi,
I've been developing a document parser using Prodigy, and the tool has made it very easy to fly through documents. It almost seems too easy. While labeling, I take periodic breaks to retrain and check the model's scores with the train recipe. For almost the entire labeling process the training scores have only increased, but now they're decreasing. This could be for a lot of reasons (accidental mislabels, underfitting, overfitting, etc.), but whatever the cause, I'm trying to find a way to loop through training sessions using different amounts of annotated data.

I know how to loop through the training sessions easily enough, but I can't find any way to store the best score from each session. The only option I see right now is to manually record each session by hand. Is there a way I can store the training scores of each model for review later?
I'd like to clarify what your workflow looks like: are you training your model iteratively using the train command, or are you using the active learning approach? You mentioned the train recipe, so I'll assume it's the former.
If it's the former, you can probably make use of `spacy evaluate`. If you pass it a path via `--output`, it will save the evaluation metrics to a JSON file for you.
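Here's a minimal sketch of what the loop could look like, assuming you've already trained one model per run. The run names, paths, evaluation file, and the `ents_f` metric key are placeholders for whatever your own pipeline produces:

```python
import json
import subprocess
from pathlib import Path

# Hypothetical run names, e.g. models trained on 25/50/75/100% of the annotations
runs = ["run_25pct", "run_50pct", "run_75pct", "run_100pct"]
scores = {}

for run in runs:
    model_path = Path("models") / run / "model-best"
    metrics_path = Path("metrics") / f"{run}.json"
    metrics_path.parent.mkdir(parents=True, exist_ok=True)
    # `spacy evaluate` writes its scores to the JSON file given via --output;
    # dev.spacy stands in for your held-out evaluation data in spaCy's binary format
    subprocess.run(
        ["python", "-m", "spacy", "evaluate", str(model_path), "dev.spacy",
         "--output", str(metrics_path)],
        check=True,
    )
    with metrics_path.open() as f:
        metrics = json.load(f)
    scores[run] = metrics.get("ents_f", 0.0)  # e.g. NER F-score; pick the metric you care about

best = max(scores, key=scores.get)
print(f"Best run: {best} ({scores[best]:.3f})")
```

Since every run's metrics end up in their own JSON file, you can come back and compare them later instead of recording each session by hand.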
To add to Lj's comment, you could also look into Weights & Biases if you're not using it already. It easily integrates with spaCy and will store all your training results, as well as the whole config for reproducible experiments. You can then track and visualise your results over time, and see how configuration changes impact the scores.
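If you don't want to switch your training setup over right away, a minimal sketch of logging the saved metrics by hand could look like this. The project and run names are placeholders, and it assumes the per-run JSON files from the loop above; spaCy also ships a built-in WandbLogger you can enable in the training config to log everything automatically:

```python
import json
import wandb

# Hypothetical run names matching the metrics files written earlier
for run_name in ["run_25pct", "run_50pct", "run_75pct", "run_100pct"]:
    with wandb.init(project="prodigy-doc-parser", name=run_name) as run:
        with open(f"metrics/{run_name}.json") as f:
            metrics = json.load(f)
        # Log the evaluation scores so every run shows up in the W&B UI
        run.log(metrics)
```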
There's also a Prodigy integration that lets you upload, version and track your datasets. So every time you make changes, you can upload a new version and track it.
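A minimal sketch, assuming the project and dataset names below stand in for your own:

```python
import wandb
from wandb.integration.prodigy import upload_dataset

with wandb.init(project="prodigy-doc-parser"):
    # Reads the annotations from the Prodigy database and logs them
    # to W&B as a Table you can inspect and version in the UI
    upload_dataset("my_parser_annotations")  # hypothetical Prodigy dataset name
```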