Model vs Dataset Metric Weights

I recently reinstalled Windows 11 for the 50th time and forgot to back up the .prodigy folder. I was able to recover my venv and copied the model-last and model-best folders to my new venv.

The dataset (~2k annotations) appears to be lost along with the .db file, so I started a new dataset with spans.correct, using the model-last path as the pipeline. The pipeline performs virtually the same when annotating, but when I train on the new dataset the F-score is back in the 50s, as if I'm starting over.
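For what it's worth, this is roughly how I'm sanity-checking the recovered pipeline; just a sketch, assuming the recovered folder is ./model-last and the spancat component uses the default "sc" spans key:

```python
import spacy

# Load the recovered pipeline; the trained weights in model-last survived the
# reinstall, only the Prodigy annotation database (.db) is gone.
nlp = spacy.load("./model-last")

doc = nlp("Put a sentence from your own domain here.")

# "sc" is the default spans key for spancat pipelines; adjust it if the
# pipeline was trained with a custom spans_key.
predicted = doc.spans["sc"] if "sc" in doc.spans else []
for span in predicted:
    print(span.text, span.label_)
```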

What is salvageable from the model-last and model-best? Do I need to do all the annotations over again? Thanks.

I really like using Jupyter to prepare the datasets that I send to Prodigy with a pre-trained model. For example, you could use the model to fetch examples with a span prediction for an underrepresented class that you're interested in, but this is a manual process of creating subsets to send to Prodigy for annotation.
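Roughly, that filtering step could look like the sketch below; the file names, the RARE_LABEL string, and the "sc" spans key are placeholders for whatever your own pipeline and data use:

```python
import spacy
import srsly

# Load the recovered span categorizer pipeline (placeholder path).
nlp = spacy.load("./model-last")
RARE_LABEL = "RARE_LABEL"  # the underrepresented class you want more examples of

# Raw, unannotated texts in Prodigy's usual {"text": ...} JSONL format (placeholder file).
texts = [eg["text"] for eg in srsly.read_jsonl("raw_texts.jsonl")]

selected = []
for text, doc in zip(texts, nlp.pipe(texts)):
    spans = doc.spans["sc"] if "sc" in doc.spans else []
    # Keep only texts where the model already predicts the rare label.
    if any(span.label_ == RARE_LABEL for span in spans):
        selected.append({"text": text})

# Write the subset out so it can be fed to a recipe such as spans.correct.
srsly.write_jsonl("rare_label_subset.jsonl", selected)
```

From there you'd point spans.correct at that smaller JSONL file instead of the full corpus.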

While this might help a bit in recreating your original dataset, I fear the actual data itself is indeed lost. :slightly_frowning_face:

Hehe, yes, it was a good learning experience and now I am the annotator master :slight_smile:
