Model vs Dataset Metric Weights

I recently reinstalled Windows 11 for the 50th time and forgot to back up the .prodigy folder. I was able to recover my venv and copied the model-last and model-best folders to my new venv.

The dataset (~2k annotations) appears to be lost along with the .db file, so I started a new dataset with spans.correct, using the model-last path as the pipeline. The pipeline performs virtually the same when annotating, but when I train on the new dataset the F-score is back in the 50s, as if I'm starting over.
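For what it's worth, this is roughly how I'm sanity-checking the recovered pipeline; just a sketch, assuming the recovered folder is ./model-last and the spancat component uses the default "sc" spans key:

```python
import spacy

# Load the recovered pipeline; the trained weights in model-last survived the
# reinstall, only the Prodigy annotation database (.db) is gone.
nlp = spacy.load("./model-last")

doc = nlp("Put a sentence from your own domain here.")

# "sc" is the default spans key for spancat pipelines; adjust it if the
# pipeline was trained with a custom spans_key.
predicted = doc.spans["sc"] if "sc" in doc.spans else []
for span in predicted:
    print(span.text, span.label_)
```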

What is salvageable from the model-last and model-best? Do I need to do all the annotations over again? Thanks.

I really like using Jupyter to prepare the datasets that I send to Prodigy with a pre-trained model. For example, you could use the model to fetch examples with a span prediction for an underrepresented class that you're interested in, but this is a manual process of creating subsets to send to Prodigy for annotation.
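Roughly, that filtering step could look like the sketch below; the file names, the RARE_LABEL string, and the "sc" spans key are placeholders for whatever your own pipeline and data use:

```python
import spacy
import srsly

# Load the recovered span categorizer pipeline (placeholder path).
nlp = spacy.load("./model-last")
RARE_LABEL = "RARE_LABEL"  # the underrepresented class you want more examples of

# Raw, unannotated texts in Prodigy's usual {"text": ...} JSONL format (placeholder file).
texts = [eg["text"] for eg in srsly.read_jsonl("raw_texts.jsonl")]

selected = []
for text, doc in zip(texts, nlp.pipe(texts)):
    spans = doc.spans["sc"] if "sc" in doc.spans else []
    # Keep only texts where the model already predicts the rare label.
    if any(span.label_ == RARE_LABEL for span in spans):
        selected.append({"text": text})

# Write the subset out so it can be fed to a recipe such as spans.correct.
srsly.write_jsonl("rare_label_subset.jsonl", selected)
```

From there you'd point spans.correct at that smaller JSONL file instead of the full corpus.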

While this might help a bit in recreating your original dataset, I fear the actual data itself is indeed lost. :slightly_frowning_face:

Hehe, yes, it was a good learning experience and now I am the annotator master :slight_smile:
