I am currently using Prodigy to create an NER "benchmark" dataset in order to compare the performance of several models trained on synthetic data.
So far everything works great: I am using `ner.correct` with the `--update` flag to have a pre-trained model in the loop that helps me with the labeling.
But afterwards, during benchmarking, I checked the labels the models didn't predict (false negatives) and realized that I had made a few annotation mistakes (incorrect span boundaries).
So I would like to re-annotate all the examples for which the model didn't predict the correct NER spans (the false negatives).
Question: what would be the best way to do this?
I thought about:

- exporting the dataset (as `*.jsonl`)
- splitting the data into two categories: the correct examples and the ones to re-annotate
- importing both into Prodigy again
- running `ner.correct` on the "re-annotate" dataset
- using `data-to-spacy` to combine both datasets and export them as one
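To make the splitting step concrete, here is a minimal sketch of what I have in mind (the file names and the idea of selecting examples by their `_task_hash` are my assumptions, not something I've tested):

```python
import json

def split_examples(examples, reannotate_hashes):
    """Split exported Prodigy examples into 'correct' and 're-annotate'
    buckets, using the _task_hash to identify the false negatives."""
    correct, redo = [], []
    for eg in examples:
        bucket = redo if eg.get("_task_hash") in reannotate_hashes else correct
        bucket.append(eg)
    return correct, redo

# Usage sketch, assuming the dataset was exported to benchmark.jsonl:
# with open("benchmark.jsonl", encoding="utf-8") as f:
#     examples = [json.loads(line) for line in f]
# correct, redo = split_examples(examples, {123, 456})  # hashes from benchmarking
```

The two lists would then be written back out as separate `*.jsonl` files and imported as separate datasets.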
But maybe there is a better way? I have a feeling that a custom recipe would be a good idea, as we have just started working with Prodigy and I think we will need this kind of "conditioned re-labeling" more than once.
It would be great if you had a suggestion on how to implement this as a custom recipe.
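For context, the core of what I imagine such a recipe doing is filtering the stream down to examples where the model misses a gold span. A rough sketch of that filter, which a recipe could wrap (the `predict` callable and the span dict format are assumptions on my side):

```python
def stream_false_negatives(examples, predict):
    """Yield only examples where the model misses at least one gold span.

    `predict` is a hypothetical callable mapping text to a list of
    {"start", "end", "label"} dicts, e.g. built from a spaCy model's ents.
    """
    for eg in examples:
        predicted = {(s["start"], s["end"], s["label"]) for s in predict(eg["text"])}
        gold = {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
        if gold - predicted:  # at least one gold span the model didn't find
            yield eg
```

The recipe itself would then feed this filtered stream into the manual NER interface so only the problematic examples come up for re-annotation.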