Best way to re-label / re-annotate existing data based on a condition

Dear prodigy-Team,

I am currently using Prodigy to create a NER "benchmark" dataset in order to compare the performance of several models trained on synthetic data.
So far everything works great: I am using ner.correct with the --update flag to keep a pre-trained model in the loop that "helps" me with the labeling.
But afterwards, during benchmarking, I checked the labels the models didn't predict (false negatives) and realized I had made a few annotation mistakes (incorrect span boundaries).
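
For reference, this is roughly the command I'm running (the dataset, model, and label names here are placeholders, not our real setup):

```
prodigy ner.correct ner_benchmark en_core_web_lg ./texts.jsonl --label PERSON,ORG --update
```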

So I would like to re-label or re-annotate all the examples for which the model didn't predict the correct NER spans (false negatives).

Question: What would be the best way to do this?

I thought about the following workflow (see the command sketch after this list):

  • export the dataset (as *.jsonl)
  • split the data into two categories: the correct examples and the ones to re-annotate
  • import both sets into Prodigy again
  • run ner.correct on the "re-annotate" dataset
  • use data-to-spacy to combine both datasets and export them as one
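
In commands, that plan might look roughly like this (dataset names, the label set, and the split_data.py helper are placeholders for illustration):

```
# 1) export the annotated dataset
prodigy db-out ner_benchmark > ner_benchmark.jsonl

# 2) split into correct examples and ones to re-annotate
#    (split_data.py is a hypothetical helper that applies the false-negative condition)
python split_data.py ner_benchmark.jsonl correct.jsonl recheck.jsonl

# 3) import the already-correct examples into a fresh dataset
prodigy db-in ner_correct correct.jsonl

# 4) re-annotate the problematic examples with the model in the loop
prodigy ner.correct ner_recheck en_core_web_lg recheck.jsonl --label PERSON,ORG --update

# 5) combine both datasets and export them as one spaCy corpus
prodigy data-to-spacy ./corpus --ner ner_correct,ner_recheck
```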

But maybe there is a better way? I feel that a custom recipe might be a good idea, as we just started working with Prodigy and I think we will need this kind of "conditioned re-labeling" more than once.
It would be great if you had a suggestion on how to implement this as a custom recipe.


Your method sounds reasonable to me, although you might also just use ner.manual with pre-labelled data if you find that more intuitive. I usually have a notebook with a script that pulls data from Prodigy and builds a candidate list of items to double-check; a rough sketch of that idea follows below. In general, I recommend looking at examples where the annotation and the model disagree on a label. Usually, there's something insightful when that happens.
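
Something along these lines; this is a minimal sketch under assumptions (the dataset name, the model, and the exact disagreement check are placeholders, not your setup):

```python
# Minimal sketch: pull annotations from Prodigy and flag examples where a
# model's predictions disagree with the saved spans. "ner_benchmark" and
# the model name are placeholders.
import spacy
import srsly
from prodigy.components.db import connect

nlp = spacy.load("en_core_web_lg")
db = connect()
examples = db.get_dataset("ner_benchmark")

candidates = []
for eg in examples:
    gold = {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
    doc = nlp(eg["text"])
    pred = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
    if gold != pred:  # model and annotation disagree -> worth a second look
        candidates.append(eg)

# Write the candidates to a file that ner.manual / ner.correct can load;
# existing spans will show up pre-highlighted.
srsly.write_jsonl("recheck.jsonl", candidates)
print(f"{len(candidates)} of {len(examples)} examples flagged for review")
```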

You might appreciate this Prodigy video I made a while ago on the topic of "finding bad labels". It showcases a few general techniques for text classification that you might find inspiring.

If you're annotating with a team, I might try to formalize this a bit further, mainly because you'll want to document annotation mistakes and prevent them from happening again in the future.