Re-labeling

hannahlindsley · May 17, 2018, 4:57pm

For our use case, we want to start from scratch by streaming a raw dataset from a Kafka queue. We’re set up to label custom entities in text without a model in the loop.
We have a single, shared Postgres instance for the database.

We’d like to be able to go back into a dataset and add labels to it. For example, one labeler might annotate phone numbers, and another might come along six months later and want to add an address label to the same data.

Is there currently support for updating an existing dataset that was built from a streaming queue?

ines · May 17, 2018, 5:42pm

Prodigy’s input stream format and its JSONL export format are pretty much identical, so you can always export an existing dataset and load it back in. For example:

prodigy db-out some_dataset > some_dataset.jsonl
prodigy ner.manual other_dataset en_core_web_sm some_dataset.jsonl --label SOME_LABEL
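If you only want to carry over the examples that were accepted the first time, you could pre-filter the export before loading it. The sketch below is a minimal illustration using only the standard library, assuming each exported line has a "text" key, an optional "spans" list and an "answer" field ("accept", "reject" or "ignore"). The function names and the output filename are hypothetical, and dropping the old "answer" is a design choice, not something Prodigy requires.

```python
import json
from typing import Iterable, Iterator


def prepare_for_reannotation(lines: Iterable[str]) -> Iterator[dict]:
    """Turn exported JSONL lines into fresh tasks for re-annotation.

    Keeps only accepted examples, and copies over the text, any existing
    spans and meta, but not the old "answer" decision.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        eg = json.loads(line)
        if eg.get("answer") != "accept":
            continue  # leave out rejected or ignored examples
        task = {"text": eg["text"], "spans": eg.get("spans", [])}
        if "meta" in eg:
            task["meta"] = eg["meta"]
        yield task


def write_tasks(in_path: str, out_path: str) -> int:
    """Write the prepared tasks to a new JSONL file and return how many were kept."""
    count = 0
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for task in prepare_for_reannotation(fin):
            fout.write(json.dumps(task) + "\n")
            count += 1
    return count
```

Calling write_tasks("some_dataset.jsonl", "to_review.jsonl") would produce a file you can pass to ner.manual in place of some_dataset.jsonl.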

The ner.manual recipe respects pre-defined entities, so the annotator will see everything that was labelled before and can correct those spans as well as add new entities.
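If the second annotator should also be able to fix or add spans with the old labels (say, a missed phone number), you could pass the old labels along with the new one to --label, which takes a comma-separated list. This minimal sketch collects the labels already present in an exported file, assuming each span in "spans" has a "label" key. The helper names are hypothetical.

```python
import json
from typing import Iterable, List


def existing_labels(lines: Iterable[str]) -> List[str]:
    """Collect every span label already present in exported JSONL lines."""
    labels = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        for span in eg.get("spans", []):
            labels.add(span["label"])
    return sorted(labels)


def label_option(lines: Iterable[str], new_labels: Iterable[str]) -> str:
    """Build a comma-separated --label value: old labels plus the new ones."""
    combined = set(existing_labels(lines)) | set(new_labels)
    return ",".join(sorted(combined))
```

For an export containing PHONE spans, label_option(lines, ["ADDRESS"]) gives "ADDRESS,PHONE", which you would use as --label ADDRESS,PHONE in the ner.manual command.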

While you can technically add the new annotations to the old dataset, I’d still recommend creating a new one when you re-annotate an existing set later. This gives you a cleaner separation and if something goes wrong, you’ll always have a record of the previous set.
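Keeping the old dataset also makes it easy to check what the re-annotation actually changed. As a rough sketch, you could export both datasets with db-out and compare spans per example. This assumes the same text carries the same "_input_hash" in both exports and that each example appears once per file (if it appears more than once, the last line wins here). The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label)


def spans_by_input(lines: Iterable[str]) -> Dict[int, Set[Span]]:
    """Map each example's _input_hash to its set of (start, end, label) spans."""
    result: Dict[int, Set[Span]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        spans = {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
        result[eg["_input_hash"]] = spans
    return result


def diff_datasets(old_lines: Iterable[str], new_lines: Iterable[str]) -> Dict[int, Tuple[Set[Span], Set[Span]]]:
    """Return {input_hash: (added, removed)} for examples whose spans changed."""
    old = spans_by_input(old_lines)
    new = spans_by_input(new_lines)
    changes = {}
    for key, new_spans in new.items():
        old_spans = old.get(key, set())
        added = new_spans - old_spans
        removed = old_spans - new_spans
        if added or removed:
            changes[key] = (added, removed)
    return changes
```

Examples where only new ADDRESS spans show up under "added" were extended as intended; anything under "removed" points to a correction of the earlier annotations.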

Hope this helps!
