Changing annotations in the DB via the interface

Hi there,

thanks for your great work so far!
I'm new to the whole data science/AI thing, so forgive me if my question has already been answered.

I'm using Prodigy for tagging named entities, and when we started out, we didn't set very clear-cut rules for what exactly to tag, since we realized that it would depend on what was in our texts. Now we want to remove the tags for a large number of entities, as well as add tags to entities that were not tagged in the beginning but have been tagged recently. For a few tagging mistakes (e.g. accidentally tagging line breaks), we have so far simply looked them all up in the DB and removed each one manually, but this would be too much of a hassle for the quantity of annotations we have to change.

From what I can see, there is no way to go back to the annotations via the Prodigy web interface. Can you recommend a workaround?

Also, is this functionality you have planned for the upcoming Prodigy Teams? Or the Prodigy annotation manager?

Thanks in advance for your help,

Best, AWI

Hi and thanks!

Datasets in Prodigy are append-only by design: you typically don't want to overwrite existing records, because that means you'd lose a datapoint you've collected. And it'd also make it too easy to erase work. Instead, you can re-annotate and correct the data, and save the results to a new dataset. If you make a mistake, you still have the previous data and can start again.

Prodigy's input and output formats are the same – so you can always export a dataset and load the data back in. For example, if you load a manually-annotated NER dataset back into ner.manual, the entities will be pre-highlighted and you can correct them.

If it's possible to automate some of the changes, that's great, too – for instance, if you removed label X from your label scheme, you can iterate over the "spans" and remove all entries that contain "label": "X" before you send them out for correction again.
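As a rough illustration of that kind of automated cleanup, here is a minimal sketch in plain Python. It assumes the dataset has been exported to JSONL (for instance with `prodigy db-out`) and that each record is a dict with a `"spans"` list whose entries carry a `"label"` key. The function name `drop_label` and the example record are made up for illustration and are not part of Prodigy.

```python
import copy


def drop_label(examples, label):
    """Return copies of the examples with every span of `label` removed.

    The input records are left untouched, in keeping with the idea of
    not overwriting the data you already collected.
    """
    cleaned = []
    for eg in examples:
        eg = copy.deepcopy(eg)
        # Keep only the spans whose label differs from the one being retired.
        eg["spans"] = [s for s in eg.get("spans", []) if s.get("label") != label]
        cleaned.append(eg)
    return cleaned


# Hypothetical record in the exported task format.
examples = [
    {
        "text": "Apple hired Ann",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 12, "end": 15, "label": "X"},
        ],
    }
]

cleaned = drop_label(examples, "X")
print(cleaned[0]["spans"])
```

The cleaned records can then be written back out as JSONL, one JSON object per line, and loaded into ner.manual with a new dataset name, so the remaining entities show up pre-highlighted for correction.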

If you have conflicting annotations that you want to resolve to one final "master corpus", you can also use the review recipe. It takes one or more datasets with one or more sessions and will group annotations on the same input together. So if annotator A has labelled a span and annotator B hasn't, you can see both and decide what the correct answer is (or even label something entirely different by hand).
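To get a feel for how many inputs actually have conflicting annotations before starting a review session, something like the sketch below may help. It is not how the review recipe is implemented, only an illustration of the grouping idea. It assumes exported records carry the `_input_hash` that Prodigy assigns to each input, plus a `"spans"` list with `"start"`, `"end"` and `"label"`. The helper names `span_key` and `find_conflicts` are invented for this example.

```python
from collections import defaultdict


def span_key(eg):
    """Order-independent fingerprint of the spans in one annotation."""
    return frozenset(
        (s["start"], s["end"], s["label"]) for s in eg.get("spans", [])
    )


def find_conflicts(examples):
    """Group annotations by input and keep the groups that disagree."""
    by_input = defaultdict(list)
    for eg in examples:
        by_input[eg["_input_hash"]].append(eg)
    return {
        input_hash: group
        for input_hash, group in by_input.items()
        if len({span_key(eg) for eg in group}) > 1
    }


# Hypothetical annotations: two annotators disagree on input 1 and agree on input 2.
annotations = [
    {"_input_hash": 1, "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"_input_hash": 1, "spans": []},
    {"_input_hash": 2, "spans": [{"start": 3, "end": 8, "label": "PERSON"}]},
    {"_input_hash": 2, "spans": [{"start": 3, "end": 8, "label": "PERSON"}]},
]

conflicts = find_conflicts(annotations)
print(sorted(conflicts))
```

Only the inputs returned by `find_conflicts` need a decision from a reviewer; the rest already agree.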

Btw, I was just working on the new features for v1.9 and the upcoming version will make line breaks unselectable by default to prevent this :raised_hands:

Prodigy Teams will have a visual mode for viewing annotations and "editing" them by clicking on them – that's all much more easily possible in an annotation management web app. However, "editing" here also means that you create a new record and mark the old one as outdated. This way, the original answer still exists and is connected to the annotator who created it – it's just no longer used, because your answer replaces it.

Thank you for the patient and thorough explanation, I will try what you have suggested.
And great to hear about the new feature for the linebreaks :+1: