Correction of manually labeled relation


I have the following problem:
I started to manually label a dataset w.r.t named entities and relations. My dataset consists of ~5000 sentences, of which I have labeled 350 up to this point.

Unfortunately, I made a mistake during the labeling process:
In one document, I labeled a relation between two words but forgot to label one of the words as an entity. This is a problem for downstream tasks. Is it possible to correct this? I have loaded the already labeled data in Python, so I know, among other things, the _task_hash of this document/sentence. I am looking for a recipe that lets me correct the labels of a specific document given its _task_hash.

Thanks a lot for your help!

I found a workaround:

  1. Export the dataset as a JSONL file
  2. Delete the row for the affected document from the file
  3. Delete the dataset and create it again with "db-in" and the new JSONL file
  4. Start the browser application; the affected document should then appear again for annotation
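
Step 2 of the list above could be scripted roughly like this. It is only a minimal sketch using the Python standard library, assuming the export has one JSON object per line with a `_task_hash` field; the function names `drop_tasks` and `filter_file` and any file paths passed to them are made up for illustration and are not part of Prodigy.

```python
import json
from pathlib import Path


def drop_tasks(lines, bad_hashes):
    """Return the JSONL lines whose `_task_hash` is NOT in `bad_hashes`.

    `lines` is an iterable of JSON strings (one record per line);
    blank lines are skipped.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("_task_hash") not in bad_hashes:
            kept.append(json.dumps(record, ensure_ascii=False))
    return kept


def filter_file(src, dst, bad_hashes):
    """Copy the JSONL file `src` to `dst` without the unwanted tasks.

    Returns the number of records written to `dst`.
    (Hypothetical helper; `src` and `dst` are paths you choose yourself.)
    """
    src_lines = Path(src).read_text(encoding="utf-8").splitlines()
    kept = drop_tasks(src_lines, set(bad_hashes))
    Path(dst).write_text("".join(k + "\n" for k in kept), encoding="utf-8")
    return len(kept)
```

The cleaned file would then be re-imported with "db-in" as in step 3. Keeping the original export around as a backup until the re-import has been checked seems sensible.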

However, I think this is not a nice way to do it. A recipe which makes it possible to adjust existing labels would be great!

Hi! Yes, this definitely works :slight_smile:

Alternatively, if you want to go through all of your annotations and adjust them later (e.g. if you changed your label scheme or just want to re-annotate), you can also set dataset:name_of_your_dataset as the input source instead of a file, and the dataset will be queued up again. For example:

prodigy rel.manual new_dataset blank:en dataset:your_previous_dataset ...

Just make sure to save the result to a new dataset so you don't end up with duplicates in the same set.
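
If you want to check whether a dataset already contains such duplicates, a rough sketch like the following could be run on an exported JSONL file. It assumes every record carries a `_task_hash` field; the function name `duplicate_task_hashes` is hypothetical and not a Prodigy API.

```python
import json
from collections import Counter


def duplicate_task_hashes(lines):
    """Return {task_hash: count} for every `_task_hash` seen more than once.

    `lines` is an iterable of JSON strings, e.g. the lines of an
    exported JSONL file; blank lines are ignored.
    """
    counts = Counter(
        json.loads(line)["_task_hash"] for line in lines if line.strip()
    )
    return {task_hash: n for task_hash, n in counts.items() if n > 1}
```

An empty result means every task hash appears only once in the export.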


Hi @ines,

thank you very much for your answer!
That's a really useful hint :slightly_smiling_face:
