Renaming labels in NER

atakanokan · July 6, 2020, 10:20pm

Hi,

I annotated a large number of documents with very granular labels. Now I want to merge some of the very granular labels into their "parent" label. Is there a way to edit label names in Prodigy? I thought of a way to do this (described below), but wanted to know whether it can be done in Prodigy itself.

Method (Prodigy -> Python -> Prodigy): export the dataset, edit "spans" in the output JSONL file via Python, then import the edited JSONL file into Prodigy as a new dataset.

Any help or recommendation is welcomed! Thanks!

ines · July 7, 2020, 9:07am

Hi! That sounds like a reasonable approach to me. Datasets in Prodigy are append-only by design, so you'd probably want to create a new dataset for your edited annotations. And if it turns out you want to go back to the previous state, you still have the old dataset available.

(If you're a jq wizard, you could probably write a command-line one-liner to do the data transformation and then pipe the result forward to a new dataset, all in one step. But I couldn't tell you how.)

atakanokan · September 2, 2020, 6:52pm

@ines there might be a bug in this process.

Here was my workflow:

1. prodigy db-out dataset_granular_labels > granular_labels.jsonl
2. In Python, change the "label" value to the less granular value for all annotations under "spans".
3. In Python, save the result as less_granular.jsonl.
   - Checked whether less_granular.jsonl contains the changed labels, and it does.
4. prodigy db-in dataset_lessgranular less_granular.jsonl
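
The two Python steps above might look roughly like the following. This is only a minimal sketch, not the script that was actually used (it was not posted): the LABEL_MAP contents and the function names are hypothetical, and it only touches the top-level "spans" of each example.

```python
import json

# Hypothetical mapping from granular labels to their "parent" label.
LABEL_MAP = {"CITY": "LOCATION", "COUNTRY": "LOCATION"}


def relabel_spans(example, label_map):
    """Return a copy of the example with its top-level span labels remapped."""
    updated = dict(example)
    if "spans" in example:
        updated["spans"] = [
            {**span, "label": label_map.get(span["label"], span["label"])}
            if "label" in span
            else dict(span)
            for span in example["spans"]
        ]
    return updated


def convert_file(in_path, out_path, label_map):
    """Read a JSONL export line by line and write the relabeled examples."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                example = relabel_spans(json.loads(line), label_map)
                dst.write(json.dumps(example) + "\n")


if __name__ == "__main__":
    convert_file("granular_labels.jsonl", "less_granular.jsonl", LABEL_MAP)
```

Labels that are not in the mapping are left unchanged, so only the labels being merged need to be listed.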

However, when I run prodigy review dataset_lessgranular_reviewed dataset_lessgranular --label labels_less_granular.txt, I see the less granular labels (what I changed them to) at the top in the label selection area, but the existing labels (highlighted in yellow in the text) are still the old ones.

Any recommendation would be appreciated!

============================== ✨  Prodigy Stats ==============================

Version          1.9.9
Platform         Windows-10-10.0.18362-SP0
Python Version   3.6.2
Database Name    SQLite
Database Id      sqlite

ines · September 3, 2020, 3:00pm

Hmm, there's very little magic going on here and the review recipe should just show you whatever is in that dataset. When you look at what's in your dataset_lessgranular dataset (e.g. using db-out), which labels do you see there and how many examples are in there? Maybe it somehow ended up with a copy of the previous unedited data?

atakanokan · September 3, 2020, 3:33pm

less_granular.jsonl: has the edited (less granular) labels. This is the db-out export altered via the Python script.
After importing less_granular.jsonl with db-in, I exported it again with db-out as less_granular_prodigy.jsonl. I see the less granular version of the labels here too (under the main "spans"). However, under "versions", I see other "spans" that contain the granular labels. To visualize for one document:

{"text": "...",
 ...
 "tokens": [...],
 "spans": [ LESS GRANULAR LABELS SEEN HERE ],
 "versions": [
     {"text": "...",
      ...
      "tokens": [...],
      "spans": [ GRANULAR LABELS SEEN HERE ],
      "versions": [
          {"text": "...",
           ...
           "tokens": [...],
           "spans": [ GRANULAR LABELS SEEN HERE TOO ]
          }
      ]
     }
 ]
}
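
If the old labels shown in review do come from these nested "versions" entries (an assumption; nothing in this thread confirms it), one possible workaround is to relabel recursively, so that the "spans" inside every level of "versions" are rewritten as well. A minimal sketch, with a hypothetical LABEL_MAP and function name:

```python
# Hypothetical mapping from granular labels to their "parent" label.
LABEL_MAP = {"CITY": "LOCATION", "COUNTRY": "LOCATION"}


def relabel_recursive(example, label_map):
    """Remap span labels in an example and in all nested "versions"."""
    updated = dict(example)
    if "spans" in updated:
        updated["spans"] = [
            {**span, "label": label_map.get(span["label"], span["label"])}
            if "label" in span
            else dict(span)
            for span in updated["spans"]
        ]
    if "versions" in updated:
        # Each version is itself an example that may contain further versions.
        updated["versions"] = [
            relabel_recursive(version, label_map)
            for version in updated["versions"]
        ]
    return updated
```

Applied to each line of the exported JSONL file before db-in, this would leave no granular labels anywhere in the record, at the cost of also rewriting the annotation history stored under "versions".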

bob_ln · November 17, 2021, 7:54pm

Late to the party, but I've successfully changed labels just using sed.

sed 's,"label":"OLD_LABEL","label":"NEW_LABEL",g' old_task.jsonl > new_task.jsonl

rahul1 · November 15, 2022, 2:18pm

After party attendee here.
I am in a similar situation (changing labels). I do not have a background in sed or jq, but I will look into the code you posted. If you could give some more information about your code, it would be highly appreciated.
gr.
Rahul

Thanks @bob_ln for the code. It works for my case. All I had to do in the terminal was install jq (sudo apt-get install jq) and use your code to change the labels.
