I annotated a large number of documents with very granular labels. Now I want to merge some of these granular labels into their "parent" label. Is there a way to edit label names in Prodigy? I thought of one way to do this (described below), but wanted to know whether it can be done within Prodigy itself.
Method (Prodigy -> Python -> Prodigy): export the dataset with db-out, edit the "spans" in the exported JSONL file with Python, then import the edited JSONL into a new dataset with db-in.
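A minimal sketch of the Python step, assuming each line of the db-out export is one Prodigy task whose "spans" entries carry a "label" key. The `LABEL_MAP` entries and the input file name are hypothetical placeholders; like the method above, it only touches the top-level "spans".

```python
import json

# Hypothetical mapping from granular labels to their parent label.
# Labels not listed here are kept unchanged.
LABEL_MAP = {
    "CITY": "LOCATION",
    "COUNTRY": "LOCATION",
    "FIRST_NAME": "PERSON",
    "LAST_NAME": "PERSON",
}


def merge_labels(example, label_map):
    """Return a copy of one Prodigy task with its top-level span labels remapped."""
    new_example = dict(example)  # shallow copy, so the input dict is not mutated
    if "spans" in example:
        new_example["spans"] = [
            {**span, "label": label_map.get(span.get("label"), span.get("label"))}
            for span in example["spans"]
        ]
    return new_example


def convert_file(in_path, out_path, label_map):
    """Rewrite a db-out JSONL export line by line; return the number of examples written."""
    count = 0
    with open(in_path, encoding="utf8") as f_in, open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            example = json.loads(line)
            f_out.write(json.dumps(merge_labels(example, label_map)) + "\n")
            count += 1
    return count
```

Usage, with hypothetical names: export with `prodigy db-out <original_dataset> > granular.jsonl`, run `convert_file("granular.jsonl", "less_granular.jsonl", LABEL_MAP)`, then import with `prodigy db-in dataset_lessgranular less_granular.jsonl`. Because the edited data goes into a new dataset, the original granular annotations stay available.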
Hi! That sounds like a reasonable approach to me. Datasets in Prodigy are append-only by design, so you'd probably want to create a new dataset for your edited annotations. And if it turns out you want to go back to the previous state, you still have the old dataset available.
(If you're a jq wizard, you could probably write a command-line one-liner to do the data transformation and then pipe the result forward to a new dataset, all in one step. But I couldn't tell you how.)
However, when I run prodigy review dataset_lessgranular_reviewed dataset_lessgranular --label labels_less_granular.txt, I see the less granular labels (the ones I changed them to) at the top in the label selection area, but the existing labels highlighted in yellow in the text are still the old ones.
Any recommendation would be appreciated!
============================== ✨ Prodigy Stats ==============================
Version 1.9.9
Platform Windows-10-10.0.18362-SP0
Python Version 3.6.2
Database Name SQLite
Database Id sqlite
Hmm, there's very little magic going on here, and the review recipe should just show you whatever is in that dataset. When you look at what's in your dataset_lessgranular dataset (e.g. using db-out), which labels do you see there and how many examples does it contain? Maybe it somehow ended up with a copy of the previous, unedited data?
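One way to answer both questions for a db-out export is a small counting script. This is a sketch assuming the export is a JSONL file of Prodigy tasks with a top-level "spans" list; the function names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_lines(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count examples and top-level span labels in JSONL lines from db-out."""
    n_examples = 0
    labels = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        example = json.loads(line)
        n_examples += 1
        labels.update(span.get("label") for span in example.get("spans", []))
    return n_examples, labels


def summarize_export(path: str) -> Tuple[int, Counter]:
    """Open a JSONL export and summarize it."""
    with open(path, encoding="utf8") as f:
        return summarize_lines(f)
```

For example, `n, labels = summarize_export("less_granular_prodigy.jsonl")` followed by `print(n, labels.most_common())` would show whether any of the old granular labels are still present at the top level.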
less_granular.jsonl has the edited (less granular) labels. This is the db-out export after I modified it with a Python script.
After importing less_granular.jsonl with db-in, I exported it again with db-out as less_granular_prodigy.jsonl. I see the less granular labels here too (under the top-level "spans"). However, under "versions", there are other "spans" that contain the granular labels. Here is the structure for one document:
{"text": "...",
  ...
  "tokens": [...],
  "spans": [ LESS GRANULAR LABELS SEEN HERE ],
  "versions": [
    {"text": "...",
      ...
      "tokens": [...],
      "spans": [ GRANULAR LABELS SEEN HERE ],
      "versions": [
        {"text": "...",
          ...
          "tokens": [...],
          "spans": [ GRANULAR LABELS SEEN HERE TOO ]
        }
      ]
    }
  ]
}
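If the old labels are coming from these nested "versions" entries (an assumption; nothing above confirms exactly what the review recipe reads from them), one option is to apply the label mapping recursively so every nested copy agrees with the top-level "spans". Dropping the "versions" key before db-in would be another option, at the cost of losing that history. The `LABEL_MAP` entries and function names below are hypothetical.

```python
import json

# Hypothetical mapping from granular labels to their parent label.
LABEL_MAP = {
    "CITY": "LOCATION",
    "COUNTRY": "LOCATION",
}


def remap_nested(example, label_map):
    """Remap span labels in an example and, recursively, in every entry of "versions"."""
    new_example = dict(example)  # shallow copy; nested lists are rebuilt below
    if "spans" in example:
        new_example["spans"] = [
            {**span, "label": label_map.get(span.get("label"), span.get("label"))}
            for span in example["spans"]
        ]
    if "versions" in example:
        new_example["versions"] = [remap_nested(v, label_map) for v in example["versions"]]
    return new_example


def remap_jsonl(in_path, out_path, label_map):
    """Apply remap_nested to every example in a JSONL file."""
    with open(in_path, encoding="utf8") as f_in, open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            if line.strip():
                f_out.write(json.dumps(remap_nested(json.loads(line), label_map)) + "\n")
```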
Late to the party here.
I am in a similar situation (changing labels). I don't have a background in sed or jq, but I will look into the code you posted. If you could give more information about your code, I'd really appreciate it.
Regards,
Rahul
Thanks @bob_ln for the code. It works for my case. All I had to do in the terminal was install jq (sudo apt-get install jq) and run your code to change the labels.