My annotated data contains 24 labels, which I think is a lot. I would like to reduce the number of labels, for example by changing the labels 'grant' and 'prize' to the label 'award'. Another example would be to merge 'event', 'activity', 'output' and 'product' into 'product'.
I used the db-out command to export the annotated data as a JSON file. I have searched this forum for the next step, reducing the number of labels, but could not find it in recent posts.
Deciding on the labels of interest tends to be a manual process. That said, when you export your data nothing is stopping you from using a Python script to replace some of the labels.
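As a rough illustration of such a script, here is a minimal sketch that rewrites the labels in a db-out export. It assumes NER-style examples where each line is a JSON object with a "spans" list whose entries carry a "label" key. The mapping is taken from the examples in the question, and the names LABEL_MAP, relabel_example and relabel_file are made up for this sketch. Label strings must match the stored labels exactly, including case.

```python
import json

# Mapping taken from the examples in the question; extend it as needed.
LABEL_MAP = {
    "grant": "award",
    "prize": "award",
    "event": "product",
    "activity": "product",
    "output": "product",
}


def relabel_example(example, label_map=LABEL_MAP):
    """Rename span labels in one annotated example; unmapped labels are kept."""
    for span in example.get("spans") or []:
        if "label" in span:
            span["label"] = label_map.get(span["label"], span["label"])
    return example


def relabel_file(in_path, out_path, label_map=LABEL_MAP):
    """Read a JSONL export line by line and write a relabelled copy."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                example = relabel_example(json.loads(line), label_map)
                dst.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Calling relabel_file("data.jsonl", "new_data.jsonl") would then produce a file that can be loaded into a fresh dataset, leaving the original export untouched.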
The easiest way might be to use the Python API. Have you seen this method? You could also write a script that takes examples from the "many labels" dataset, changes them, and then tells Prodigy to create a new "few labels" dataset so that you can continue annotating from there.
Before writing the script, I would advise having a meeting with some team members, annotators and other stakeholders to double-check that there is a clear consensus on which labels are of interest. It'd be a shame to have to rewrite the script later, which is one reason to do this. But it might also be a great opportunity to understand the problem your team is trying to solve a bit better, and it might lead to improved annotation instructions.
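A sketch of that second idea, working directly against the database, could look like the following. It assumes Prodigy's database API as I understand it (connect, get_dataset_examples, add_dataset, add_examples, plus set_hashes); in older versions the examples are fetched with get_dataset instead, so please check against your installed version. The function names merge_labels and copy_dataset_with_merged_labels and the dataset names in the usage note are hypothetical.

```python
import copy


def merge_labels(examples, label_map):
    """Return copies of the examples with span labels renamed via label_map."""
    merged = []
    for eg in examples:
        eg = copy.deepcopy(eg)  # leave the source examples untouched
        for span in eg.get("spans") or []:
            if "label" in span:
                span["label"] = label_map.get(span["label"], span["label"])
        merged.append(eg)
    return merged


def copy_dataset_with_merged_labels(source, target, label_map):
    """Copy dataset `source` into a new dataset `target` with merged labels."""
    # Imported here so merge_labels can be tried without Prodigy installed.
    from prodigy import set_hashes
    from prodigy.components.db import connect

    db = connect()  # uses the database settings from your prodigy.json
    # Assumption: get_dataset_examples is the current method name;
    # older versions expose the same data as db.get_dataset(source).
    examples = merge_labels(db.get_dataset_examples(source) or [], label_map)
    # The span labels changed, so recompute the hashes.
    examples = [set_hashes(eg, overwrite=True) for eg in examples]
    db.add_dataset(target)
    db.add_examples(examples, datasets=[target])
```

For example, copy_dataset_with_merged_labels("many_labels", "few_labels", {"grant": "award", "prize": "award"}) would leave the original dataset as it is and write the merged annotations to a new one.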
Renaming the labels in NER
If you are not satisfied with the labels you used during annotation, you can change them with the following process.
Step: export the annotated dataset as a JSONL file by running the following command in the terminal
prodigy db-out dataset > ./data.jsonl
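Step: install jq via the terminal (optional: the sed command in the next step does not depend on it, but jq is handy for inspecting the exported file)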
sudo apt-get install jq
Step: use the following command to rename a label, reading the file exported in the first step
sed 's,"label":"OLD_LABEL","label":"NEW_LABEL",g' data.jsonl > new_data.jsonl
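Before loading the result back into Prodigy, it may be worth checking which labels are actually left in the file. The snippet below is a small sketch of such a check; it assumes the same NER-style layout (a "spans" list with "label" keys), and count_span_labels is a name made up for this example.

```python
import json
from collections import Counter


def count_span_labels(path):
    """Count how often each span label occurs in a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                for span in json.loads(line).get("spans") or []:
                    counts[span.get("label")] += 1
    return counts
```

Printing count_span_labels("new_data.jsonl").most_common() should list only the labels you intended to keep; an old label still showing up usually means the search string did not match, for instance because of a difference in case or spacing.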
You can load the new JSONL file into a dataset in your Prodigy database with the db-in command:
prodigy db-in new_dataset ./new_data.jsonl --rehash