Is there a way to reduce the number of labels in the annotated dataset of prodi.gy?

Hi,

My annotated data contains 24 labels, which I think is a lot. I would like to reduce the number of labels, for example by changing the labels 'grant' and 'prize' to the label 'award'. Another example would be to merge 'event', 'activity', 'output' and 'product' into 'product'.

I used the db-out command to export a JSON file of the annotated data. I have searched this forum for the next step, reducing the number of labels, but could not find it in recent posts.

regards
Rahul

Deciding on the labels of interest tends to be a manual process. That said, when you export your data nothing is stopping you from using a Python script to replace some of the labels.

The easiest way might be to use the Python API. Have you seen this method? You could also write a script that takes the examples from the "many labels" dataset, changes their labels, and then tells Prodigy to create a new "few labels" dataset so that you can continue annotating from there.
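A minimal sketch of such a script is shown here. It assumes NER-style examples where each entry in "spans" carries a "label" key, and it uses the label names from the question. The dataset names passed to `copy_with_new_labels` are hypothetical, and the database calls (`connect`, `get_dataset_examples`, `add_dataset`, `add_examples`, `set_hashes`) should be checked against the Prodigy version you have installed, since older versions name the loading method differently.

```python
# Mapping from old labels to merged labels (taken from the question above).
LABEL_MAP = {
    "grant": "award",
    "prize": "award",
    "event": "product",
    "activity": "product",
    "output": "product",
}


def remap_example(eg, label_map):
    """Return a copy of one annotated example with its span labels remapped."""
    new_eg = dict(eg)
    if "spans" in eg:
        new_spans = []
        for span in eg["spans"]:
            span = dict(span)  # copy so the original example is left untouched
            if span.get("label") in label_map:
                span["label"] = label_map[span["label"]]
            new_spans.append(span)
        new_eg["spans"] = new_spans
    return new_eg


def copy_with_new_labels(source, target, label_map=LABEL_MAP):
    """Copy a dataset into a new one with remapped labels (needs Prodigy)."""
    # Imported here so remap_example can be used and tested without Prodigy.
    from prodigy import set_hashes
    from prodigy.components.db import connect

    db = connect()  # uses the database settings from your prodigy.json
    # Assumption: Prodigy v1.12+; older versions use db.get_dataset(source).
    examples = db.get_dataset_examples(source)
    remapped = [
        # Recompute hashes because the spans (part of the task) have changed.
        set_hashes(remap_example(eg, label_map), overwrite=True)
        for eg in examples
    ]
    db.add_dataset(target)
    db.add_examples(remapped, datasets=[target])
```

Calling something like `copy_with_new_labels("many_labels", "few_labels")` would leave the original dataset unchanged, so you can compare the two before continuing to annotate.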

Advice

Before writing the script, I would advise having a meeting with team members, annotators and other stakeholders to double-check that there is a clear consensus on which labels are of interest. One reason is that it would be a shame to have to rewrite the script later. But this might also be a great opportunity to understand the problem your team is trying to solve a bit better. It might also lead to improved annotation instructions.

Renaming the labels in NER

If you are not satisfied with the labels you assigned during annotation, you can change them with the following process.

Step: export the annotated dataset as a JSONL file by running the following command in the terminal

prodigy db-out dataset > ./data.jsonl

Step (optional): install jq via the terminal if you want to inspect the exported file. The sed command in the next step does not need it

sudo apt-get install jq

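Step: use the following command to change a label, and repeat it for each label you want to rename. The pattern only matches when there is no space between "label": and the value in the exported file

sed 's,"label":"OLD_LABEL","label":"NEW_LABEL",g' data.jsonl > new_data.jsonl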
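As an alternative to running sed once per label, the same replacement can be sketched in Python with a single mapping. This is only an illustration: the function name `relabel_jsonl` is hypothetical, the mapping uses the labels from the question, and it assumes each line of the export is one JSON object whose "spans" entries carry a "label" key. Because it parses the JSON, it does not depend on the exact whitespace in the file.

```python
import json
from collections import Counter

# Mapping from old labels to merged labels (taken from the question above).
LABEL_MAP = {
    "grant": "award",
    "prize": "award",
    "event": "product",
    "activity": "product",
    "output": "product",
}


def relabel_jsonl(in_path, out_path, label_map):
    """Rewrite a JSONL export with remapped span labels.

    Returns a Counter of the span labels found in the output file.
    """
    counts = Counter()
    with open(in_path, encoding="utf8") as src, \
            open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            eg = json.loads(line)
            for span in eg.get("spans") or []:
                if "label" in span:
                    # Labels missing from the mapping are kept as they are.
                    span["label"] = label_map.get(span["label"], span["label"])
                    counts[span["label"]] += 1
            dst.write(json.dumps(eg, ensure_ascii=False) + "\n")
    return counts
```

Calling `relabel_jsonl("data.jsonl", "new_data.jsonl", LABEL_MAP)` and printing the returned counts gives a quick check that the old labels are gone before you import the file again.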

You can then import the new JSONL file into a dataset in your Prodigy database with the db-in command:

prodigy db-in new_dataset ./new_data.jsonl --rehash
