Adding a new label

Hi!

I am fairly new to Prodigy, so I am trying some things out. I know that you suggest (and for good reason) having a proper labelling scheme before starting to annotate. I started doing some annotations using Prodigy, but due to inexperience and not fully understanding the problem, I discovered that I need to add an extra label (and it is likely that this situation will occur again in the future). The new label is distinct from the existing ones, so it does not overlap with them and no corrections to the existing annotations are required.

I wanted to ask what the best way to move forward is. Is there a way to use ner.manual or ner.correct on the saved labelled dataset and add the new label? Ideally, I would like to have the saved named entities displayed, so I can focus on adding only the new label. So, using your example in the image below, if I originally had only the label "PERSON" and tagged "Zuckerberg", I would like to use ner.manual/ner.correct on the already annotated dataset and have the following displayed, so that I only have to add the "ORG". Just to be clear, the "PERSON" tags would not be model predictions or patterns, but the previously saved annotations.

Similarly, I would like to ask what you would advise in the case of wanting to remove a label from the existing scheme.

Thanks a lot for your continuous support!

Hi @pkras!

Is there a way to use ner.manual or ner.correct on the saved labelled dataset and add the new label?

It's possible to load an existing dataset as the input source. Prodigy will load the annotations from the dataset and stream them back in. You could try that and include your new label:

prodigy ner.manual <new_dataset> <model> dataset:<old_dataset> --label PERSON,ORG

Similarly, I would like to ask what you would advise in the case of wanting to remove a label from the existing scheme.

Do you mean removing an annotation or one of the label "choices"? If it's the former, you can remove the previous annotation by clicking on the highlighted span. If it's the latter, you can just whitelist the labels you need in --label, and those will be the ones that show up later on.

Hope it helps!

Hey @ljvmiranda921,

Thanks, that was what I needed!

For removing annotations, I meant removing a label from the scheme as a whole, e.g. removing the "ORG" label entirely from all data, so you can retrain on the dataset from scratch without that label, but without doing more annotations.

Thank you for your help!

Hey @pkras

What might help is to connect to the database programmatically using Python and keep only the label(s) you're interested in. You can save the result as a separate dataset and run your experiments on it. For example:

from prodigy.components.db import connect
db = connect()
examples = db.get_dataset("your_dataset_with_all_labels")

new_examples = []
labels_to_keep = ["A", "B", "C"]  # whitelist; to blacklist instead, use `not in` below

for eg in examples:
    # keep only the spans whose label is in the whitelist
    spans = [span for span in eg.get("spans", []) if span["label"] in labels_to_keep]
    eg["spans"] = spans
    new_examples.append(eg)

db.add_dataset("your_new_dataset")
db.add_examples(new_examples, ["your_new_dataset"])
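
If you want to try the filtering logic without touching the database first, here is a minimal sketch in plain Python. It assumes your examples are dicts with an optional "spans" list whose entries carry a "label" key; the helper names filter_spans and count_labels and the sample data are made up for illustration and are not part of Prodigy.

```python
from collections import Counter
from copy import deepcopy


def filter_spans(examples, labels, keep=True):
    """Return copies of the examples with their spans filtered by label.

    keep=True treats `labels` as a whitelist, keep=False as a blacklist.
    (Hypothetical helper, not a Prodigy function.)
    """
    labels = set(labels)
    filtered = []
    for eg in examples:
        eg = deepcopy(eg)  # leave the original example untouched
        eg["spans"] = [
            span for span in eg.get("spans", [])
            if (span["label"] in labels) == keep
        ]
        filtered.append(eg)
    return filtered


def count_labels(examples):
    """Count how often each span label occurs across all examples."""
    return Counter(
        span["label"] for eg in examples for span in eg.get("spans", [])
    )


# Made-up sample data in the assumed span format
examples = [
    {
        "text": "Zuckerberg runs Facebook",
        "spans": [
            {"start": 0, "end": 10, "label": "PERSON"},
            {"start": 16, "end": 24, "label": "ORG"},
        ],
    },
    {"text": "No entities here"},
]

# Drop "ORG" everywhere (blacklist mode) and compare the label counts
without_org = filter_spans(examples, ["ORG"], keep=False)
print(count_labels(examples))
print(count_labels(without_org))
```

Comparing the label counts before and after is a quick sanity check before you write anything to a new dataset. The copies also mean the examples you loaded stay unchanged in memory, so you can derive several filtered variants from the same list.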

You can check this related thread for more info.

I just tested it and it works as intended! It is a simple and elegant solution.
I would like to suggest, as a possible future addition to Prodigy, an option that lets you choose which labels to train on. For example, something like:
prodigy train myModel --ner myDataset --config myConfig.cfg --base-model baseModel --labels A,B,C --label-stats
I believe this could help in quickly finding out the effects of removing or adding various labels when training on our data.

Thank you as always for your help and support!

I just tested it and it works as intended! It is a simple and elegant solution.

Glad it helped!

I would like to suggest, as a possible future addition to Prodigy, an option that lets you choose which labels to train on. For example, something like:
prodigy train myModel --ner myDataset --config myConfig.cfg --base-model baseModel --labels A,B,C --label-stats

We can consider this! One difficulty we found with a general-purpose solution is that "label" can mean different things for each task, and that annotators may want different combinations. It's an interesting use case for sure. You can check this thread for a similar discussion: "Training on part of the custom annotations".
