Sub-label the existing labels

hassnabou · October 20, 2020, 9:50am

Hi, I used ner.manual to annotate a dataset from scratch with multiple labels. My question: is it possible to add new sub-labels to each label that has already been annotated, using the existing annotated dataset? For example, I have Label1, Label2 and Label3, and I want to add more sub-labels for each label.
For Label1, that would be Label1_1, Label1_2 and Label1_3, and the same for the other existing labels.

Thank you.

ines · October 21, 2020, 9:36am

Hi! By sub-label, do you mean hierarchical categories? For example, if you have the label LOCATION, you'd annotate whether the entity is LOCATION_CITY or LOCATION_COUNTRY, etc.? If so, one workflow could be to stream in your examples again with one entity at a time, and add multiple-choice options for the sub-labels. Then, all the annotator has to focus on is a single mention and a subset of sub-labels, so it should be really quick to annotate (and easy to evaluate, in case there are conflicts and disagreements).

To implement this, you could use a custom interface with two blocks: ner (to render the entity) and choice (for the options). The stream could look something like this:

options = [{"id": "LOCATION_CITY", "text": "LOCATION > CITY"}]  # etc.

def get_stream(stream):
    for eg in stream:
        for span in eg.get("spans", []):  # one example per annotated span
            yield {"text": eg["text"], "spans": [span], "options": options}

And then your blocks could look like this:

blocks = [
    {"view_id": "ner_manual"},
    {"view_id": "choice", "text": None}  # prevent text from being shown in both UIs
]
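
As a rough sketch of the "subset of sub-labels" idea, the stream could pick its options based on the label of each span, so the annotator only sees the sub-labels that belong to that entity type. The SUB_LABELS mapping, the example label names and the get_sub_label_stream helper are made up for illustration and are not part of Prodigy; this is plain Python and runs without Prodigy installed.

```python
# Hypothetical mapping from each existing label to its sub-labels.
SUB_LABELS = {
    "LOCATION": ["CITY", "COUNTRY"],
    "PERSON": ["POLITICIAN", "ATHLETE"],
}

def make_options(label):
    """Build the multiple-choice options for one parent label."""
    return [
        {"id": f"{label}_{sub}", "text": f"{label} > {sub}"}
        for sub in SUB_LABELS.get(label, [])
    ]

def get_sub_label_stream(stream):
    """Yield one task per annotated span, with options for that span's label."""
    for eg in stream:
        for span in eg.get("spans", []):
            options = make_options(span["label"])
            if not options:  # no sub-labels defined for this label, skip it
                continue
            yield {"text": eg["text"], "spans": [span], "options": options}

# Made-up example in the same shape as the annotated ner.manual data.
examples = [
    {
        "text": "Angela Merkel visited Paris.",
        "spans": [
            {"start": 0, "end": 13, "label": "PERSON"},
            {"start": 22, "end": 27, "label": "LOCATION"},
        ],
    }
]
tasks = list(get_sub_label_stream(examples))
```

Each task in tasks then contains the full text, exactly one span and only the options for that span's parent label.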

hassnabou · October 22, 2020, 2:40pm

Thank you for the reply. Is it possible to improve and train a model on the new dataset after this step? And how can I merge the datasets for each label into one?

ines · October 22, 2020, 6:11pm

Prodigy should be able to do this automatically when you train or run data-to-spacy, since all your examples have the same text, but different spans. When the data is merged, all annotations on the same text are merged into a single example.

Just make sure you use a new dataset for the sub-labels so there's no conflict (like, a span annotated with both LOCATION and CITY). Each token can only be part of one span.
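
To illustrate how the sub-label answers could end up as spans in that new dataset, here is a minimal sketch that rewrites each span's label with the selected sub-label. It assumes the accepted tasks carry the selected option ID in their "accept" list, which is how the choice interface stores answers, and that exactly one option was picked per task. The apply_sub_labels helper and the sample data are hypothetical, and the grouping by text only illustrates what the merge amounts to; it is not how Prodigy implements it.

```python
from collections import defaultdict

def apply_sub_labels(annotated):
    """Relabel each span with the selected sub-label and group spans by text."""
    merged = defaultdict(list)
    for eg in annotated:
        selected = eg.get("accept", [])
        # Skip rejected or ignored tasks and tasks without exactly one choice.
        if eg.get("answer") != "accept" or len(selected) != 1:
            continue
        for span in eg["spans"]:
            # Copy the span so the original annotation is left untouched.
            merged[eg["text"]].append({**span, "label": selected[0]})
    return [
        {"text": text, "spans": sorted(spans, key=lambda s: s["start"])}
        for text, spans in merged.items()
    ]

# Made-up answers: one task per span, as produced by the one-span-per-task stream.
text = "Angela Merkel visited Paris."
annotated = [
    {"text": text, "spans": [{"start": 22, "end": 27, "label": "LOCATION"}],
     "accept": ["LOCATION_CITY"], "answer": "accept"},
    {"text": text, "spans": [{"start": 0, "end": 13, "label": "PERSON"}],
     "accept": ["PERSON_POLITICIAN"], "answer": "accept"},
    {"text": text, "spans": [{"start": 14, "end": 21, "label": "LOCATION"}],
     "accept": [], "answer": "reject"},
]
result = apply_sub_labels(annotated)
```

The result holds one example per text, with non-overlapping spans that carry only the sub-labels, so no span ends up annotated with both the parent label and the sub-label.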

In this case, you probably want to re-train from scratch. Otherwise, you're trying to teach your model a completely new definition of what it previously predicted, which likely won't be very effective.
