Hi, I used ner.manual to annotate a dataset from scratch with multiple labels. My question: is it possible to add new sub-labels for each label already annotated, using the existing pre-annotated dataset? For example, I have Label1, Label2 and Label3, and I want to add more sub-labels for each of them.
For Label1: Label1_1, Label1_2 and Label1_3, and the same for the other existing labels.
Hi! By sub-label, do you mean hierarchical categories? For example, if you have the label LOCATION, annotating whether the entity is LOCATION_CITY or LOCATION_COUNTRY, etc.? If so, one workflow could be to stream in your examples again with one entity at a time, and add multiple-choice options for the sub-labels. Then all the annotator has to focus on is a single mention and a subset of sub-labels, so it should be really quick to annotate (and easy to evaluate, in case there are conflicts and disagreements).
To implement this, you could use a custom interface with two blocks: ner_manual (to render the entity) and choice (for the options). The stream could look something like this:
options = [{"id": "LOCATION_CITY", "text": "LOCATION > CITY"}] # etc.
def get_stream(stream):
for eg in stream:
for span in eg.get("spans", []): # one example by annotated span
yield {"text": eg["text"], "spans": [span], "options": options}
And then your blocks could look like this:
```python
blocks = [
    {"view_id": "ner_manual"},
    {"view_id": "choice", "text": None},  # prevent text from being shown in both UIs
]
```
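To tie it together, a minimal custom recipe could look something like the sketch below. It's just a rough outline, not a definitive implementation: the recipe name ner.sub-labels and the dataset arguments are placeholders, it assumes get_stream and blocks from above are defined in the same file, and it loads the previously annotated examples straight from your existing dataset via Prodigy's database API.

```python
import prodigy
import spacy
from prodigy.components.db import connect
from prodigy.components.preprocess import add_tokens

@prodigy.recipe("ner.sub-labels")
def ner_sub_labels(dataset: str, source_dataset: str):
    db = connect()                             # connect to the Prodigy database
    examples = db.get_dataset(source_dataset)  # previously annotated examples
    nlp = spacy.blank("en")                    # only used for tokenization here
    # ner_manual needs a "tokens" property; add_tokens aligns the existing
    # spans to the tokens via their character offsets
    stream = add_tokens(nlp, get_stream(examples))
    return {
        "dataset": dataset,  # new dataset to save the sub-label annotations to
        "stream": stream,
        "view_id": "blocks",
        "config": {"blocks": blocks},
    }
```

You could then start it with something like prodigy ner.sub-labels sub_labels_dataset original_dataset -F recipe.py (again, the dataset names here are just placeholders).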
Prodigy should be able to merge these annotations automatically when you train or run data-to-spacy: since all your examples have the same text but different spans, all annotations on the same text are combined into a single example when the data is merged.
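For example, with recent Prodigy versions (v1.11+), exporting the merged corpus could look like this (the output path and dataset name are placeholders):

```
prodigy data-to-spacy ./corpus --ner sub_labels_dataset
```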
Just make sure you use a new dataset for the sub-labels so there's no conflict (like a span annotated with both LOCATION and LOCATION_CITY). Each token can only be part of one span.
In this case, you probably want to re-train from scratch – otherwise, you'd be trying to teach your model a completely new definition of what it previously predicted, which likely won't be very effective.