Export Annotated Data from ner.manual to get list of words per label

ryanwesslen · September 7, 2022, 3:48pm

No problem at all! We're all learning and you're doing great.

Oh - so your label is named REGEX?

If so, you can then use this code:

from prodigy.components.db import connect
from collections import defaultdict

db = connect()
examples = db.get_dataset("koText")

label_stats = defaultdict(list)
for eg in examples:
    if eg["answer"] == "accept":  # you probably want to exclude ignored/rejected answers?
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]  # slice of the text
            label = span["label"]
            if label == "REGEX": # only keep those spans with "REGEX" labels
                label_stats[label].append(word)

print(label_stats["REGEX"])

This collects the text of every annotated span labeled "REGEX" from your accepted examples into a list. Does this solve your problem?
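
If you later want the words for every label rather than only "REGEX", the same loop generalizes by dropping the label check. Here is a minimal sketch, assuming your examples have the same shape as above (a "text", an "answer", and a list of "spans" with "start", "end" and "label"). The helper name `words_per_label` and the sample data are made up for illustration and are not part of Prodigy:

```python
from collections import defaultdict


def words_per_label(examples):
    """Group annotated span texts by label, keeping only accepted examples."""
    label_stats = defaultdict(list)
    for eg in examples:
        # Skip ignored/rejected examples (and any without an answer)
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]  # slice of the text
            label_stats[span["label"]].append(word)
    return dict(label_stats)


# Made-up examples in the same shape as the annotations above
sample = [
    {"text": "call 555-1234 now", "answer": "accept",
     "spans": [{"start": 5, "end": 13, "label": "REGEX"}]},
    {"text": "ignored text", "answer": "ignore",
     "spans": [{"start": 0, "end": 7, "label": "REGEX"}]},
]

result = words_per_label(sample)
print(result)
```

With your real data you would pass `db.get_dataset("koText")` instead of `sample`, then look up any label you need, e.g. `result["REGEX"]`.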
