Export Annotated Data from ner.manual to get list of words per label

Hi @do12siwu!

No problem at all! We're all learning and you're doing great :slight_smile:

Oh, so is your label named REGEX?

If so, you can then use this code:

from prodigy.components.db import connect
from collections import defaultdict

db = connect()
examples = db.get_dataset("koText")

label_stats = defaultdict(list)
for eg in examples:
    if eg["answer"] == "accept":  # you probably want to exclude ignored/rejected answers?
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]  # slice of the text
            label = span["label"]
            if label == "REGEX": # only keep those spans with "REGEX" labels
                label_stats[label].append(word)

print(label_stats["REGEX"])

This collects the text of every accepted span labeled "REGEX" into a single list. Does this solve your problem?
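If you later want the words for every label rather than only REGEX, here is a minimal sketch of the same loop without the label filter. The helper name `words_per_label` and the sample records are made up for illustration; the records only assume the usual `text`, `answer` and `spans` keys that the snippet above relies on.

```python
from collections import defaultdict


def words_per_label(examples):
    """Group the text of annotated spans by label, for accepted examples only."""
    grouped = defaultdict(list)
    for eg in examples:
        # Skip rejected/ignored answers (and records with no answer at all).
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            # Slice the annotated substring out of the example text.
            word = eg["text"][span["start"]:span["end"]]
            grouped[span["label"]].append(word)
    return dict(grouped)


# Hypothetical records, shaped like the examples used above.
sample = [
    {"text": "Call 555-1234 now", "answer": "accept",
     "spans": [{"start": 5, "end": 13, "label": "REGEX"}]},
    {"text": "Email bob@example.com", "answer": "accept",
     "spans": [{"start": 6, "end": 21, "label": "EMAIL"}]},
    {"text": "Ignore 999", "answer": "reject",
     "spans": [{"start": 7, "end": 10, "label": "REGEX"}]},
]

for label, words in words_per_label(sample).items():
    print(label, words)
```

With your own data you would pass in the `examples` list loaded from the database in the snippet above instead of `sample`.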