hi @do12siwu!
No problem at all! We're all learning, and you're doing great.
Oh, so your label is named REGEX?
If so, you can then use this code:
from prodigy.components.db import connect
from collections import defaultdict

db = connect()
examples = db.get_dataset("koText")
label_stats = defaultdict(list)

for eg in examples:
    if eg["answer"] == "accept":  # you probably want to exclude ignored/rejected answers?
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]  # slice of the text
            label = span["label"]
            if label == "REGEX":  # only keep those spans with "REGEX" labels
                label_stats[label].append(word)

print(label_stats["REGEX"])
This will put all of your annotated spans with the label "REGEX" into a list. Does this solve your problem?
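If you'd also like to see how often each matched string was annotated, here is a minimal sketch of the same loop wrapped in a function and fed into a Counter. The helper name collect_spans and the sample examples are made up for illustration; they only mimic the shape of Prodigy's span annotations, so you would pass in your own examples from the database instead.

```python
from collections import Counter


def collect_spans(examples, label="REGEX"):
    """Return the text of every accepted span that carries the given label.

    collect_spans is a hypothetical helper, not part of Prodigy.
    """
    words = []
    for eg in examples:
        if eg.get("answer") != "accept":  # skip rejected/ignored answers
            continue
        for span in eg.get("spans", []):
            if span["label"] == label:
                words.append(eg["text"][span["start"]:span["end"]])
    return words


# Made-up examples in the same shape as Prodigy's span annotations
sample = [
    {"text": "call 555-1234 now", "answer": "accept",
     "spans": [{"start": 5, "end": 13, "label": "REGEX"}]},
    {"text": "dial 555-1234", "answer": "accept",
     "spans": [{"start": 5, "end": 13, "label": "REGEX"}]},
    {"text": "no match here", "answer": "reject", "spans": []},
]

# Count how many times each matched string was annotated
counts = Counter(collect_spans(sample))
print(counts.most_common())
```

Using eg.get("answer") rather than eg["answer"] means an example without an answer key is skipped instead of raising a KeyError.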