Is it possible to export the data from an ner.manual annotation project so that I get every word annotated with a certain label as a dictionary (key: label, value: list of words annotated with that label)?
I would like to check whether the words annotated in Prodigy follow our annotation guideline for each label, to ensure high quality.
Prodigy lets you interact with the database and annotations from Python, so you can write any custom logic that goes over your annotations and compiles stats about them. You can find an example of the JSON format for named entities here: https://prodi.gy/docs/api-interfaces#ner_manual. As you can see, it has all the info you need: a list of annotated "spans", each containing the start and end character offsets of the annotated word and the associated label. So you could do something like this:
from prodigy.components.db import connect
from collections import defaultdict

db = connect()
examples = db.get_dataset("name_of_your_dataset")
label_stats = defaultdict(list)
for eg in examples:
    if eg["answer"] == "accept":  # you probably want to exclude ignored/rejected answers?
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]  # slice of the text
            label = span["label"]
            label_stats[label].append(word)
print(label_stats)
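For the guideline check itself, a plain list per label can get long. A minimal sketch, assuming tasks in the ner_manual format described above (a "text", an "answer", and "spans" with "start", "end" and "label"), is to count how often each word appears under each label and to flag words that were annotated with more than one label, since those are often where annotators diverge from the guideline. The function names `label_word_counts` and `words_with_multiple_labels` and the sample tasks are hypothetical and only for illustration; with Prodigy you would pass in the `examples` list returned by `db.get_dataset(...)` as in the snippet above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def label_word_counts(examples: Iterable[dict]) -> Dict[str, Counter]:
    """Count how often each annotated word appears under each label.

    Assumes ner_manual-style task dicts with "text", "answer" and
    "spans" (each span having "start", "end" and "label").
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected and ignored tasks
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]  # slice of the text
            counts[span["label"]][word] += 1
    return dict(counts)


def words_with_multiple_labels(counts: Dict[str, Counter]) -> Dict[str, List[str]]:
    """Return each word that was annotated with more than one label."""
    labels_per_word: Dict[str, Set[str]] = defaultdict(set)
    for label, words in counts.items():
        for word in words:
            labels_per_word[word].add(label)
    return {w: sorted(ls) for w, ls in labels_per_word.items() if len(ls) > 1}


# Hypothetical example tasks (illustration only, not real annotations)
examples = [
    {"text": "Apple hired Tim Cook.", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "ORG"},
               {"start": 12, "end": 20, "label": "PERSON"}]},
    {"text": "Apple is in Cupertino.", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "PRODUCT"},
               {"start": 12, "end": 21, "label": "GPE"}]},
    {"text": "Apple again.", "answer": "reject",
     "spans": [{"start": 0, "end": 5, "label": "PERSON"}]},
]

counts = label_word_counts(examples)
for label, words in counts.items():
    print(label, words.most_common(10))  # most frequent words per label
print(words_with_multiple_labels(counts))  # → {'Apple': ['ORG', 'PRODUCT']}
```

Sorting each label's words by frequency puts the bulk of the annotations first, so a reviewer can skim the top entries against the guideline and then look more closely at rare words and at any word that shows up under several labels.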