Hi,
Currently the UI shows the annotation progress quite generally - in terms of "TOTAL" examples annotated so far. Is it possible to somehow customize the UI so that the progress status will be more elaborate? For example, I find it useful to know how many examples were accepted, how many were rejected, and how many were ignored.
Oren
Yes, for that you can set "show_stats": true in your prodigy.json. This will show you a visual and list breakdown of the accepted, rejected and ignored answers. See here for an example: https://prodi.gy/docs/api-web-app#statistics
Hi.
Is there a way where we can look at the count of the manual labels that have been assigned to each task? That makes it easier for us to look at how balanced the dataset is. For instance if we have Label_1, Label_2 and Label_3. Can we look at how many records have been labelled under each label?
Label_1 - 23
Label_2 - 15
Label_3 - 33
Rejected - 50
This will make it easier for us to decide when to stop labelling, or whether we need to do something to boost the sample for a particular label.
Thanks for your support.
In that case, it's probably easiest to do this as a separate script that connects to the database and outputs the counts. Where the label is and what it "means" for the given annotation task depends on the type of task (NER, text classification) and the interface you use (e.g. classification vs. choice).
For example, if you're annotating named entities, you could do the following:
    from collections import Counter
    from prodigy.components.db import connect

    db = connect()  # uses the database settings from your prodigy.json
    examples = db.get_dataset("your_dataset")

    counts = Counter()
    for eg in examples:
        if eg["answer"] == "reject":
            counts["rejected"] += 1
        elif eg["answer"] == "accept":
            # count every entity span label in the accepted example
            for span in eg.get("spans", []):
                counts[span["label"]] += 1
    print(counts)
If you're annotating data for text classification, you could look at either eg["label"] (for binary annotations) or eg["accept"] (for multiple-choice annotations) for the labels.
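As a rough sketch of the text classification case, the counting could look something like the following. The helper name count_textcat_labels and the sample records are made up for illustration, and the sketch assumes that binary tasks store the label in eg["label"] and that choice tasks store a list of selected option IDs in eg["accept"], as described above. The sample list stands in for what db.get_dataset("your_dataset") would return, so the snippet runs without a Prodigy installation.

```python
from collections import Counter


def count_textcat_labels(examples):
    """Count labels per answer type (hypothetical helper, not part of Prodigy)."""
    counts = Counter()
    for eg in examples:
        answer = eg.get("answer")
        if answer == "reject":
            counts["rejected"] += 1
        elif answer == "ignore":
            counts["ignored"] += 1
        elif answer == "accept":
            if "accept" in eg:
                # assumed choice interface: list of selected option IDs
                for label in eg["accept"]:
                    counts[label] += 1
            elif "label" in eg:
                # assumed binary interface: one label per example
                counts[eg["label"]] += 1
    return counts


# Made-up records standing in for db.get_dataset("your_dataset")
sample = [
    {"text": "a", "label": "Label_1", "answer": "accept"},
    {"text": "b", "label": "Label_1", "answer": "reject"},
    {"text": "c", "accept": ["Label_2", "Label_3"], "answer": "accept"},
    {"text": "d", "accept": ["Label_2"], "answer": "accept"},
    {"text": "e", "answer": "ignore"},
]

counts = count_textcat_labels(sample)
for label, n in sorted(counts.items()):
    print(f"{label} - {n}")
```

With real data, you would pass the examples loaded from the database to the helper instead of the sample list.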
You can also run the script while the Prodigy server is still running and it'll always pull the latest state from the database.