Currently the UI shows the annotation progress quite generally - in terms of "TOTAL" examples annotated so far. Is it possible to somehow customize the UI so that the progress status will be more elaborate? For example, I find it useful to know how many examples were accepted, how many were rejected, and how many were ignored.
Yes, for that you can set "show_stats": true in your prodigy.json – this will show you a visual and list breakdown of the accepted, rejected and ignored answers. See here for an example: https://prodi.gy/docs/api-web-app#statistics
Is there a way to look at the count of the manual labels that have been assigned for each task? That would make it easier for us to see how balanced the dataset is. For instance, if we have Label_1, Label_2 and Label_3, can we see how many records have been labelled under each label?
Label_1 - 23
Label_2 - 15
Label_3 - 33
Rejected - 50
This will make it easier for us to decide when to stop labelling, or whether we need to do something to boost a particular label's sample.
Thanks for your support.
In that case, it's probably easiest to do this as a separate script that connects to the database and outputs the counts. Where the label is and what it "means" for the given annotation task depends on the type of task (NER, text classification) and the interface you use (e.g.
For example, if you're annotating named entities, you could do the following:
    from collections import Counter
    from prodigy.components.db import connect

    db = connect()  # uses the database settings from your prodigy.json
    examples = db.get_dataset("your_dataset")

    counts = Counter()
    for eg in examples:
        if eg["answer"] == "reject":
            counts["rejected"] += 1
        elif eg["answer"] == "accept":
            # count every annotated span label in the accepted example
            for span in eg.get("spans", []):
                counts[span["label"]] += 1
    print(counts)
If you're annotating data for text classification, you could either look at eg["label"] (for binary annotations) or at eg["accept"] (multiple-choice annotations) for the labels.
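For text classification, the counting could look roughly like the sketch below. It assumes each example is a dict with an "answer" key plus either a top-level "label" (binary) or an "accept" list of selected option IDs (multiple choice), as described above. The function names count_textcat_labels and load_examples and the sample data are made up for illustration, so adjust them to whatever your tasks actually contain.

```python
from collections import Counter


def count_textcat_labels(examples):
    """Count labels across text classification annotations.

    Assumes binary tasks store one label under "label" and
    multiple-choice tasks store the selected options under "accept".
    """
    counts = Counter()
    for eg in examples:
        answer = eg.get("answer")
        if answer == "reject":
            counts["rejected"] += 1
        elif answer == "ignore":
            counts["ignored"] += 1
        elif answer == "accept":
            if "accept" in eg:
                # Multiple choice: one count per selected option
                for label in eg["accept"]:
                    counts[label] += 1
            elif "label" in eg:
                # Binary: the task carries a single label
                counts[eg["label"]] += 1
    return counts


def load_examples(dataset_name):
    """Fetch a dataset the same way as the NER snippet above."""
    # Imported inside the function so the counting logic can be
    # tried out on plain dicts without a Prodigy installation.
    from prodigy.components.db import connect

    db = connect()
    return db.get_dataset(dataset_name)


# Hypothetical examples standing in for load_examples("your_dataset")
sample = [
    {"text": "a", "label": "Label_1", "answer": "accept"},
    {"text": "b", "label": "Label_1", "answer": "reject"},
    {"text": "c", "accept": ["Label_2", "Label_3"], "answer": "accept"},
    {"text": "d", "accept": [], "answer": "ignore"},
]
sample_counts = count_textcat_labels(sample)
print(sample_counts)
```

To run it on your real data, replace sample with load_examples("your_dataset"). Note that in a binary task a rejected example is counted under "rejected" here rather than under its label, so you may want to track those separately.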
You can also run the script while the Prodigy server is still running and it'll always pull the latest state from the database.