Annotator Performance Tracker

In your case, wouldn't it be easier to run such a script on the annotated dataset instead?

prodigy db-out dataset > out.jsonl

Here's an example file I have locally with annotators.

{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777124,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent"}
{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777125,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent"}
{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777134,"_annotator_id":"issue-6044-jimmy","_session_id":"issue-6044-jimmy"}
{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777134,"_annotator_id":"issue-6044-jimmy","_session_id":"issue-6044-jimmy"}
{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777142,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck"}
{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777143,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck"}
{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777527,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent"}
{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777537,"_annotator_id":"issue-6044-jimmy","_session_id":"issue-6044-jimmy"}
{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777544,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck"}
{"text":"it is cold today","_input_hash":718077657,"_task_hash":-363462449,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666878566,"_annotator_id":"issue-6044-guybrush","_session_id":"issue-6044-guybrush"}
{"text":"a wood chuck could chuck a lot of wood if a wood chuck could chuck wood","_input_hash":-1690856185,"_task_hash":1885086500,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666878830,"_annotator_id":"issue-6044-guybrush","_session_id":"issue-6044-guybrush"}

Here's a pandas script that takes such a file and aggregates the annotations per hour.

import pandas as pd

# Load the exported annotations, bucket them into hourly windows,
# and count unique texts, unique annotators, and total examples per window.
(pd.read_json("out.jsonl", lines=True)
    .assign(dt=lambda d: pd.to_datetime(d["_timestamp"], unit="s").round("H"))
    .groupby("dt")
    .agg(n_text=("_input_hash", "nunique"),
         n_annot=("_annotator_id", "nunique"),
         n_examples=("_annotator_id", "size")))

Here's the output.

                     n_text  n_annot  n_examples
dt                                              
2022-10-26 10:00:00       3        3           9
2022-10-27 14:00:00       2        1           2

You can customize such a pandas query to your heart's content, but I imagine that running something like this as a script gives you the most flexibility. Maybe even turn it into a Streamlit app or something?
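For example, one customization that fits the "annotator performance" angle is grouping by `_annotator_id` instead of by hour. Here's a sketch (using a tiny inline sample in the same shape as the export above, so it runs standalone; with a real export you'd swap the `pd.DataFrame(rows)` for `pd.read_json("out.jsonl", lines=True)`):

```python
import pandas as pd

# Tiny inline sample mirroring the columns of the Prodigy export above.
rows = [
    {"_input_hash": 1, "answer": "accept", "_timestamp": 1666777124, "_annotator_id": "vincent"},
    {"_input_hash": 2, "answer": "accept", "_timestamp": 1666777125, "_annotator_id": "vincent"},
    {"_input_hash": 1, "answer": "reject", "_timestamp": 1666777142, "_annotator_id": "lechuck"},
]

# Per-annotator summary: total examples, how many were accepted,
# and the first/last timestamps seen for that annotator.
stats = (pd.DataFrame(rows)
    .assign(accepted=lambda d: d["answer"] == "accept")
    .groupby("_annotator_id")
    .agg(n_examples=("_input_hash", "size"),
         n_accepted=("accepted", "sum"),
         first_seen=("_timestamp", "min"),
         last_seen=("_timestamp", "max")))
print(stats)
```

From there you could derive an accept rate per annotator or compare annotators on the same `_input_hash` values to spot disagreements.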
