feature request: labelling speed

This is a feature request for a ‘nice-to-have’, as it’s definitely not the most important thing out there.

It might be nice to have an estimate of the number of labels per minute that can be generated. It helps to know how much time might yield how many labels. This is very similar to what the tqdm package does in Python.

This could even be used to generate an ETA, but obviously only on datasets that are finite.
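
As a rough illustration of the arithmetic, here is a minimal sketch in plain Python. The function names `labels_per_minute` and `eta_minutes` are made up for this example and are not part of any existing API, and the ETA only makes sense when the total dataset size is known.

```python
def labels_per_minute(count, elapsed_seconds):
    """Average annotation rate so far; 0.0 if no time has passed."""
    if elapsed_seconds <= 0:
        return 0.0
    return count / (elapsed_seconds / 60)


def eta_minutes(count, total, elapsed_seconds):
    """Estimated minutes remaining, or None if it can't be estimated."""
    rate = labels_per_minute(count, elapsed_seconds)
    if total is None or rate == 0:
        return None  # infinite stream, or nothing annotated yet
    return max(total - count, 0) / rate


# Illustrative numbers only: 30 labels in 2 minutes, 90 examples in total.
rate = labels_per_minute(count=30, elapsed_seconds=120)
remaining = eta_minutes(count=30, total=90, elapsed_seconds=120)
print(f"{rate:.1f} labels/min, about {remaining:.1f} min left")
```

Passing `total=None` stands in for a stream of unknown length, in which case no ETA is reported.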

Yes, I love that idea! We’ve been doing this manually sometimes and then calculating the average seconds per annotation. It’s actually pretty motivating and I’m still surprised how fast some of the annotation can get once you’re in a good flow :smiley:

We could make this an optional feature and maybe even display it in the UI underneath the progress or something. It won’t have to be updated in real time and could just be returned by the REST API periodically, just like the progress.

In the meantime, you could probably implement a simple version of this via the update method in a custom recipe, or just on a per-session basis in the on_load and on_exit callbacks. So you log the time on load, calculate the difference on exit and then use the session_annotated attribute of the controller to get the total number of annotations in that session. Something like this:

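import time

start_time = time.time()  # or set this in the on_load callback

def on_exit(ctrl):
    total_mins = (time.time() - start_time) / 60  # elapsed time in minutes
    count = ctrl.session_annotated
    if count:  # skip the stats if nothing was annotated
        print(total_mins * 60 / count, "seconds per annotation")
        print(count / total_mins, "annotations per minute")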
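
For the update-based variant mentioned above, a slightly fuller sketch could wrap the timing in a small helper. `SessionTimer` is a hypothetical name and not part of Prodigy. The sketch assumes the recipe's update callback is called with each batch of answered examples and that on_load and on_exit receive the controller. The clock is injectable only to make the helper easy to test.

```python
import time


class SessionTimer:
    """Hypothetical helper that tracks annotation speed for one session."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start = None
        self.count = 0

    def on_load(self, ctrl=None):
        # Log the time when the session starts.
        self.start = self.clock()

    def update(self, answers):
        # Assumed to be called with each batch of answered examples.
        self.count += len(answers)

    def stats(self):
        # Returns (seconds per annotation, annotations per minute),
        # or None if nothing has been annotated or no time has passed.
        if self.start is None or self.count == 0:
            return None
        elapsed = self.clock() - self.start
        if elapsed <= 0:
            return None
        return elapsed / self.count, self.count / (elapsed / 60)

    def on_exit(self, ctrl=None):
        result = self.stats()
        if result is not None:
            secs_per, per_min = result
            print(f"{secs_per:.1f} seconds per annotation")
            print(f"{per_min:.1f} annotations per minute")
```

In a custom recipe, you would create one `timer = SessionTimer()` and, assuming the recipe returns its callbacks under the "on_load", "update" and "on_exit" keys, pass in `timer.on_load`, `timer.update` and `timer.on_exit` there.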

The controller also gives you access to the database, so if you want even more detailed stats, you could fetch all annotations in the dataset for the current session_id, look at their answers, the total number of spans etc. (depending on the task) and calculate stats from that.
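
A minimal sketch of what such stats could look like. It assumes each stored annotation is a dict with an "answer" key and, for span-based tasks, a "spans" list. How you fetch the examples from the database depends on your version, so that step is left out here and replaced with made-up data. `summarize` is a hypothetical helper name.

```python
from collections import Counter


def summarize(examples):
    """Count answers and spans in a list of annotated examples."""
    answers = Counter(eg.get("answer", "unknown") for eg in examples)
    total_spans = sum(len(eg.get("spans") or []) for eg in examples)
    return {
        "total": len(examples),
        "answers": dict(answers),
        "spans": total_spans,
    }


# Made-up examples standing in for what the database would return.
examples = [
    {"text": "a", "answer": "accept", "spans": [{"start": 0, "end": 1}]},
    {"text": "b", "answer": "reject", "spans": []},
    {"text": "c", "answer": "accept"},
]
print(summarize(examples))
```

Combined with the elapsed time from the callbacks, the same counts could be turned into per-answer rates, such as accepts per minute.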
