Functionality for an ordinal, regression-style outcome (e.g., a 1-5 rating)

I’m intrigued by this tool. However, the models I typically develop are more fine-grained: I’m looking to predict an outcome on a Likert scale (1-5). Is there functionality for this use case? I didn’t see anything especially relevant in the demo.

I’m also interested in whether there is an example workflow where one person puts together labels and these are compared with another person’s labels, i.e., any way to streamline estimates of inter-rater agreement/reliability (e.g., intraclass correlation coefficients such as ICC(1), or quadratic weighted kappa).
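For reference, both agreement statistics mentioned above can be computed directly from two raters’ labels on the same items. The sketch below is plain Python; `quadratic_weighted_kappa` and `icc1` are hypothetical helper names, it assumes integer ratings on a 1-5 scale, and it assumes “ICC(1)” means the single-rater, one-way random-effects form ICC(1,1) from Shrout & Fleiss. scikit-learn’s `cohen_kappa_score(..., weights="quadratic")` is a good cross-check for the kappa.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=1, max_rating=5):
    """Cohen's kappa with quadratic weights for two raters on an ordinal scale."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same items")
    k = max_rating - min_rating + 1
    n = len(rater_a)
    # Observed confusion matrix: rows = rater A's rating, columns = rater B's
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    # Marginal histograms give the agreement expected by chance
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreements are penalised by squared distance on the scale
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            num += weight * observed[i][j]
            den += weight * expected
    # Note: den is 0 (undefined kappa) if both raters use a single category
    return 1.0 - num / den


def icc1(ratings):
    """ICC(1,1): one-way random effects, single rater (Shrout & Fleiss).

    ratings: one row per item, each row holding one score per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n
    # Between-item and within-item mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

To use `icc1` with two label lists, pair them per item, e.g. `icc1(list(zip(labels_a, labels_b)))`.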

Nice-looking tool.


Yes, you could use a Likert scale. It’s possible to customise the questions quite a lot, and for Likert questions it’s easy to offer the five points as a set of choices.
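A minimal sketch of what such Likert tasks could look like, assuming the input format of Prodigy’s choice interface is a `text` plus a list of `options`, each with an `id` and a `text`; verify this against the current Prodigy docs before relying on it. The helper names and the option wording are hypothetical, and the output is JSON Lines that a choice-based recipe could read.

```python
import json

# Hypothetical wording for a 1-5 scale; replace with your own anchors.
LIKERT_OPTIONS = [
    {"id": 1, "text": "1 - Strongly disagree"},
    {"id": 2, "text": "2 - Disagree"},
    {"id": 3, "text": "3 - Neither agree nor disagree"},
    {"id": 4, "text": "4 - Agree"},
    {"id": 5, "text": "5 - Strongly agree"},
]


def make_likert_tasks(texts, options=LIKERT_OPTIONS):
    """Wrap each input text in a choice task offering the five Likert points."""
    # Copy each option dict so tasks don't share mutable objects
    return [{"text": text, "options": [dict(o) for o in options]} for text in texts]


def to_jsonl(tasks):
    """Serialise tasks as JSON Lines, one task per line."""
    return "\n".join(json.dumps(task) for task in tasks)
```

Using the option `id` as the numeric rating keeps the stored answers directly usable as an ordinal target.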

I don’t know much about your task, so this might not apply, but I often encourage people to try A/B evaluations instead of Likert annotations. For tasks such as similarity judgments, naturalness evaluations and translation quality, A/B evaluations are often much easier to work with and more robust, because you don’t have to worry about the calibration of different annotators (or even the self-calibration of a single annotator across a session).
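The reply doesn’t name a method for turning A/B judgments into per-item scores; one common choice is the Bradley-Terry model, which only uses which option won each comparison, so annotators never need a shared rating scale. A minimal sketch, assuming each judgment is recorded as a `(winner, loser)` pair; `bradley_terry` is a hypothetical helper using the standard minorization-maximization (MM) update.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=1000):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the MM update. A finite estimate needs every item to have
    at least one win and at least one loss.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for pair, count in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += count / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        # Normalise so strengths sum to 1 (only ratios are identifiable)
        strength = {item: value / total for item, value in updated.items()}
    return strength


# Hypothetical judgments: A preferred over B 3 times out of 4, B over C 3 out of 4
judgments = [("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 3 + [("C", "B")]
scores = bradley_terry(judgments)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```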

More generally, the tool’s designed to be easily scriptable, so I think you’ll find it easy to put together the workflow you have in mind. You can find examples of the recipe scripts here:

Thank you, sir, for your response. Would it be possible to use the tool for a 4- or 6-week evaluation period so I can build a business case for it? I’d like to show how many cases need to be labeled with Prodigy compared with the old-fashioned “label 500 and get back to me” approach. If I could compare the accuracy of the predictive model at checkpoints (i.e., in increments of 10 labels) and see where the accuracy curve starts to level off, I think that would be very compelling. Is it 75% less work? Other resources I’ve read suggest an active learning approach can be that efficient, so I’m very interested.
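A minimal sketch of the checkpoint comparison described above. It assumes you have already measured held-out accuracy at each checkpoint (training and evaluation are not shown); `find_plateau`, `work_saved`, and the `min_gain`/`patience` thresholds are hypothetical names and values to tune, not part of Prodigy.

```python
def find_plateau(checkpoints, accuracies, min_gain=0.01, patience=2):
    """Return the checkpoint after which accuracy stops improving meaningfully.

    checkpoints: number of labeled examples at each evaluation (e.g. 10, 20, 30, ...)
    accuracies: held-out accuracy measured at each checkpoint
    min_gain: smallest improvement that still counts as progress (assumed value)
    patience: consecutive small gains required before calling it a plateau
    """
    small_gains = 0
    for i in range(1, len(accuracies)):
        if accuracies[i] - accuracies[i - 1] < min_gain:
            small_gains += 1
            if small_gains >= patience:
                # Plateau starts at the checkpoint just before the run of small gains
                return checkpoints[i - patience]
        else:
            small_gains = 0
    return None  # no plateau reached within the labeled budget


def work_saved(n_needed, baseline=500):
    """Fraction of labeling effort saved relative to a fixed-size baseline."""
    return 1 - n_needed / baseline
```

For example, going from 500 cases to 125 gives `work_saved(125) == 0.75`, the 75% figure above.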

Another, simpler use case might be to have the tool record the optimal order of the cases one individual rated, so that a second individual could follow the same order. Not ideal, but it could still provide a lot of value if we’re going from 500 cases to 125 (approx. 25% of the work).
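A minimal sketch of that replay idea, assuming the first rater’s annotations can be exported as a list of dicts in the order they were annotated (check that your export actually preserves this order) and that each record has a `text` field. `replay_order` and `write_jsonl` are hypothetical helpers; the first rater’s labels are dropped so the second rater isn’t anchored by them.

```python
import json


def replay_order(annotated, fields=("text",)):
    """Build an unlabeled task list for a second rater, in the first rater's order."""
    seen = set()
    tasks = []
    for record in annotated:
        key = tuple(record[f] for f in fields)
        if key in seen:
            continue  # skip repeated inputs, keeping their first position
        seen.add(key)
        # Keep only the input fields; labels and answers are deliberately dropped
        tasks.append({f: record[f] for f in fields})
    return tasks


def write_jsonl(tasks, path):
    """Write tasks as JSON Lines so they can be loaded as a new input stream."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```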

If you’re a university researcher, you could apply for a trial research license here:

For commercial use cases, we usually prefer to run shorter trials by launching a VM for you with everything installed. If that sounds more applicable, I’d be happy to get you set up; just shoot me an email here:

Thank you for the quick response. I’m not an academic, so I’ll need to line things up on my end. Thanks again.

Hello Sir,

I sent you an email and am excited to test it out.