Ranking & Ordering texts with Prodigy

Hi

I'm trying to rank texts, e.g. with 5 texts, can I order them 1,2,3,4,5?

Is it possible to have a multiple-choice component where, instead of multi-select, I could add numbers to the boxes to give them some order or rank?

The two use cases are:

  1. Event ordering: which event happened first
  2. Information retrieval, i.e. which retrieved document title is most relevant to the query.

Thanks and regards,
Jeremy

PS I've seen this thread below, but I don't think the solutions are super elegant for my uses.

Hi and sorry for only getting to this now, I somehow missed the thread!

I can't immediately think of a way to implement the ranking with one of the built-in interfaces, although you could definitely add a feature like this using custom HTML and a bit of JavaScript, as described in the "Custom Interfaces" section of the Prodigy docs.

One option would be to show all your texts with a simple <input type="number"> to specify the rank (with a min/max value based on the number of options). Each input can have an event listener that fires on change, i.e. when the user changes the number, and adds the information to the task JSON with window.prodigy.update. If you wanted to restrict it so that one number can only be present once, this would be a bit more involved to implement on the client – but you could also just use the validate_answer callback for that, check the added ranks in Python and raise an error if a value is present more than once.
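As a rough sketch of the Python side of that check: the function below assumes the custom JavaScript stores the ranks under a hypothetical "ranks" key that maps each option ID (as a string) to a number, and that the texts live under "options". Neither name comes from Prodigy itself for this purpose, so adjust them to whatever your task JSON actually looks like.

```python
from collections import Counter


def validate_answer(eg):
    """Raise an error if the ranks are incomplete, out of range or repeated.

    Assumes a hypothetical task layout:
      eg["options"] -> [{"id": "a", "text": "..."}, ...]
      eg["ranks"]   -> {"a": 1, "b": 2, ...}  (written by the custom JavaScript)
    """
    # Assumption: only accepted answers need a complete ranking.
    if eg.get("answer") != "accept":
        return
    option_ids = [str(opt["id"]) for opt in eg.get("options", [])]
    ranks = eg.get("ranks", {})
    missing = [oid for oid in option_ids if oid not in ranks]
    if missing:
        raise ValueError(f"No rank given for: {', '.join(missing)}")
    values = [int(ranks[oid]) for oid in option_ids]
    out_of_range = [v for v in values if not 1 <= v <= len(option_ids)]
    if out_of_range:
        raise ValueError(f"Ranks must be between 1 and {len(option_ids)}")
    # Any rank that occurs more than once is an error.
    repeated = sorted(v for v, count in Counter(values).items() if count > 1)
    if repeated:
        raise ValueError(f"Ranks used more than once: {repeated}")
```

In a custom recipe, you would then return this function under the "validate_answer" key of the dictionary of components, alongside the dataset, stream and view ID.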

If you want the UI to be more in a drag-and-drop style, you could adapt something like this: https://codepen.io/crouchingtigerhiddenadam/pen/qKXgap. The minimum code needed is actually very straightforward and you could almost copy-paste it directly into your "javascript". You'd just need to add a call to window.prodigy.update after each drag event (dragEnd) that checks the order of elements and updates the JSON accordingly. Each list element could have an id that corresponds to the document title ID in your data, so you could just loop over the list elements in order to extract the rank. So if the first element has document title ID 123, you know that title 123 was ranked 1 (or 0, depending on how you index).
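To illustrate the last step, here is a minimal sketch of turning the saved order into ranks once the annotations are back in Python. It assumes the drag handler wrote the list of element IDs, top to bottom, into the task under some key of your choosing; the function name and the `start` parameter are made up for this example.

```python
def ranks_from_order(ordered_ids, start=1):
    """Map each document title ID to its rank, based on its list position.

    ordered_ids: IDs in the order the annotator arranged them (top first).
    start: 1 for one-based ranks, 0 for zero-based ranks.
    """
    return {doc_id: rank for rank, doc_id in enumerate(ordered_ids, start=start)}


# The first element has ID "123", so title 123 is ranked 1.
print(ranks_from_order(["123", "45", "7"]))
# {'123': 1, '45': 2, '7': 3}
```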

For the information retrieval use case, this is probably something you could do with the choice UI, with the query as the main content and the document titles as the options. You could then select the one (or more) most relevant titles. Framing this as a choice problem might be very helpful here, because in cases like this, you realistically only care about the top 1 or 2 selections. You don't necessarily want annotators to waste time on deciding whether a title should rank 9th or 10th, and you'd likely end up with a lot of disagreements there anyway. If you already have a model, you could also let it pre-select the most relevant option(s), so you can calculate how often the annotator agrees with the model, or how often different annotators agree with each other.
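A minimal sketch of what such a task could look like, assuming the usual choice format with "text", "options" and an "accept" list of pre-selected option IDs. The "model_accept" key and both function names are hypothetical: the copy of the model's picks is kept separately so it can still be compared once the annotator has changed the selection.

```python
def make_choice_task(query, titles, model_top=()):
    """Build one choice task: the query as main text, titles as options.

    titles: dict mapping document title ID -> title text.
    model_top: IDs the model considers most relevant (optional).
    """
    task = {
        "text": query,
        "options": [{"id": doc_id, "text": title} for doc_id, title in titles.items()],
    }
    if model_top:
        task["accept"] = list(model_top)        # pre-selected in the UI
        task["model_accept"] = list(model_top)  # hypothetical key, kept for comparison
    return task


def agreement_rate(annotated):
    """Fraction of tasks where the annotator kept exactly the model's picks."""
    scored = [eg for eg in annotated if "model_accept" in eg]
    if not scored:
        return 0.0
    same = sum(set(eg.get("accept", [])) == set(eg["model_accept"]) for eg in scored)
    return same / len(scored)
```

If you want annotators to be able to pick more than one title, the choice interface would also need to be set to allow multiple selections in the recipe config.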
