order of tasks dependent on annotations

phtephanx · June 23, 2021, 11:10am

Hi!

I'd like to change the order of tasks dynamically, depending on the received annotations.
For instance, if the annotator rates the current task "a" in a specific way, then present "c" as the next task instead of "b".
The update callback could be a starting point, but it doesn't have access to the stream object.

Thanks a lot,
Stephan

ines · June 25, 2021, 2:52am

Hi! The stream generator and update callback are consumed/called at different points, but they can share state, e.g. via a global or nonlocal variable. This is also how the active learning works: the update callback updates the model, which is then used to process the next batch of the stream (which is only consumed/executed on a single batch at a time).

So in your case, you could make your update callback store the relevant state you're interested in (for instance, the annotated labels in the batch) in a variable, and then use that in the stream to determine the order of your examples.
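
A minimal sketch of that shared state, using the a/b/c example from the question. Several things here are assumptions for illustration: each task carries a hypothetical "meta"."id" field, a "reject" answer on "a" stands in for the "specific way" that should route to "c", and `make_components` and `run_one_at_a_time` are made-up names. `run_one_at_a_time` is a toy loop that only resumes the stream after `update` has run. Prodigy's default batching doesn't behave like that, because it keeps pulling from the generator to fill a batch, so in a real recipe the choice between "b" and "c" could be made before "a" is answered.

```python
from typing import Callable, Iterator, List


def make_components(tasks: dict):
    """Stream generator and update callback sharing state through a closure.

    `tasks` maps hypothetical task IDs ("a", "b", "c") to task dicts.
    """
    a_answer = None  # shared state, rebound by update() via `nonlocal`

    def update(answers: List[dict]) -> None:
        # Record the answer given to task "a" (e.g. "accept" / "reject").
        nonlocal a_answer
        for eg in answers:
            if eg["meta"]["id"] == "a":
                a_answer = eg["answer"]

    def stream() -> Iterator[dict]:
        yield tasks["a"]
        # Only sees the answer if update() ran before the generator resumed.
        yield tasks["c"] if a_answer == "reject" else tasks["b"]

    return stream, update


def run_one_at_a_time(stream, update, decide: Callable[[dict], str]) -> List[str]:
    """Toy loop: one task per batch and no prefetching (unlike Prodigy's default)."""
    order = []
    for task in stream():
        order.append(task["meta"]["id"])
        update([dict(task, answer=decide(task))])
    return order


tasks = {k: {"text": f"Question {k}", "meta": {"id": k}} for k in "abc"}
stream, update = make_components(tasks)
print(run_one_at_a_time(stream, update, lambda eg: "reject"))  # ['a', 'c']
```

In a real recipe, `stream()` and `update` would be handed to Prodigy as the "stream" and "update" components instead of being driven by the toy loop.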

One thing to keep in mind here is the annotation flow: while you can use a batch size of 1 or set "instant_submit": true to submit each example as soon as it's annotated, you also want to make sure that the annotator doesn't have to wait around for the next example, and that they can still undo their previous annotation if they made a mistake. Prodigy always tries to queue up enough examples in the background so the annotator never runs out or has to wait. So by default, it'll already ask your stream for the next batch while the current batch is being sent back to the server and processed on the back-end.
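
If you do want strict one-at-a-time behaviour despite those trade-offs, both settings can be passed through the "config" component a custom recipe returns. This is a minimal sketch: `dynamic_order` is a hypothetical recipe name, the "classification" interface is just an example choice, and the `@prodigy.recipe(...)` decorator a real recipe needs is left out so the snippet runs without Prodigy installed.

```python
from typing import Callable, Iterable, List


def dynamic_order(dataset: str, stream: Iterable[dict],
                  update: Callable[[List[dict]], None]) -> dict:
    """Hypothetical custom recipe body (decorator omitted, see lead-in)."""
    return {
        "dataset": dataset,           # dataset the answers are saved to
        "view_id": "classification",  # annotation interface for each task
        "stream": stream,             # generator of task dicts
        "update": update,             # called with each batch of answers
        "config": {
            "batch_size": 1,          # one example per batch
            "instant_submit": True,   # send each answer back immediately (no undo)
        },
    }
```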

So if possible, it's usually better to work with batches of multiple examples, so you can queue up the next batch while the annotator is still busy. This also means you won't run out if your update callback takes a while to run, and that the annotator can undo and correct mistakes: the previous batch is kept on the client, so they can go back and forth (and you don't have to reconcile conflicts on the server). So you could ask about a batch at "level 1" first, then use the answers to queue up the "level 2" questions while the annotator works on the next batch, then have them work on the "level 2" questions, and so on.

phtephanx · June 26, 2021, 1:16pm

Totally makes sense, thanks, and it works with my intended logic.

Thanks for the hint. For my use case, it depends on whether I can parallelize the logic of the stream manipulation.

ines · June 28, 2021, 2:39am

Just make sure you're not starting multiple threads in the stream, because this can easily lead to problems and doesn't play well with the generator.

If you're okay with using small batches instead of single questions, this would probably be the most straightforward approach. Let's say you have a task where you're asking questions about a text, and the first one is "Is this about sports?". If the answer is yes, you then want to ask "Is this about football?".

You start off by sending out a batch of 2 sports questions, and while the annotator works on that very first batch, Prodigy will ask your stream for the next batch in the background. You'll first get 2 sports answers back, and your update callback can queue up the 2 football follow-up questions while the annotator works on the second batch of sports questions. Next, the annotator gets the football follow-ups, while you queue up the follow-ups for the second batch, and so on. If there are no follow-up questions available (either because they're not ready yet or because there are none left), you send out new sports questions. This way, you never run out. You could even hold back the follow-up questions and only ask about football once you've collected 10 of them.
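
The sports/football flow above could be sketched roughly like this. It's a minimal sketch under a few assumptions: tasks use a hypothetical "label" of "SPORTS" or "FOOTBALL", `make_components` and `simulate` are illustrative names, and `simulate` is a toy stand-in for the Prodigy server that, like the default behaviour described in this thread, requests the next batch before the answers to the current batch come back. In a real recipe, `stream()` and `update` would be returned as the "stream" and "update" components.

```python
from collections import deque
from itertools import islice
from typing import Callable, Iterator, List, Tuple


def make_components(texts: List[str]):
    """Stream + update callback sharing a queue of follow-up questions."""
    follow_ups: deque = deque()  # shared state: filled by update(), drained by stream()

    def update(answers: List[dict]) -> None:
        # Queue a FOOTBALL follow-up for every text accepted as SPORTS.
        for eg in answers:
            if eg["label"] == "SPORTS" and eg["answer"] == "accept":
                follow_ups.append({"text": eg["text"], "label": "FOOTBALL"})

    def stream() -> Iterator[dict]:
        fresh = iter(texts)
        while True:
            if follow_ups:
                # Follow-ups are ready: ask those first.
                yield follow_ups.popleft()
            else:
                # None ready (or none left): send a new SPORTS question.
                text = next(fresh, None)
                if text is None:
                    return  # out of texts; pending follow-ups are lost (see note)
                yield {"text": text, "label": "SPORTS"}

    return stream, update


def simulate(stream, update, decide: Callable[[dict], str],
             batch_size: int = 2) -> List[Tuple[str, str]]:
    """Toy server loop that prefetches one batch ahead of the annotator."""
    gen = stream()
    shown: List[Tuple[str, str]] = []
    current = list(islice(gen, batch_size))
    while current:
        upcoming = list(islice(gen, batch_size))  # requested before answers arrive
        shown.extend((eg["label"], eg["text"]) for eg in current)
        update([dict(eg, answer=decide(eg)) for eg in current])
        current = upcoming
    return shown


stream, update = make_components(["t1", "t2", "t3", "t4", "t5", "t6"])
print(simulate(stream, update, lambda eg: "accept" if eg["text"] == "t2" else "reject"))
# [('SPORTS', 't1'), ('SPORTS', 't2'), ('SPORTS', 't3'), ('SPORTS', 't4'),
#  ('FOOTBALL', 't2'), ('SPORTS', 't5'), ('SPORTS', 't6')]
```

In the sample run, the football follow-up for t2 only appears two batches after t2 itself, because the batch right after it had already been requested. One limitation of this sketch: once the texts run out, the generator returns, so any follow-ups whose answers are still outstanding at that point are never asked.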
