order of tasks dependent on annotations

Hi!

I'd like to change the order of tasks dynamically, depending on the annotations received.
For instance, if the annotator rates the current task "a" in a specific way, then present "c" as the next task instead of "b".
The update callback could be a start, but it doesn't have access to the stream object.

Thanks a lot,
Stephan

Hi! The stream generator and update callback are consumed/called at different points, but they can share state, e.g. via a global or nonlocal variable. This is also how the active learning works: the update callback updates the model, which is then used to process the next batch of the stream (which is only consumed/executed on a single batch at a time).

So in your case, you could make your update callback store the relevant state you're interested in (for instance, the annotated labels in the batch) in a variable, and then use that in the stream to determine the order of your examples.
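As a rough sketch of that idea in plain Python (no Prodigy imports): the stream generator and the update callback are created in the same function and share state through a closure. The "id" field, the skip_rules mapping and the function name are made up for illustration; only the "answer" key mirrors what comes back with an annotated example.

```python
def make_stream_and_update(examples, skip_rules):
    """Return a (stream, update) pair that share state through a closure.

    skip_rules is a hypothetical mapping of (task id, answer) -> id of the
    task that should be presented next.
    """
    pending = {ex["id"]: ex for ex in examples}  # tasks not sent out yet
    priority = []  # ids that update() wants to see next

    def stream():
        while pending:
            # Drop requested ids that have already been sent out.
            while priority and priority[0] not in pending:
                priority.pop(0)
            # Prefer a requested task, otherwise keep the input order.
            next_id = priority.pop(0) if priority else next(iter(pending))
            yield pending.pop(next_id)

    def update(answers):
        # Called with a batch of annotated examples.
        for eg in answers:
            target = skip_rules.get((eg["id"], eg["answer"]))
            if target is not None:
                priority.append(target)

    return stream(), update
```

The reordering can only affect tasks that haven't been pulled from the stream yet, so with larger batches the jump to "c" happens a batch or so later rather than immediately.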

One thing to keep in mind here is the annotation flow: while you can use a batch size of 1 or set "instant_submit": true to submit an example as soon as it's annotated, you also want to make sure that the annotator doesn't have to wait around for the next example, and that they can still undo their previous annotation if they made a mistake. Prodigy will try to always queue up enough examples in the background so the annotator never runs out or has to wait. So by default, it'll already ask your stream for the next batch while the current batch is sent back to the server and processed on the back-end.

So if possible, it's usually better to work with batches of multiple examples, so you can queue up the next batch while the annotator is still busy. This also means you don't run out of examples if your update callback takes a while to process, and that the annotator can undo and correct a mistake: the previous batch is kept on the client, so they can go back and forth (and you don't have to reconcile conflicts on the server). So you could ask about a batch at "level 1" first, then use the answers to queue up the questions for "level 2" while the annotator works on the next batch, then have them work on the "level 2" annotations, and so on.

> Hi! The stream generator and update callback are consumed/called at different points, but they can share state, e.g. via a global or nonlocal variable. This is also how the active learning works: the update callback updates the model, which is then used to process the next batch of the stream (which is only consumed/executed on a single batch at a time).

Totally makes sense, thanks, and works with my intended logic.

> One thing to keep in mind here is the annotation flow: while you can use a batch size of 1 or set "instant_submit": true to submit an example as it's annotated, you also want to make sure that the annotator doesn't have to wait around for the next example, or allow them to undo their previous annotation if they made a mistake. Prodigy will try to always queue up enough examples in the background so the annotator never runs out or has to wait. So by default, it'll already ask your stream for the next batch while the current batch is sent back to the server and processed on the back-end.

Thanks for the hint. For my use case, it depends on whether I can parallelize the logic of the stream manipulation.

Just make sure you're not starting multiple threads in the stream, because this can easily lead to problems and doesn't play well with the generator.

If you're okay with using small batches instead of single questions, this would probably be the most straightforward approach. Let's say you have a task where you're asking questions about a text and the first one is "Is this about sports?". If the answer is yes, you want to be asking "Is this about football?". You start off by sending out 2 questions about sports, and while the user annotates the very first batch, Prodigy will ask for the next in the background. You'll first get 2 answers about sports back, and your update callback can queue up the 2 follow-up questions about football, while the annotator works on the second batch of sports. Next, they get the follow-up questions about football, while you queue up the follow-up for the second batch, and so on. If there are no follow-up questions available (either because they're not ready yet or because there are none left), you send out new sports questions. This way, you never run out. You could even sort the follow-up questions so you only ask about football once you have 10 questions together.
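Here's a minimal sketch of the sports/football flow in plain Python, just to show the queueing logic. The function name and the "SPORTS" / "FOOTBALL" labels are placeholders, and the example dicts only use "text", "label" and "answer". It's not a complete recipe: for instance, it doesn't handle follow-ups that get queued after the generator has already run out of new texts.

```python
from collections import deque


def make_followup_stream(texts):
    """Return a (stream, update) pair for a two-level question flow."""
    followups = deque()  # follow-up questions queued by update()
    texts = iter(texts)

    def stream():
        while True:
            if followups:
                # Follow-up questions that are ready go out first.
                yield followups.popleft()
                continue
            text = next(texts, None)
            if text is None:
                return  # no follow-ups ready and no new texts left
            # Otherwise send out a new first-level question.
            yield {"text": text, "label": "SPORTS"}

    def update(answers):
        # Queue a football question for every accepted sports question.
        for eg in answers:
            if eg["label"] == "SPORTS" and eg["answer"] == "accept":
                followups.append({"text": eg["text"], "label": "FOOTBALL"})

    return stream(), update
```

To only ask about football once 10 follow-ups have collected, the `if followups:` check could become something like `if len(followups) >= 10:`, with a fallback that flushes the remaining follow-ups once the new texts run out.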
