Hi! Since the stream is a generator that's only consumed in batches, it can definitely respond to outside state, e.g. a global variable updated via the update callback (called whenever annotations are received), or a model that's updated in the loop. This is also how active learning works.
If you can load all your examples into memory, you could keep them in a nonlocal variable, have your stream pop the first N examples off the list and send them out. In your update callback, you could then check what the answers were and use that information to reorder the remaining examples. Here's a rough sketch of how this could work; the specifics obviously depend on your use case and how you want the reordering to work:
```python
def custom_recipe():  # the enclosing recipe function
    all_examples = load_your_examples()

    def stream():
        nonlocal all_examples
        while all_examples:  # keep going until there are no more examples
            batch = all_examples[:5]
            all_examples = all_examples[5:]
            for eg in batch:
                yield eg

    def update(answers):
        # This is called whenever new answers are received
        nonlocal all_examples
        all_examples = reorder_your_examples_based_on_answers(all_examples)
```
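If it helps, here's a minimal example of what the reordering function could look like, assuming each example carries a model `"score"` between 0 and 1 and you want the most uncertain examples (closest to 0.5) first. The `"score"` field and the uncertainty heuristic are just placeholders for illustration:

```python
def reorder_your_examples_based_on_answers(examples):
    # Sort the remaining examples so the most uncertain ones come first
    # (the "score" field and this heuristic are assumptions for the example)
    return sorted(examples, key=lambda eg: abs(eg["score"] - 0.5))

examples = [
    {"text": "a", "score": 0.9},
    {"text": "b", "score": 0.5},
    {"text": "c", "score": 0.2},
]
# Most uncertain first: b (0.5), then c (0.2), then a (0.9)
print([eg["text"] for eg in reorder_your_examples_based_on_answers(examples)])
# → ['b', 'c', 'a']
```

In a real recipe, you'd probably also look at the incoming answers here, not just the scores.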
One thing to keep in mind is that Prodigy will always try to keep the queue of questions filled, so it will keep asking for new questions in the background if the queue runs low. So even with a `batch_size` of 1, there may always be at least one example "in transit": it's sent back to the server while Prodigy asks for more examples in the background. This means the reordering will only be reflected in the next batch.
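To make this concrete, here's a self-contained simulation in plain Python (no Prodigy involved). The priority rule in `update` (highest hypothetical `"id"` first) is just a stand-in for your real reordering logic:

```python
# Simulate why a reorder done in the update callback only shows up in the
# *next* batch: the current batch is already popped off the shared list.
all_examples = [{"id": i} for i in range(10)]

def stream(batch_size=5):
    global all_examples
    while all_examples:
        # Take a batch off the front of the shared list...
        batch = all_examples[:batch_size]
        all_examples = all_examples[batch_size:]
        # ...so these examples are already "in transit" before answers arrive
        for eg in batch:
            yield eg

def update(answers):
    # Reorder whatever hasn't been queued up yet
    global all_examples
    all_examples = sorted(all_examples, key=lambda eg: -eg["id"])

s = stream()
first_batch = [next(s)["id"] for _ in range(5)]   # 0, 1, 2, 3, 4
update(first_batch)                                # reorder the remainder
second_batch = [next(s)["id"] for _ in range(5)]  # 9, 8, 7, 6, 5
print(first_batch, second_batch)
```

The first batch comes out in the original order no matter what `update` does; only the second batch reflects the new priorities.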
So ideally, you want to choose a workflow where you can go through at least a couple of examples at a time: annotate a batch, send it back, and do the reordering in the background while you annotate the next batch. This also gives you more time to do the reordering on the back-end. Depending on what you're doing, that step may take a while, even if it's just a second.