I want to reorder the upcoming examples every time I click the green "yes" (accept) button. I found `prodigy.components.sorters` and used it the way `ner_teach.py` does, but it seems it can only change the order at the beginning of annotation. Is it possible to use `prodigy.components.sorters` dynamically during annotation, i.e. each time I click the green "yes" button?
Hi! Since the stream is a generator and is only consumed in batches, it can definitely respond to outside state: for example, a variable that changes via the `update` callback (which is called whenever annotations are received), or a model that is updated in the loop. This is also how the active learning recipes work.
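To illustrate the idea outside of Prodigy, here's a plain-Python sketch: a generator that reads from a shared list lazily, so anything that changes the list between batches affects what the generator yields next. The `state` dict and the `reverse()` "reordering" are purely illustrative stand-ins, not Prodigy code.

```python
from itertools import islice


def make_stream(state):
    """Yield examples lazily from state["queue"], which may change between pulls."""
    while state["queue"]:
        yield state["queue"].pop(0)


state = {"queue": [1, 2, 3, 4, 5, 6]}
stream = make_stream(state)

first = list(islice(stream, 2))   # consume one "batch" of two examples
state["queue"].reverse()          # outside code changes the shared state in between
second = list(islice(stream, 2))  # the next batch reflects the change
print(first, second)  # [1, 2] [6, 5]
```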
If you can load all your examples into memory, you could keep them in a nonlocal variable inside your recipe function, make your stream pop the first N examples off that list and send them out. In your `update` callback, you could then check what the answers were and use that information to reorder the remaining examples. Here's a rough sketch of how this would work. The specifics obviously depend on your use case and how you want the reordering to work:
```python
# Inside your recipe function, so that nonlocal refers to its scope
all_examples = load_your_examples()

def get_stream():
    nonlocal all_examples
    while all_examples:  # keep going until there are no more examples
        batch = all_examples[:5]
        all_examples = all_examples[5:]
        for eg in batch:
            yield eg

def update(answers):
    # This is called whenever new answers are received
    nonlocal all_examples
    all_examples = reorder_your_examples_based_on_answers(all_examples, answers)

# ...and return {"stream": get_stream(), "update": update, ...} from your recipe
```
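As one possible version of `reorder_your_examples_based_on_answers`, here's a minimal sketch that takes the remaining examples plus the received answers and moves examples whose label was just accepted to the front. It assumes each example has a `"label"` key and uses the `"answer"` value (`"accept"`, `"reject"` or `"ignore"`) that Prodigy adds to annotated examples; the ranking rule itself is just an assumption for illustration.

```python
def reorder_your_examples_based_on_answers(examples, answers):
    """Hypothetical rule: move examples whose label was accepted to the front."""
    accepted = {eg["label"] for eg in answers if eg.get("answer") == "accept"}
    # sorted() is stable, so the original order is kept within each group
    return sorted(examples, key=lambda eg: eg["label"] not in accepted)


remaining = [
    {"text": "a", "label": "PERSON"},
    {"text": "b", "label": "ORG"},
    {"text": "c", "label": "PERSON"},
]
answers = [{"text": "x", "label": "ORG", "answer": "accept"}]
reordered = reorder_your_examples_based_on_answers(remaining, answers)
print([eg["text"] for eg in reordered])  # ['b', 'a', 'c']
```

Using a stable sort keeps the rest of your original ordering intact, so the reordering only promotes examples rather than shuffling everything.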
One thing to keep in mind is that Prodigy will try to keep the queue of questions filled at all times, so it will keep asking for new questions in the background if the queue runs low. So even if you're using a `batch_size` of 1, there may always be at least one example "in transit" while the answers are sent back to the server and Prodigy asks for more examples in the background. This means the reordering will only be reflected in the next batch.
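The timing effect can be simulated in plain Python. In this sketch, pulling one extra batch stands in for the app fetching ahead to keep its queue filled, so a reorder applied after the first answers only shows up in the batch after the one already in transit. This mimics the behaviour described above; it is not Prodigy's actual queue code.

```python
def batched_stream(state, size):
    """Yield batches from state["queue"], which can be reordered at any time."""
    while state["queue"]:
        batch = state["queue"][:size]
        state["queue"] = state["queue"][size:]
        yield batch


state = {"queue": list(range(1, 9))}
stream = batched_stream(state, 2)

shown = next(stream)       # the batch currently being annotated
in_transit = next(stream)  # fetched in the background to keep the queue filled
state["queue"].reverse()   # the update callback reorders after the first answers
after = next(stream)       # only this batch reflects the new order
print(shown, in_transit, after)  # [1, 2] [3, 4] [8, 7]
```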
So ideally, you want to choose a workflow where you go through at least a couple of examples at a time, send them back, and annotate the next batch while the reordering happens in the background. This also gives the back-end more time to do the reordering, which, depending on what you're doing, may take a while, even if it's only a second.
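For reference, a custom recipe returns its components as a dictionary, and the `"batch_size"` setting in `"config"` controls how many examples are sent out at a time. The sketch below omits the `@prodigy.recipe` decorator so it runs without Prodigy installed; the recipe name `my_recipe`, the `"classification"` view ID, the batch size of 10 and the sort-by-text reordering rule are all placeholders.

```python
def my_recipe(dataset, examples):
    """Sketch of a recipe's return value (decorator omitted so it runs standalone)."""
    queue = list(examples)

    def get_stream():
        nonlocal queue
        while queue:
            batch, queue = queue[:10], queue[10:]
            yield from batch

    def update(answers):
        nonlocal queue
        # Placeholder rule: reorder the remaining examples based on `answers` here
        queue = sorted(queue, key=lambda eg: eg["text"])

    return {
        "dataset": dataset,
        "view_id": "classification",
        "stream": get_stream(),
        "update": update,
        "config": {"batch_size": 10},  # examples per batch sent to the app
    }
```

With a batch size of 10, you annotate ten examples while the server reorders whatever hasn't been sent out yet.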
One more question: is it possible to have Prodigy click the green "yes" button automatically under certain conditions, for example when the current example contains specific patterns?
In that case, you could just skip the UI altogether and add the example to the database automatically. Prodigy lets you interact with the database programmatically (see the database section of the Prodigy docs for details), so you could do something like this:
```python
from prodigy.components.db import connect

# in your recipe
db = connect()
if dataset not in db:
    db.add_dataset(dataset)
```
In your stream, you can then add the "answer" to the example if the given condition (pattern match or something else) applies and add it to the dataset:
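```python
# in your stream
for eg in batch:
    if some_condition_applies(eg):  # check if the example should be accepted
        eg["answer"] = "accept"
        db.add_examples([eg], [dataset])  # save it directly, skipping the UI
    else:
        yield eg  # everything else is sent out for annotation as usual
```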
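To try the auto-accept logic without a Prodigy database, here's a self-contained sketch. `some_condition_applies` is implemented as a hypothetical regex check on the example's text (the pattern is made up), and `FakeDB` is a stand-in that only mimics the `add_examples(examples, datasets)` call used above; in your recipe you'd use the real object returned by `connect()` instead.

```python
import re

PATTERN = re.compile(r"\bAcme Corp\b")  # hypothetical pattern to auto-accept


def some_condition_applies(eg):
    """Hypothetical check: accept examples whose text matches PATTERN."""
    return bool(PATTERN.search(eg.get("text", "")))


class FakeDB:
    """In-memory stand-in for the Prodigy database, for testing only."""

    def __init__(self):
        self.data = {}

    def add_examples(self, examples, datasets):
        for name in datasets:
            self.data.setdefault(name, []).extend(examples)


def filter_stream(stream, db, dataset):
    for eg in stream:
        if some_condition_applies(eg):
            eg["answer"] = "accept"
            db.add_examples([eg], [dataset])  # saved without showing it
        else:
            yield eg  # sent to the annotator as usual


db = FakeDB()
examples = [{"text": "Acme Corp hired Jane."}, {"text": "Nothing to see here."}]
to_annotate = list(filter_stream(examples, db, "my_dataset"))
print([eg["text"] for eg in to_annotate])  # ['Nothing to see here.']
print(len(db.data["my_dataset"]))  # 1
```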