ines
(Ines Montani)
October 8, 2019, 10:42pm
Hi! I've explained some of this in more detail in this thread:
Yes, this is currently expected, because on each load, the app makes a request to the server and asks for the next batch (by default, the batch size is 10). The annotated tasks are sent back to the server periodically, so when a new batch is requested, Prodigy can’t yet know whether a question that was previously sent out was already annotated or not. (Annotating all sentences / examples is also a pretty specific goal that only applies to some use cases and data streams.)
If it’s important to you that all sentences are annotated, and you do want to handle cases where the annotator refreshes their browser, you ideally want to reconcile the questions/answers at the end of a session, and compare the _task_hash to find examples in your data that you don’t have an answer for in the dataset. You can either do this in a custom recipe within the stream generator, or as a separate session that you run after the previous one finished.
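To make the reconciliation step concrete, here is a minimal sketch in plain Python. It assumes every example already carries a "_task_hash" key, and it uses two ordinary lists as stand-ins for the source data and for the examples saved in the dataset. The function name find_unanswered is hypothetical and not part of Prodigy.

```python
def find_unanswered(source_examples, annotated_examples):
    """Return the source examples whose _task_hash has no saved answer.

    Both arguments are plain lists of dicts and stand in for the input
    data and the dataset contents (assumption for this sketch).
    """
    # Collect the hashes of everything that already has an answer.
    answered = {eg["_task_hash"] for eg in annotated_examples}
    # Keep only the examples that were sent out but never saved.
    return [eg for eg in source_examples if eg["_task_hash"] not in answered]


source = [
    {"text": "First sentence", "_task_hash": 101},
    {"text": "Second sentence", "_task_hash": 102},
    {"text": "Third sentence", "_task_hash": 103},
]
saved = [
    {"text": "First sentence", "_task_hash": 101, "answer": "accept"},
    {"text": "Third sentence", "_task_hash": 103, "answer": "reject"},
]
missing = find_unanswered(source, saved)
```

The resulting list could then be used as the input for a follow-up session.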
My post here has a little example of an "infinite stream" that checks the incoming examples against the hashes in the database to make sure everything is annotated:
Hi! The it and tID indices make your code a little difficult to follow – but it looks like you’ve already solved the image choice part? Each example should have one or more "options" and each option should have an ID and a text. That should be all you need to make it render as an image with multiple choice options.
The other thing you’re trying to do is loop over the examples over and over again until every example is in the database. Maybe it helps to break this down into steps. Fundamentally,…
Of course, you could also come up with your own custom logic for this. Streams in Prodigy are regular Python generators that yield example dicts, so they can respond to external state and let you control what to send out when.
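As a rough illustration of a stream that responds to external state, the sketch below keeps cycling over the examples until every hash shows up as answered. It is not the code from the linked post: stream_until_annotated and get_annotated_hashes are hypothetical names, and the callable stands in for whatever lookup returns the hashes currently stored in the database.

```python
def stream_until_annotated(examples, get_annotated_hashes):
    """Yield examples over and over until each _task_hash has an answer.

    get_annotated_hashes is a hypothetical callable that returns the
    hashes saved so far (a stand-in for a database lookup).
    """
    while True:
        # Re-read the external state at the start of every pass.
        done = set(get_annotated_hashes())
        pending = [eg for eg in examples if eg["_task_hash"] not in done]
        if not pending:
            return  # everything has an answer, so the stream ends
        yield from pending


# Simulated session: the first answer for "a" is lost (e.g. a browser
# refresh), so "a" is sent out again on the next pass.
examples = [
    {"text": "a", "_task_hash": 1},
    {"text": "b", "_task_hash": 2},
    {"text": "c", "_task_hash": 3},
]
saved_hashes = set()
sent = []
for eg in stream_until_annotated(examples, lambda: saved_hashes):
    sent.append(eg["text"])
    lost = eg["text"] == "a" and sent.count("a") == 1
    if not lost:
        saved_hashes.add(eg["_task_hash"])
```

One thing to keep in mind with this approach: answers that are still sitting in the browser and haven't been sent back yet look unanswered to the generator, so an example may be queued up again before its answer arrives.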