The batch_size setting in your prodigy.json lets you control the size of the example batches that are sent out to the app and back to the server. This also affects how often the app asks for a new batch of questions when the queue is running low. Keep in mind that there'll always be at least one batch "in transit": the app keeps a batch of batch_size or history_size examples (whichever is lower) in the history, so you can undo easily. Once an annotated batch is complete, it's sent back to the server.
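For example, a prodigy.json that keeps the batches small might look like this (the values here are purely illustrative, not recommendations):

```json
{
  "batch_size": 5,
  "history_size": 5
}
```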
The built-in sorters like
prefer_uncertain operate on the whole stream before it's batched up and expect
(score, example) tuples. You can read more about them here: https://prodi.gy/docs/api-components#sorters
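For example, in a custom recipe you could wrap your scored stream like this – score_fn here is a placeholder for whatever function your model uses to assign a score to an example:

```python
from prodigy.components.sorters import prefer_uncertain

def sorted_stream(raw_examples, score_fn):
    # score_fn(eg) is assumed to return a float between 0 and 1
    scored = ((score_fn(eg), eg) for eg in raw_examples)
    # prefer_uncertain consumes (score, example) tuples and yields plain
    # example dicts, preferring examples with scores close to 0.5
    return prefer_uncertain(scored)
```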
There's actually very little magic going on here – based on the score, the sorter decides whether to yield the example or not. In the built-in functions, we also use an exponential moving average, so we can process a potentially infinite generator while avoiding getting stuck in a suboptimal state, e.g. if the score threshold shifts slightly as the model is updated with more annotations.
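To make the idea concrete, here's a rough sketch of that kind of logic – this is not the actual implementation of prefer_uncertain, just an illustration of using an exponential moving average to decide whether to yield an example:

```python
def prefer_uncertain_sketch(scored_stream, smoothing=0.1):
    """Yield examples whose scores are close to 0.5, i.e. the most uncertain ones."""
    avg_uncertainty = 0.5  # running estimate, updated as the stream is consumed
    for score, example in scored_stream:
        # uncertainty is 1.0 for a score of 0.5 and 0.0 for scores of 0 or 1
        uncertainty = 1.0 - abs(score - 0.5) * 2
        # exponential moving average, so the threshold adapts as the model
        # (and therefore the score distribution) changes over time
        avg_uncertainty = smoothing * uncertainty + (1 - smoothing) * avg_uncertainty
        if uncertainty >= avg_uncertainty:
            yield example
```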
That said, you should be able to easily implement your own strategy in a custom recipe, right after loading and scoring your stream of raw examples. You can batch it up however you like, or even load all examples into memory and make multiple passes over them. At the end of it, the recipe should return a generator of dictionaries as its "stream" – how those examples are selected is up to you.