Infinite stream not working as expected

This relates to my post on "depth-first" workflows ("Depth-first" workflows?)

Hi all! I’m playing around with batch size 1 to see whether I can create a variable workflow in which the next task depends on the current task’s result.

I’m running into some strange behavior.

Here's a paste of my test setup: I'm simply trying to set up an infinite stream of trivial text tasks and serve them up.
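For context, the stream itself is just an endless generator of trivial text tasks, something along these lines (an illustrative sketch with made-up task text, not my exact paste):

```python
import itertools

def infinite_stream():
    # Yield an endless sequence of trivial text tasks,
    # each a dict in the shape Prodigy expects for a text view.
    for i in itertools.count():
        yield {"text": f"Task {i}"}

# Quick sanity check: take the first three tasks.
stream = infinite_stream()
first_three = [next(stream) for _ in range(3)]
print(first_three)  # → [{'text': 'Task 0'}, {'text': 'Task 1'}, {'text': 'Task 2'}]
```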

I’m running it via `prodigy test_infinite_stream -F`

I’m using only the port, instant_submit, batch_size, and history_size options in my Prodigy config. As an example, in one scenario I’m using the following prodigy.json:
```json
{
  "port": 50001,
  "instant_submit": true,
  "batch_size": 1,
  "history_size": 0
}
```

Scenario 1: batch size 1, history size 0, instant_submit true

Once I accept (or reject) the first example, Prodigy tells me there are no tasks available. I've attached a screenshot of the HTTP requests I've observed in this scenario. Note how it does not try to fetch another task.

Scenario 2: batch size 1, history size 0, instant_submit false

I’ve done two tests with these settings:

- Scenario 2a: I hit save after each task.
- Scenario 2b: I don’t hit save after each task.

In scenario 2a, I seem to get an infinite stream of tasks. This is the network trace after hitting accept (and saving) a couple of times:

In scenario 2b, I just keep accepting tasks without saving in between them. It allows me to annotate three tasks, then tells me there are no more tasks left. This is the network trace:

Note that in this case, give_answers is called but the interface still indicates that I need to save.
If I add an update method to the custom recipe that simply prints a) that it was called and b) each received answer, I can see that the automatic call to give_answers saves the task with ID 0. If I save manually, the tasks with IDs 1 and 2 are saved. Here's a paste of the output after saving manually. (Note that in this paste, I had changed the port but kept everything else the same.)
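The debug callback I added looks roughly like this (a simplified sketch, not the exact code from my paste; in the real recipe it's returned under the `"update"` key of the recipe components):

```python
def update(answers):
    # Called by Prodigy whenever a batch of answers is sent back
    # to the server. Here I just log what arrives so I can compare
    # automatic (instant_submit / outbox) saves with manual saves.
    print("update() called")
    for eg in answers:
        print("received answer:", eg.get("id"), eg.get("answer"))
    # Returning the IDs is just for debugging convenience;
    # Prodigy ignores the return value.
    return [eg.get("id") for eg in answers]

# Simulated call with one answered task, like the automatic save of ID 0:
update([{"id": 0, "answer": "accept"}])
```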

I couldn't find anything on this. Is this a known bug, or am I doing something wrong? Did I misunderstand task generation?


I am having a similar issue. I am also updating the stream dynamically, adding new tasks that depend on the labels selected for the current task. In my case, hitting refresh in the UI loads the next batch.

I have to look at this in more detail, but I think this might come down to a combination of a batch size of 1 and when exactly the app is asking for more questions. The fact that saving manually after each task produces the desired behaviour is very interesting, because this is essentially what the instant_submit mode is supposed to mimic. So there must be some subtle difference here :thinking:

While going over the logic again, I also came up with another theory I want to investigate: there could be a race condition in the reporting of the loading state while instant submit mode is constantly switching between sending and receiving. If so, the request for more questions could exit early because the app thinks it's already running – and when sending answers finishes, Prodigy concludes there's nothing pending and the queue is empty.

It sounds like part of the confusing behaviour here might come from history_size being set to 0. This is mostly a UI setting and you shouldn't typically have to set it. I hadn't really thought about what would happen if it's set to 0, but it's basically this: you annotate task 1 and it's stored on the client (but not visible, because the history size is 0). When task 2 is shown, task 1 is "outboxed" to be sent back to the server; because the outbox now holds a full batch (1 example), task 1 is sent. Likewise, task 2 is stored and outboxed when task 3 is shown. If no other tasks follow, this is the final state and no more server requests are made – you'd normally see task 3 in the history now, but the history size prevents that, so it looks like there's nothing left.
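To make the sequence concrete, here's a toy model of that client-side flow as I understand it (my own simplification for illustration, not Prodigy's actual client code):

```python
class ToyClient:
    # Toy model: an answered task is held on the client until the next
    # task is shown, at which point it moves to the outbox; the outbox
    # is flushed to the server once it holds a full batch.
    def __init__(self, batch_size=1):
        self.batch_size = batch_size
        self.pending = []  # answered but not yet outboxed
        self.outbox = []
        self.sent = []     # batches flushed to the server

    def annotate(self, task_id):
        # Showing a new task outboxes the previously answered one.
        self.outbox.extend(self.pending)
        self.pending = [task_id]
        if len(self.outbox) >= self.batch_size:
            self.sent.append(self.outbox[:])
            self.outbox = []

client = ToyClient(batch_size=1)
for task in [1, 2, 3]:
    client.annotate(task)

print(client.sent)     # → [[1], [2]]  (tasks 1 and 2 reached the server)
print(client.pending)  # → [3]  (task 3 is held client-side, unsent)
```

With history_size 0, that held task 3 is also invisible in the UI, which matches the "nothing left" impression.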

I have also tested this with history_size 1. The behavior was the same, IIRC, though I did not check the contents of the automatic vs. manual saves. If it helps, I can re-create those tests and send you more screenshots.

Is there a solution to this? I'm having the same problem with batch_size=1