First result gets ignored in terms of progress?

Hi all! I am struggling to get my progress bar to work right. I have a stream that loads tasks from a database.

  • The tasks are stored in a list attribute self.examples inside the stream.
  • The stream has a __len__ method which returns the length of the list.
  • The stream implements __iter__ by returning the iterator of the list, i.e. iter(self.examples).
  • I deduplicate the tasks in the list by their task hash when I fetch them from the database.
  • I compute progress by dividing controller.total_annotated by the length of the stream.
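For reference, the setup described in the list above looks roughly like the sketch below. The names TaskStream and make_progress are illustrative, the database fetch is replaced by a plain list, and the progress signature (controller plus the return value of the update callback) is an assumption about Prodigy v1.10 rather than a copy of the real recipe code.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Iterator, List, Optional


class TaskStream:
    """Hypothetical stream wrapper: holds deduplicated tasks in self.examples."""

    def __init__(self, tasks: List[Dict]) -> None:
        seen = set()
        self.examples: List[Dict] = []
        for task in tasks:
            # Deduplicate by task hash, keeping the first occurrence.
            if task["_task_hash"] not in seen:
                seen.add(task["_task_hash"])
                self.examples.append(task)

    def __len__(self) -> int:
        return len(self.examples)

    def __iter__(self) -> Iterator[Dict]:
        return iter(self.examples)


def make_progress(stream: TaskStream) -> Callable[..., float]:
    def progress(ctrl, update_return_value: Optional[object] = None) -> float:
        # Assumes ctrl exposes total_annotated, as described in the post.
        if len(stream) == 0:
            return 0.0
        return ctrl.total_annotated / len(stream)

    return progress


# Stand-in for the controller, only for trying the function outside Prodigy.
demo_stream = TaskStream([{"_task_hash": 1}, {"_task_hash": 1}, {"_task_hash": 2}])
demo_progress = make_progress(demo_stream)
demo_value = demo_progress(SimpleNamespace(total_annotated=1))
```

With two unique tasks and one counted annotation, demo_value comes out as 0.5.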

In my test case, there are a total of 9 tasks, meaning that each should lead to a progress of about 11%. All of the tasks have both a unique _task_hash and a unique _input_hash.
The first time I accept, reject, or ignore a task, the progress bar is not updated and stays at 0%.
Once I have processed all 9 tasks, the final progress comes out to 89%, even though 9 answers were given and recorded in the database.

Any ideas what might be going wrong?

Hi! How are you sending the answers back? Are you setting "instant_submit": True to send after each annotation, or are you just using a batch size of 1?

The progress is returned as a response to /give_answers, so every time a batch is sent back, the progress is updated (based on the stream, a custom function etc.). So you'll see the first result once the first batch is sent back. If you're only annotating with a regular batch size of 1, keep in mind that one example will always stay on the client before it's sent back. So after you submit example 1, it's in the history and you answer example 2, then example 1 is "outboxed" and sent out next, and so on.
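The "one example stays on the client" behaviour can be pictured with a toy model. This is only a sketch of the description above, not Prodigy's client code; simulate_client and keep_on_client are made-up names, not Prodigy settings.

```python
from typing import List, Tuple


def simulate_client(n_answers: int, keep_on_client: int = 1) -> List[Tuple[int, int]]:
    """Return (answered so far, received by server) after each answer."""
    history: List[int] = []
    sent = 0
    states: List[Tuple[int, int]] = []
    for i in range(1, n_answers + 1):
        history.append(i)
        # Anything beyond the retained history is "outboxed" and sent back.
        while len(history) > keep_on_client:
            history.pop(0)
            sent += 1
        states.append((i, sent))
    return states


states = simulate_client(3)  # [(1, 0), (2, 1), (3, 2)]
```

In this model the server is always one answer behind until the remaining answer is saved manually.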

This should also mean that hitting "save" manually should make the progress go up to 100% in your case, because that will cause all unsent answers to be sent back. If that's not the case, maybe you can add an update callback and inspect the hashes that are coming back (and see if there are any that aren't)? It's unlikely that it's related to existing annotations in the dataset, because in that case, you'd still be seeing the same scores, since total_annotated refers to all annotations in the dataset. Or, if something went wrong here, you'd be seeing progress of > 100% if your existing annotations contain examples that aren't in the stream.
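A minimal sketch of such an update callback, assuming it receives a list of answered task dicts that carry "_task_hash" and "answer" keys. The make_update helper and its bookkeeping are hypothetical, written only for this debugging purpose.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def make_update(
    expected_hashes: Iterable[int],
) -> Tuple[Callable[[List[Dict]], None], Set[int], List[int]]:
    missing: Set[int] = set(expected_hashes)  # hashes not received yet
    unexpected: List[int] = []  # hashes not in the stream, or received twice

    def update(answers: List[Dict]) -> None:
        for eg in answers:
            task_hash = eg["_task_hash"]
            print(f"received _task_hash={task_hash} answer={eg.get('answer')}")
            if task_hash in missing:
                missing.discard(task_hash)
            else:
                unexpected.append(task_hash)

    return update, missing, unexpected


update, missing, unexpected = make_update([1, 2, 3])
update([{"_task_hash": 1, "answer": "accept"}])
update([{"_task_hash": 9, "answer": "reject"}])
```

After the session, anything left in missing never came back, and anything in unexpected was not part of the stream (or arrived more than once).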

Hi Ines, thanks for taking the time!

This is my prodigy.json:
{
  "port": 8080,
  "instant_submit": true,
  "batch_size": 9999,
  "history_size": 1,
  "auto_exclude_current": false,
  "exclude_by": "task"
}

I have made a log which contains the hashes on my last run (it's 4 examples instead of 9 now, but the same thing is happening -- progress stops at 75%) and the controller.total_annotated and controller.session_annotated variables each time the progress method is called.

Here is the log: Prodigy test run with progress - Pastebin.com

All the hashes are coming back. The progress method is called twice with 0 total_annotated and session_annotated. It looks to me as if the controller's counting is off by one -- is it using an enumeration to keep track of the number of samples that have been processed already?

Thanks for the update!

I spent some time looking into it and I was initially convinced that it must be related to the instant_submit setting, which was confusing. But it turns out that Prodigy actually increments the total and session annotated counts after calling the recipe's update method and passing its return value forward to a custom progress function. This was likely introduced in an update a while ago that unified the progress reporting, and it only surfaced with a custom progress function combined with a custom update callback.
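To see why that ordering produces exactly the numbers reported above, here is a toy model. It is not Prodigy's controller code; ToyController and its count_first flag are invented for illustration. When the counts are bumped only after the progress function has run, progress always lags one batch behind.

```python
from typing import Dict, List


class ToyController:
    """Toy stand-in for the controller; not the real implementation."""

    def __init__(self, stream_len: int, count_first: bool) -> None:
        self.stream_len = stream_len
        self.count_first = count_first
        self.total_annotated = 0

    def receive(self, batch: List[Dict]) -> float:
        if self.count_first:
            self.total_annotated += len(batch)
        # Custom progress function: total_annotated / length of the stream.
        progress = self.total_annotated / self.stream_len
        if not self.count_first:
            # Counting after computing progress makes it lag by one batch.
            self.total_annotated += len(batch)
        return progress


def run(n_tasks: int, count_first: bool) -> List[float]:
    ctrl = ToyController(n_tasks, count_first)
    # instant_submit: every answer arrives as its own batch of one.
    return [ctrl.receive([{"_task_hash": i}]) for i in range(n_tasks)]


lagging = run(9, count_first=False)
fixed = run(9, count_first=True)
```

In this model, lagging starts at 0.0 and ends at 8/9 (about 89%), while fixed ends at 1.0. With 4 tasks the lagging variant ends at 0.75, matching the second test run.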

I just fixed it for the next release, sorry for the confusion this caused! (You are using an update callback as well, right?)


Yay! Thanks a bunch, Ines :)

Yes, I am using an update callback. For now, I've worked around this by tracking the numbers myself.
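The actual workaround is not shown in this thread, but tracking the numbers yourself could look something like the sketch below. AnswerCounter is a hypothetical helper. It relies on the update callback running before the progress function, as described above, and assumes the same progress signature (controller plus update return value).

```python
from typing import Dict, List, Optional, Set


class AnswerCounter:
    """Hypothetical helper that counts answers independently of the controller."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.seen: Set[int] = set()

    def update(self, answers: List[Dict]) -> None:
        # Count each task hash once, even if it is sent back again.
        for eg in answers:
            self.seen.add(eg["_task_hash"])

    def progress(self, ctrl: Optional[object] = None,
                 update_return_value: Optional[object] = None) -> float:
        # Ignores the controller's counts, which lag by one batch.
        if self.total == 0:
            return 0.0
        return len(self.seen) / self.total


counter = AnswerCounter(total=4)
reported = []
for task_hash in range(4):
    counter.update([{"_task_hash": task_hash}])
    reported.append(counter.progress())
```

Here reported ends at 1.0 after the fourth answer. The two bound methods would be passed as the recipe's update and progress callbacks.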

Just released v1.10.8, which should solve the underlying issue! :)
