Problems with ner.manual on short, dynamic task queue with batch_size 1

I was asked to create an interface where a user could paste in unlabelled examples (for NER tagging) in a text box, submit, and immediately annotate the example.

I wanted to use Prodigy for this, so I thought of using an on-disk queue that the text-box callback appends to and a custom Prodigy loader pops tasks from. This means that when the queue is empty, the loader has to either yield some dummy value or simply wait to yield a value until there is an example in the queue. I also have to set batch_size to 1, because otherwise Prodigy waits for multiple examples to be submitted before displaying any of them for labelling.
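A minimal sketch of the setup described above, assuming the on-disk queue is a directory holding one JSON file per task. The names enqueue, pop_task and blocking_stream, and the directory layout, are hypothetical and not part of Prodigy; the loader here uses the "wait until there is an example" option rather than a dummy value.

```python
import json
import os
import time
import uuid


def enqueue(queue_dir, text):
    """Called by the text-box callback: write one task file atomically."""
    os.makedirs(queue_dir, exist_ok=True)
    # Timestamp prefix keeps files sortable by submission time (assumed layout).
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp_path = os.path.join(queue_dir, name + ".tmp")
    with open(tmp_path, "w", encoding="utf8") as f:
        json.dump({"text": text}, f)
    # Rename only after the file is fully written, so the loader never
    # sees a half-written task.
    os.replace(tmp_path, os.path.join(queue_dir, name))


def pop_task(queue_dir):
    """Remove and return the oldest task, or None if the queue is empty."""
    if not os.path.isdir(queue_dir):
        return None
    names = sorted(n for n in os.listdir(queue_dir) if n.endswith(".json"))
    if not names:
        return None
    path = os.path.join(queue_dir, names[0])
    with open(path, encoding="utf8") as f:
        task = json.load(f)
    os.remove(path)
    return task


def blocking_stream(queue_dir, poll_interval=0.5):
    """Never-ending generator that waits, without yielding, while the queue is empty."""
    while True:
        task = pop_task(queue_dir)
        if task is None:
            time.sleep(poll_interval)
            continue
        yield task
```

The generator never returns, so from the consumer's side the stream only ever blocks or produces a task.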

I have stumbled across some issues here. At first, the annotation UI just shows "Loading..." while it's waiting for a task to be available. I then submit a task, and the loader immediately picks it up from the queue. After the task has been annotated, Prodigy loads the next task if there is one in the queue; if the queue is empty, it's supposed to stay on "Loading..." until a task comes in. But sometimes it shows "No tasks available" instead (I haven't been able to figure out under what circumstances, it seems quite random). As soon as a task is available, this message disappears and the next task shows up for annotation. The problem is that previously annotated tasks seem to get lost when the "No tasks available" screen shows up.

Another issue I've seen is that if the "Loading..." screen is showing and I want to stop annotating and save my progress, pressing the Save button doesn't do anything until a new example comes in. This means data isn't saved at all unless I submit a new unlabelled example.

I know this is quite niche and not really the intended usage, but if you have any idea what's going on and how to fix it that would be great. It's such a good UI for annotation, so would love to be able to use it to work on a dynamic task queue.

I can try to post some minimal code examples if needed.

My guesses at the issue:

  • The UI is probably blocked by the "Loading..." state and therefore isn't able to save examples unless it goes to the "No tasks available" state or there is a task displayed.
  • Race condition or some weird thing like that causing the UI to go to the "No tasks available" state rather than stay on "Loading..." when it's waiting for the loader to yield the next task.

FYI, the only "fix" I've found is for the loader to return a unique dummy example when the queue is empty ({"text":"This is a dummy example, please ignore. XYZ123_random_sequence"}).
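The dummy-example workaround could be sketched roughly as follows. The pop argument stands for any callable that returns the next queued task or None when the queue is empty; stream_with_dummies, is_dummy and DUMMY_PREFIX are hypothetical names, and using a random hex string for the unique suffix is an assumption.

```python
import uuid

DUMMY_PREFIX = "This is a dummy example, please ignore. "


def stream_with_dummies(pop):
    """Yield real tasks when available, otherwise a unique dummy task."""
    while True:
        task = pop()
        if task is None:
            # A fresh random suffix makes every dummy text distinct.
            yield {"text": DUMMY_PREFIX + uuid.uuid4().hex}
        else:
            yield task


def is_dummy(task):
    """True for tasks produced by the empty-queue branch above."""
    return task.get("text", "").startswith(DUMMY_PREFIX)
```

A helper like is_dummy can be used to drop the placeholder examples from the annotations afterwards.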

Thanks for the detailed report and analysis! You're using Prodigy 1.9+, right? And have you tried the "instant_submit": true setting? This will immediately send an answer back when you hit accept/reject/ignore, before requesting the next batch. It does mean you can't undo, but it should resolve some of the saving issues.
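As a rough illustration, the two settings discussed in this thread could be passed through the "config" entry of the dictionary a custom recipe returns. The helper name recipe_components is hypothetical, and the sketch leaves out everything else a real ner.manual-style recipe needs (for example labels and tokenization of the stream).

```python
def recipe_components(dataset, stream):
    """Build the dict a custom recipe would return (hypothetical helper name)."""
    return {
        "dataset": dataset,        # dataset to save annotations to
        "view_id": "ner_manual",   # manual NER annotation interface
        "stream": stream,          # the never-ending loader generator
        "config": {
            "batch_size": 1,          # show each example as soon as it arrives
            "instant_submit": True,   # send each answer back immediately (no undo)
        },
    }
```

The same two settings could alternatively go in prodigy.json.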

The "No tasks available" message is shown if the next batch returned by the stream is an empty list. Typically, this means that there's nothing left – but there are some edge cases where that's not necessarily true. In 1.9, we also switched over to FastAPI and async, which is great, but also makes some parts of the server behave slightly differently. This issue came up the other day and my current theory is that maybe the batching logic is executed in parallel, so while batch A is still loading, batch B finishes with an empty list, which makes Prodigy think that it's over. We're currently trying to come up with a solution for this.

I've had the "No tasks available" problem with both 1.8.5 and 1.9.x. I'm pretty sure my loader generator isn't exiting: it's a while True: loop with a yield at the end.
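One way to double-check that the generator never finishes is to wrap it in a small logging generator. This is only a debugging sketch; log_if_stream_ends and the logger name are hypothetical. It relies on plain Python behaviour: an uncaught exception inside a generator ends it for good, so a while True: loop can still stop producing tasks if something in its body raises.

```python
import logging

logger = logging.getLogger("loader_debug")


def log_if_stream_ends(stream):
    """Pass tasks through unchanged, but log if the wrapped stream ever stops."""
    try:
        for task in stream:
            yield task
    except Exception:
        # The wrapped generator raised, so it will never yield again.
        logger.exception("Loader raised; the stream is now finished")
        raise
    # Only reached if the wrapped generator returned normally.
    logger.warning("Loader generator returned; the stream is exhausted")
```

If neither message ever shows up in the logs while "No tasks available" is displayed, the loader itself is probably not the cause.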

I haven't tried using "instant_submit": true, but this might be what I have to do.