Incomplete annotations with textcat.manual

@ines, I am having an issue where my annotators are reporting that Prodigy has no tasks available (see below).

I am using a modified version of the textcat.manual recipe, where I changed a single line to enable the progress bar.
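
For reference, the relevant part of my recipe looks roughly like this (paraphrased from memory, so the argument annotations and the option-building are only approximations of the stock recipe). The one changed line is the list() call, which gives the stream a length:

    import prodigy
    from prodigy.components.loaders import JSONL

    @prodigy.recipe(
        "textcat.manual",
        dataset=("Dataset to save annotations to", "positional", None, str),
        source=("Path to the input JSONL file", "positional", None, str),
        label=("Comma-separated label(s) to apply", "option", "l", str),
    )
    def textcat_manual(dataset, source, label="DISCHARGE"):
        labels = label.split(",")
        stream = JSONL(source)  # generator over the input examples
        # Add the label options to each task so the choice interface can render them
        stream = ({**eg, "options": [{"id": l, "text": l} for l in labels]} for eg in stream)
        # The single changed line: materializing the stream as a list gives it a
        # length, which is what lets Prodigy calculate and show the progress bar.
        stream = list(stream)
        return {
            "dataset": dataset,
            "stream": stream,
            "view_id": "choice",
        }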

I ran a separate session for each annotator using the following bash script:
    for PORT_NUM in {8000..8006}
    do
        # one background Prodigy server per annotator, each saving to its own dataset
        PRODIGY_PORT=$PORT_NUM prodigy textcat.manual annotator_$PORT_NUM ./data/discharge_orders_sample.jsonl --label DISCHARGE &
    done

All four annotators are seeing the "No tasks available" message at different completion counts (2626, 2592, 2571, and 2669 out of 3000 total examples). Is this a known issue? Could I be doing something wrong on my end?

Hi! The "No tasks available" message is shown when every remaining example in the stream has either already been annotated in the dataset, is a duplicate, or is otherwise excluded.

If you expose a stream with a length, the progress is calculated based on the original length of that list and the number of existing annotations. It doesn't take any filtering into account, so if your stream contains duplicates, for instance, you may be done before the progress hits 100%. If batches were skipped, you may also reach the end of the stream before all examples have been annotated – in that case, you can just restart the server and you should see the remaining examples for the given dataset. Alternatively, you can set "force_stream_order": true to make sure examples are always re-sent and sent out in the exact same order (only recommended in manual interfaces and if you don't have more than one annotator per session).
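
For example, a minimal prodigy.json with just that override would look like this (the setting can equally go in the "config" dict your recipe returns):

    {
      "force_stream_order": true
    }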

That's very interesting. I thought it might be due to duplicated data, but if that were the case, wouldn't the number of instances be the same for each annotator? All four of my annotators see the "No tasks available" message, but they have completed different numbers of annotations. Are you saying this could be a skipped-batch issue? Could I reduce the likelihood of it happening in the future by setting the batch size to 1?

It's less about the batch size and more about requesting multiple batches and not answering all of them. By default, Prodigy makes no assumptions about a batch it sends out and will just wait for it to come back – so if you open the app, request a batch, don't annotate, close the app and request a new batch, you'll get the next one. This means that if two people are annotating in the same session, they'll both get different data. But if a batch that was sent out doesn't come back annotated, it will only be queued up again when you restart the server.

Setting "force_stream_order": true makes sure that examples are always sent out in the same order, and that a batch is re-sent until it's fully annotated – only then is the next batch sent out. This makes sense if you care about annotating all examples in the stream in the order they come in; it's less useful if several people share the same session, or if you're using an active learning recipe where you care about the example selection.

@ines, thank you for the clear explanation. I will use the "force_stream_order": true setting for similar tasks in the future.
