Prodigy showing repeated sentences to annotator when feed_overlap: true

Hi,

I have a database that annotators have used with two different configurations: first with a default prodigy.json and later with the option feed_overlap: true. I have run these instances with PRODIGY_ALLOWED_SESSIONS allowing sessions for 3 annotators and one for me, but I never annotate with my session as I only use it to see how the interface renders.

At the beginning, I wanted annotators to cover all of the sentences in the JSONL file, which are 537 in total. This was the instance running with the default prodigy.json, so each annotator annotated different sentences and annotations would add up to ~537.

After that, I wanted to have annotations from all annotators for all the sentences. To do this, I restarted the Prodigy server with the option feed_overlap: true in prodigy.json. I assumed this would present each annotator with the sentences from the 537 set that they hadn't annotated yet.

Looking at the progress, I see that each annotator has annotated the following number of sentences: 544, 430, and 411. One annotator has annotated more than the total number of sentences, and Prodigy is showing her more to annotate.

Why is this happening? The session names and everything else are the same, so why is Prodigy showing more sentences to an annotator who has already annotated all of them?

Thanks

Update: I stopped and restarted the Prodigy server and noticed that the annotator session with 544 annotations didn't show more sentences (it showed the "No tasks" interface). But after logging into the other annotator sessions and coming back to that one, it showed more sentences to annotate. Adding allow_work_stealing: false seems to solve the issue.

Hi @ale,

Apologies for the delayed reply! First of all, your assumptions are correct. It's true that work stealing is enabled by default, but it should only be possible to steal tasks that have not been answered yet, have not been answered enough times as per the overlap setting, or have not been queued to another session.
The reason you observed 544 total annotations for one of the sessions is that the check for whether a given task is eligible to be stolen is done against the tasks answered in the current session, not against the database.
That's why I suspect that some of the repeated examples were saved in the previous server session (the one with feed_overlap set to false).
We'll look into this check against previously answered questions - it is most likely a bug.
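
In the meantime, one way to confirm whether the extra annotations are repeats is to export the dataset (for example with prodigy db-out) and count repeated task hashes per session. The snippet below is only a rough sketch under a few assumptions: the file name annotations.jsonl and the helper names load_jsonl and summarize_sessions are made up for illustration, and it relies on each saved example carrying the _session_id and _task_hash fields that Prodigy normally adds.

```python
import json
from collections import Counter, defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line of an exported file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_sessions(examples):
    """Return {session_id: (total, unique, repeated)}.

    Two annotations count as the same task when they share a _task_hash.
    Examples without a _session_id are grouped under "<no session>".
    """
    hashes_by_session = defaultdict(Counter)
    for eg in examples:
        session = eg.get("_session_id", "<no session>")
        hashes_by_session[session][eg["_task_hash"]] += 1

    summary = {}
    for session, counts in hashes_by_session.items():
        total = sum(counts.values())   # every saved annotation
        unique = len(counts)           # distinct tasks seen
        summary[session] = (total, unique, total - unique)
    return summary


# Hypothetical usage, assuming the dataset was exported to annotations.jsonl:
# for session, (total, unique, repeated) in summarize_sessions(
#     load_jsonl("annotations.jsonl")
# ).items():
#     print(session, total, unique, repeated)
```

If the session with 544 annotations shows a unique count at or below 537 and a non-zero repeated count, that would be consistent with the explanation above.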

Finally, logging into another annotator session makes tasks available for stealing (you always only steal from another session), so that would explain why there are tasks available again in a session that seemingly got to the end of the queue. In fact, to make this a bit more intuitive, in the recent version of Prodigy we've updated the "No tasks available" screen to prompt users to refresh in case there are new tasks.
