Named multi-user session exceeds dataset length


I'm using Prodigy v1.11.5 with named multi-user sessions. My dataset has 61,500 sentences, but Prodigy shows the total as 62,500 and keeps sending new sentences. I didn't set feed_overlap to True, and the docs say the default is False.

Hi! A few things to check to get a better idea of what could be happening:

  • Are you using a workflow or recipe that generates new examples, e.g. by splitting them into sentences, like ner.correct? If you have 60k examples with 2 sentences each on average, you'll end up with ~120k examples in total. So the number of tasks you annotate doesn't necessarily match the number of examples in your input.
  • Can you try upgrading to the newest version? We also have an alpha version of the upcoming release that includes some fixes for multi-user workflows, which you could try. See "Duplicate annotations in output" (#12 by kab).
  • Prodigy may re-send a batch if an annotator requested it but never submitted it before it expired. This "work stealing" is important so you don't lose any data if an annotator stops working without saving within a given time period. That said, a difference of 1k examples would be a lot for this; it would normally only affect a batch or two.
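To illustrate the first point, here's a minimal sketch of how a recipe that splits each incoming example into sentences ends up with more tasks than input examples. It assumes a naive regex splitter as a stand-in for real sentence segmentation (such as spaCy's), and the function names are hypothetical, not part of Prodigy's API.

```python
import re


def split_into_sentences(text):
    # Naive splitter (assumption): break after ".", "!" or "?" followed by
    # whitespace. Real recipes would use a proper sentence segmenter.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_stream(examples):
    # Yield one new task per sentence, so a single input example can
    # turn into several annotation tasks.
    for eg in examples:
        for sent in split_into_sentences(eg["text"]):
            yield {"text": sent}


examples = [
    {"text": "Prodigy is an annotation tool. It streams examples."},
    {"text": "This one has a single sentence."},
    {"text": "First. Second! Third?"},
]
tasks = list(split_stream(examples))
print(len(examples), len(tasks))  # 3 6
```

Three input examples become six tasks here, which is the same effect as 60k examples with two sentences each becoming ~120k tasks.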
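The batch expiry described in the last point can be modelled with a small, hypothetical feed: batches that are handed out but not answered within a timeout go back to the front of the queue and are sent again. This is an illustrative sketch under stated assumptions, not Prodigy's internals; the BatchFeed class, its methods and the timeout value are all made up for the example.

```python
from collections import deque


class BatchFeed:
    """Hypothetical feed that re-sends batches left unanswered past a timeout.

    An illustrative model of "work stealing", not Prodigy's implementation.
    """

    def __init__(self, examples, batch_size, timeout):
        self.queue = deque(examples)
        self.batch_size = batch_size
        self.timeout = timeout   # seconds before an open batch expires
        self.outstanding = {}    # batch_id -> (sent_at, batch)
        self.next_id = 0
        self.total_sent = 0      # re-sent examples are counted again

    def get_batch(self, now):
        # Put expired, unanswered batches back at the front of the queue.
        for batch_id, (sent_at, batch) in list(self.outstanding.items()):
            if now - sent_at >= self.timeout:
                del self.outstanding[batch_id]
                self.queue.extendleft(reversed(batch))
        size = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(size)]
        if not batch:
            return None, []
        batch_id = self.next_id
        self.next_id += 1
        self.outstanding[batch_id] = (now, batch)
        self.total_sent += len(batch)
        return batch_id, batch

    def submit(self, batch_id):
        # Answered batches are no longer eligible for re-sending.
        self.outstanding.pop(batch_id, None)


feed = BatchFeed(range(30), batch_size=10, timeout=60)
a_id, a = feed.get_batch(now=0)   # annotator A takes 0-9 and never submits
b_id, b = feed.get_batch(now=5)   # annotator B takes 10-19 and submits
feed.submit(b_id)
c_id, c = feed.get_batch(now=70)  # A's batch has expired: 0-9 go out again
feed.submit(c_id)
d_id, d = feed.get_batch(now=75)  # 20-29
feed.submit(d_id)
e_id, e = feed.get_batch(now=80)  # nothing left to send
print(c)                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(feed.total_sent)  # 40
```

In this toy run, 30 examples lead to 40 sent tasks, and the overshoot is exactly the one expired batch. That's consistent with expiry usually accounting for a batch or two rather than a 1k gap.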