Training with multiple annotators

If multiple people access the same default session, each of them will get different examples: whichever batch is next in the stream. That’s because Prodigy doesn’t know who they are and treats them all as “the same person”, so whenever a request for new questions comes in, it sends the next batch that’s available.
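Here's a deliberately simplified model of a single shared session in plain Python that illustrates this behavior. It isn't Prodigy's actual server code, and `batches` and `request_questions` are hypothetical names. The point is that every request takes the next batch off one shared queue, no matter who sends it.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(stream: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Split a stream into consecutive batches (hypothetical helper)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# One shared queue for the whole default session (simplified model)
queue = batches(({"text": f"example {i}"} for i in range(6)), batch_size=2)


def request_questions() -> List[Dict]:
    """Any request, from anyone, just gets the next available batch."""
    return next(queue, [])


alex = request_questions()         # examples 0 and 1
maria = request_questions()        # examples 2 and 3
alex_reload = request_questions()  # examples 4 and 5: a reload is just another request
```

The server can't tell Alex's reload apart from a new annotator, so the batch Alex had open before reloading is never sent again.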

Some things to consider here:

  • Whenever someone accesses the app (or reloads the page), they’ll get a new batch. Prodigy can’t know whether a batch it sent out for annotation is ever “coming back”: maybe someone is still working on it and taking a long time, maybe their internet connection died, and so on. The server has no reliable way to tell these cases apart, which makes this difficult to work around. Instead, you might want to implement an “infinite stream” that periodically checks the database and sends examples out again if they’re not in the dataset yet. This also gives you much more fine-grained control over what’s sent out when. I’ve explained an approach for this step-by-step in my comment here.
  • If you’re planning on using active learning-powered recipes like ner.teach that update a model in the loop, the process may be less effective when multiple people are annotating and updating the same model. In the best case, they’ll all make similar decisions and move the model in the same direction. In the worst case, they’ll pull the model in different directions and, as a result, make it suggest worse annotations.
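The “infinite stream” idea from the first point can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the exact approach from the linked comment: `infinite_stream`, `example_key` and the `get_annotated_keys` callback are hypothetical names, and the callback stands in for whatever you use to read the hashes already saved to your dataset. On each pass, the stream re-reads the annotated keys and re-sends only the examples that haven't come back yet. Once nothing is left, it ends.

```python
import hashlib
import json
from typing import Callable, Dict, Hashable, Iterable, Iterator, Set


def example_key(example: Dict) -> Hashable:
    """Stable key for an example (hypothetical stand-in for a task hash)."""
    if "_task_hash" in example:
        # Use an existing hash if the example already has one
        return example["_task_hash"]
    data = json.dumps(example, sort_keys=True).encode("utf8")
    return hashlib.sha1(data).hexdigest()


def infinite_stream(
    examples: Iterable[Dict],
    get_annotated_keys: Callable[[], Set[Hashable]],
) -> Iterator[Dict]:
    """Keep re-sending examples until every one of them is in the dataset."""
    examples = list(examples)  # materialize so we can loop over it repeatedly
    while True:
        annotated = get_annotated_keys()  # check the "database" once per pass
        pending = [eg for eg in examples if example_key(eg) not in annotated]
        if not pending:
            return  # everything has come back: end the stream
        for eg in pending:
            yield eg
```

With Prodigy, the callback could wrap the database, for example `lambda: set(db.get_task_hashes("my_dataset"))` with `db = connect()` from `prodigy.components.db`. This assumes your examples were hashed with `set_hashes` so that they carry a `_task_hash`, and you should check which database methods your Prodigy version provides before relying on it. Checking once per pass keeps database queries down. The trade-off is that an example annotated partway through a pass can still be sent once more before the next check.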
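Here's a toy illustration of the second point. It is not how ner.teach actually updates its model. `update` and `score` are hypothetical, perceptron-style stand-ins: accepting a suggestion pushes its feature weights up, and rejecting it pushes them down. When two annotators agree, their updates add up. When they disagree about the same suggestion, the updates cancel and the model learns nothing from either decision.

```python
from typing import Dict, List


def update(weights: Dict[str, float], features: List[str], accept: bool, lr: float = 1.0) -> None:
    """Toy perceptron-style update: move each feature weight toward the decision."""
    direction = 1.0 if accept else -1.0
    for feat in features:
        weights[feat] = weights.get(feat, 0.0) + lr * direction


def score(weights: Dict[str, float], features: List[str]) -> float:
    """Higher score = the toy model is more likely to suggest this again."""
    return sum(weights.get(feat, 0.0) for feat in features)


# Hypothetical features for a suggestion like "Apple" labelled ORG
suggestion = ["word=Apple", "shape=Xxxxx", "label=ORG"]

agree: Dict[str, float] = {}
update(agree, suggestion, accept=True)      # annotator 1 accepts
update(agree, suggestion, accept=True)      # annotator 2 accepts
print(score(agree, suggestion))             # 6.0

disagree: Dict[str, float] = {}
update(disagree, suggestion, accept=True)   # annotator 1 accepts
update(disagree, suggestion, accept=False)  # annotator 2 rejects
print(score(disagree, suggestion))          # 0.0
```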

(Btw, quick heads-up if you’re working with the feed_overlap setting: there’s currently a known issue that tends to occur with short streams. It prevents subsequent sessions from seeing any examples if a previous session has already worked through the whole stream. If you’re hitting that, see here for details and a workaround. We’ll be fixing that in the next version.)