Training with multiple annotators

I’ve recently started using Prodigy and am trying to grasp the workflow around having multiple annotators. I understand that you can have multiple sessions using ‘?session=NAME’.

After I get all my annotations, which of them will ner.batch-train use? It is possible that two different annotators labeled the same example differently. In that case, which of them would the model use?

Also, if I don’t specify the “?session=NAME” piece, will it treat multiple annotators as one session?

It looks like it does, but both “default” sessions still seem to see the same example.

Hi! The ?session marker lets you explicitly name the user sessions if you want to do everything within one Python process. However, you can also just start multiple processes on different ports and have your annotations add to separate datasets. This is often cleaner and makes it easier to compare the annotations later on.
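As a rough illustration of the "one process per annotator" setup, here's a minimal sketch that only builds the launch commands and doesn't start anything. It assumes the port is set via the PRODIGY_PORT environment variable and that you're running ner.manual; the annotator names, dataset naming scheme, model, input file and labels are all made up for the example.

```python
def build_launch_plan(annotators, base_port=8080):
    """Return one (name, env, argv) entry per annotator.

    Each annotator gets their own port and their own dataset, so the
    annotations stay separate and can be compared later.
    """
    plan = []
    for offset, name in enumerate(annotators):
        # Assumption: the port is overridden via the PRODIGY_PORT variable.
        env = {"PRODIGY_PORT": str(base_port + offset)}
        argv = [
            "prodigy", "ner.manual",
            f"ner_{name}",       # hypothetical dataset name per annotator
            "en_core_web_sm",    # hypothetical model
            "news.jsonl",        # hypothetical input file
            "--label", "PERSON,ORG",
        ]
        plan.append((name, env, argv))
    return plan


plan = build_launch_plan(["alice", "bob"])
for name, env, argv in plan:
    print(name, env["PRODIGY_PORT"], " ".join(argv))

# To actually start a process you could do something like:
#   subprocess.Popen(argv, env={**os.environ, **env})
```
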

Yes, if you don’t name the session, all annotations will be added to one default session.

Which annotation to use when two annotators disagree is something that Prodigy can’t decide for you – that’s something you have to decide :slightly_smiling_face: If you train a model with conflicting annotations, it typically ignores them, because there’s no valid gold-standard annotation that the model can learn from.

If you need to reconcile annotations from different annotators that may be conflicting, check out the new review recipe. It lets you load in one or more datasets and will group all annotations on the same input text together. You can then see who annotated what and where the conflicts are – and create one correct “master annotation”. See here for a little video that shows the process in action:
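To make the idea of "grouping annotations on the same input" more concrete, here's a small sketch in plain Python. It is not how the review recipe is implemented, just an illustration of the grouping step. It assumes each annotation is a dict with an `_input_hash`, a `_session_id`, an `answer` and optional `spans`; the example data is invented.

```python
from collections import defaultdict


def find_conflicts(examples):
    """Group annotations by input and return the inputs whose annotations disagree."""
    by_input = defaultdict(list)
    for eg in examples:
        by_input[eg["_input_hash"]].append(eg)

    conflicts = {}
    for input_hash, group in by_input.items():
        # Two annotations agree if the decision and the set of spans match.
        versions = {
            (
                eg["answer"],
                tuple(sorted(
                    (s["start"], s["end"], s["label"]) for s in eg.get("spans", [])
                )),
            )
            for eg in group
        }
        if len(versions) > 1:
            conflicts[input_hash] = group
    return conflicts


# Invented annotations from two annotators
annotations = [
    {"_input_hash": 1, "_session_id": "alice", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"_input_hash": 1, "_session_id": "bob", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "PRODUCT"}]},
    {"_input_hash": 2, "_session_id": "alice", "answer": "accept",
     "spans": [{"start": 3, "end": 8, "label": "PERSON"}]},
    {"_input_hash": 2, "_session_id": "bob", "answer": "accept",
     "spans": [{"start": 3, "end": 8, "label": "PERSON"}]},
]

for input_hash, group in find_conflicts(annotations).items():
    print(input_hash, [eg["_session_id"] for eg in group])
```

Anything this flags would be a candidate for a manual decision before training.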

Thanks! That is a useful recipe.

I am still confused by whether different annotators see the same question (all assuming I have one provider process). I am thinking of the following scenarios:

  1. Multiple people open the default session: will they all see the same tasks unless ‘feed_overlap’ is set to false?

  2. Two named sessions are created: will they see all the same tasks unless ‘feed_overlap’ is set to false?

  3. Are these behaviors consistent across teach and mark/manual?

My practical goal right now is to have a set-up where I can have unlimited annotators share the work (I am asking people on my team to help out when they have time). I am guessing it makes sense to direct everyone to the default session - but I want to make sure that people are not doing duplicate work.

(I would like to understand the other scenarios for future reference).

Thanks

If multiple people access the same default session, they’ll all get different examples – the next batch in the stream. That’s because Prodigy doesn’t know who they are and treats them all as “the same person”. So whenever a request for new questions comes in, it’ll send the next batch that’s available.

Some things to consider here:

  • Whenever someone accesses the app (or reloads the page), they’ll get a new batch. Prodigy can’t know that a batch it sent out for annotation isn’t “coming back”. Maybe someone is working on it and taking a long time, maybe their internet connection died, and so on. This is typically difficult to work around. So you might want to implement an “infinite stream” that periodically checks the database and sends examples out again if they’re not in the dataset yet. This also gives you much more fine-grained control over what’s sent out when. I’ve explained an approach for this step-by-step in my comment here.
  • If you’re planning on using active learning-powered recipes like ner.teach that update a model in the loop, the process may not be as effective if multiple people are annotating and updating the model. In the best case scenario, they’ll all make similar decisions and move the model in the same direction. In the worst case, they try to move the model in different directions and as a result, make it suggest worse annotations.
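The "infinite stream" idea from the first point could look roughly like the sketch below. It's a plain Python generator under a few assumptions: every example carries a `_task_hash`, and `get_annotated_hashes` is a hypothetical callable you supply that returns the hashes already saved in your dataset (for instance, something backed by the database). The `wait_seconds` and `max_passes` parameters are also just for illustration.

```python
import time


def infinite_stream(examples, get_annotated_hashes, wait_seconds=5.0, max_passes=None):
    """Keep re-sending examples until all of them show up as annotated.

    examples: list of task dicts, each with a "_task_hash".
    get_annotated_hashes: hypothetical callable returning the hashes that
        are already in the dataset.
    """
    passes = 0
    while max_passes is None or passes < max_passes:
        # Check once per pass which examples have already been answered.
        done = set(get_annotated_hashes())
        pending = [eg for eg in examples if eg["_task_hash"] not in done]
        if not pending:
            break  # everything has been annotated, so the stream ends
        for eg in pending:
            yield eg
        passes += 1
        # Give outstanding batches some time to come back before re-checking.
        time.sleep(wait_seconds)
```

Because the dataset is only checked once per pass, an example answered in the middle of a pass can still be sent out once more in that same pass. Checking more often trades fewer duplicates for more database queries.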

(Btw, quick heads-up if you’re working with the feed_overlap setting: There’s currently a known issue that tends to occur with short streams and causes subsequent sessions to not see examples if the previous session already completed the stream. If you’re hitting that, see here for details and a workaround. We’ll be fixing that in the next version.)

Thanks for answering my questions!
This should be enough to get me started :slight_smile:
