Ah, sorry if the terminology I used was confusing! What I meant was the session/instance/process. There are generally two ways you could set this up and it depends on your project, annotators and how you want to set up the workflow:
- Start one Prodigy process on a given host/port per annotator and save the annotations to separate datasets. For example, on port `8080`, you start the Prodigy server with 100 documents and save to `annotations_andrew`, on port `8081` you start Prodigy with the next 100 documents and save to `annotations_ines`, and so on.
  - Pros: very straightforward, easy to keep a separation between the work, no problem to update a model in the loop because every annotator has their own model instance and there are no conflicts
  - Cons: you have to run multiple processes (could be automated with a script but there is more going on), it's harder to share state between the sessions because process 1 doesn't know what process 2 has queued up
- Start Prodigy once and have multiple annotators access the same process using named sessions. For example, you would be accessing the app with `?session=andrew`.
  - Pros: you only have to start one process, it's easier to send out examples to a different session based on shared state
  - Cons: can be harder to reason about because there's more state and more things influencing each other, harder to share a model in the loop because you have multiple people potentially updating it in different directions
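
To make the two setups a bit more concrete, here's a rough sketch in Python. It's only an illustration, not an official script. The helper names (`per_annotator_plan`, `build_command`, `launch`, `session_url`), the `ner.manual` recipe with its arguments, and the `batch_N.jsonl` file names are all assumptions I made up for the example. The one Prodigy-specific piece it relies on is the `PRODIGY_PORT` environment variable for setting the port per process, so double-check that against your Prodigy version and swap in your own recipe.

```python
import os
import subprocess
import sys
from urllib.parse import urlencode

BASE_PORT = 8080


def per_annotator_plan(annotators, base_port=BASE_PORT):
    """Setup 1: one process, one port and one dataset per annotator."""
    plan = []
    for offset, name in enumerate(annotators):
        plan.append({
            "annotator": name,
            "port": base_port + offset,
            "dataset": f"annotations_{name}",
            # Hypothetical file holding this annotator's batch of documents.
            "source": f"batch_{offset}.jsonl",
        })
    return plan


def build_command(entry):
    """Build the CLI call for one plan entry.

    The recipe and its arguments are placeholders (assumption):
    replace them with whatever recipe you actually use.
    """
    return [
        sys.executable, "-m", "prodigy", "ner.manual",
        entry["dataset"], "blank:en", entry["source"],
        "--label", "PERSON,ORG",
    ]


def launch(entry):
    """Start one Prodigy server for one plan entry (needs Prodigy installed)."""
    # Each process gets its own port via the environment.
    env = dict(os.environ, PRODIGY_PORT=str(entry["port"]))
    return subprocess.Popen(build_command(entry), env=env)


def session_url(name, host="localhost", port=BASE_PORT):
    """Setup 2: one shared process, one named-session URL per annotator."""
    return f"http://{host}:{port}/?{urlencode({'session': name})}"


if __name__ == "__main__":
    annotators = ["andrew", "ines"]
    for entry in per_annotator_plan(annotators):
        print(entry["annotator"], entry["port"], entry["dataset"])
    for name in annotators:
        print(session_url(name))
```

Running the file as-is only prints the plan and the session URLs. Nothing gets started until you call `launch(entry)` yourself for each entry, which is the "could be automated with a script" part of the first setup.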