Hi,
We are currently working on a NER project and have a large number of samples to annotate, with around 10 people on the team. We have started evaluating Prodigy's active learning capability and it is excellent, so we would like to purchase a license. Please help us answer the following questions to make our purchase process smoother.
In the case of multi-user annotation, will all users get the same sentences/paragraphs to annotate, or different ones? We essentially want each user to get a different, non-overlapping set of sentences.
Assuming Prodigy satisfies the first condition:
i) How does active learning work? Does each annotator have a local model, or do they all update a global model?
ii) Do annotations from one user change the recommendations for another user?
iii) Does Prodigy store a user ID for each annotation in the DB?
Do ner.teach and ner.correct work in conjunction? Is the underlying active learning the same in both approaches?
I found multiple threads about multi-user capability, but they are one or two years old, hence these specific questions.
Hi! If you're annotating with multi-user sessions, this will be the default behaviour, yes. If you want everyone to annotate the same data, it's usually easiest to just start separate instances for all annotators, so they can work in isolated processes.
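For reference, here's a minimal sketch of that default multi-session setup. The dataset name, model, source file and labels below are placeholders, and the exact feed behaviour can depend on your Prodigy version and settings:

```bash
# One shared instance with a single model in the loop; annotators join via
# named sessions by appending ?session=<name> to the app URL.
prodigy ner.teach shared_dataset en_core_web_sm ./data.jsonl --label PERSON,ORG

# Annotators then open e.g.:
#   http://localhost:8080/?session=alice
#   http://localhost:8080/?session=bob
#
# With "feed_overlap": false in your prodigy.json, each example is only sent
# out to one of the connected sessions, so their sets don't overlap.
```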
Alternatively, you can also split your data up beforehand and start multiple instances for your different annotators. This can be a good solution if you already know how you want to divide up the work.
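A sketch of that pre-splitting approach could look like this (the split size, ports and dataset names are just examples, and I'm assuming the PRODIGY_PORT override to put each instance on its own port):

```bash
# Split the source data into per-annotator files: part_aa, part_ab, ...
split -l 1000 data.jsonl part_

# Start one isolated instance per annotator, each on its own port and
# saving to its own dataset.
PRODIGY_PORT=8081 prodigy ner.teach dataset_alice en_core_web_sm part_aa --label PERSON,ORG
PRODIGY_PORT=8082 prodigy ner.teach dataset_bob en_core_web_sm part_ab --label PERSON,ORG
```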
This is a good question and definitely something you should keep in mind when annotating with a model in the loop. If you have multiple users connecting to the same instance with a single model in the loop, then yes, all annotations they create will be used to update the model, and the updated model will have an impact on future suggestions. That's also why it's not always a good idea to use a model in the loop like this with a lot of users, especially if they might create conflicting annotations. In the worst case, you could have one annotator moving the model in one direction and someone else moving it in the opposite direction, so you'd get a lot less value out of the model's suggestions.
If you don't want this, a better solution could be to give every user their own instance with their own model in the loop, or start with a manual annotation session first and assess the quality and conflicts (e.g. using the review workflow).
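For example, the manual-first workflow could look like this (dataset names and labels are placeholders):

```bash
# Collect manual annotations first, without a model in the loop. The model
# here is only used for tokenization.
prodigy ner.manual ner_manual en_core_web_sm ./data.jsonl --label PERSON,ORG

# Then go over the collected annotations, resolve conflicts and save the
# result to a final dataset using the review recipe.
prodigy review ner_final ner_manual
```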
Yes, if you're using multi-user sessions, Prodigy will store the session ID in the "_session_id" key of the annotation.
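So you can always recover who annotated what, e.g. by exporting the dataset and grouping by that key (the dataset name here is a placeholder):

```bash
# Export the annotations to JSONL and count examples per session ID.
prodigy db-out my_dataset > annotations.jsonl
grep -o '"_session_id": *"[^"]*"' annotations.jsonl | sort | uniq -c
```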
No, ner.correct currently doesn't update the model in the loop and doesn't perform any example selection. It will just pre-highlight the model's predictions in the text and let you manually correct them.
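To make the contrast concrete, here are the two recipes side by side (dataset names, model and labels are placeholders):

```bash
# ner.teach: binary accept/reject decisions; updates the model in the loop
# and uses its predictions to select which examples to ask about.
prodigy ner.teach ner_teach en_core_web_sm ./data.jsonl --label PERSON,ORG

# ner.correct: pre-highlights the model's predictions for manual correction;
# the model is not updated and no example selection happens.
prodigy ner.correct ner_correct en_core_web_sm ./data.jsonl --label PERSON,ORG
```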