I am trying to establish inter-annotator agreement and am running text-classification annotations in multiple named sessions (with feed_overlap = true and force_stream_order = true, and annotators joining via ?session=session_name).
There are two issues I have been running into, which I hope you can help me out with:
Resuming annotation
The annotation process was killed at some point. My dataset and annotations are stored safely in the database, but I am looking for a way to "resume" annotations from the point each annotator stopped at (their progress varies, so they have annotated different numbers of examples to date).
In Multi-user sessions and excluding annotations by session - Prodigy Support, running multiple instances and implementing a custom filter was proposed. Is there also an option to handle the varying progress in a single-instance/multi-session setting (same dataset, same labeling task and recipe)? If I re-run the Prodigy recipe, the same example is displayed to all users, which implies that either some users are re-served examples they have already annotated, or examples are skipped for annotators who had not progressed that far yet.
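For context, this is roughly the kind of filter I understood that thread to propose, as a rough sketch for the one-instance-per-annotator setup (the dataset name and session ID would come from my own setup, and I'm assuming saved annotations carry a _session_id key):

```python
from prodigy import set_hashes
from prodigy.components.db import connect

def filter_seen(stream, dataset, session_id):
    """Skip examples this session has already annotated."""
    db = connect()  # uses the database settings from prodigy.json
    seen = set()
    for eg in db.get_dataset(dataset) or []:
        # assumption: annotations saved from named sessions carry a
        # "_session_id" key identifying the annotator's session
        if eg.get("_session_id") == session_id:
            seen.add(eg.get("_task_hash"))
    for eg in stream:
        eg = set_hashes(eg)
        if eg["_task_hash"] not in seen:
            yield eg
```

In the recipe, the stream would then be wrapped as stream = filter_seen(stream, dataset, "my_dataset-alice"), with one instance started per annotator so the session ID is fixed.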
Examples being served multiple times to an annotator
I am using a custom recipe based on the textcat.teach one from prodigy-recipes/textcat_teach.py at master · explosion/prodigy-recipes · GitHub, which doesn't use --exclude and has prefer_uncertain removed (I want every annotator to label every example in sequence). Yet, in the multi-user session, individual annotators are served repeated examples.
Is there a particular reason for this to happen? My input is just a JSONL file (loaded with the JSONL loader, as in the linked textcat.teach recipe) and the stream itself contains no repeated instances.
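(For what it's worth, a quick check along these lines, with news.jsonl as a placeholder for my input file, confirms the source itself contains no duplicate texts:)

```python
import json
from collections import Counter

# "news.jsonl" is a placeholder for the actual input file
with open("news.jsonl", encoding="utf8") as f:
    texts = [json.loads(line)["text"] for line in f if line.strip()]

duplicates = {t: n for t, n in Counter(texts).items() if n > 1}
print(f"{len(texts)} examples, {len(duplicates)} duplicated texts")
```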
Hi! Just to confirm, your goal is to have every annotator label the same examples, right? For a use case like this, we'd typically recommend using separate instances and per-annotator datasets. (Named multi-user sessions were primarily designed for workflows where different users should see different examples.) If you use separate instances, you can start and resume the sessions for each annotator individually, the stream will always start again where that annotator left off, and progress will be calculated per annotator.
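As a rough sketch of what that could look like (assuming Prodigy v1.10+, where prodigy.serve takes the full command as a single string; the recipe name, dataset names, file path and ports below are just placeholders for your setup):

```python
import prodigy

# One instance per annotator, each writing to its own dataset and running
# on its own port. Start each call in a separate process/terminal.
prodigy.serve(
    "textcat.custom-teach textcat_alice en_core_web_sm ./news.jsonl "
    "--label RELEVANT -F ./recipe.py",
    port=8081,
)
# and for the second annotator, in another process:
# prodigy.serve("textcat.custom-teach textcat_bob en_core_web_sm ./news.jsonl "
#               "--label RELEVANT -F ./recipe.py", port=8082)
```

Because each instance has its own dataset, resuming is then just a matter of restarting that annotator's instance.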
This isn't directly related to the original topic, but one quick question: are you also updating the model in the loop? If so, I'm not sure this is a good workflow for inter-annotator agreement, because the suggestions produced by the model will change based on all the previous annotation decisions.
In general, it's potentially quite problematic if you're using a single instance: assuming there are some disagreements between annotators, the same model will be updated with multiple, conflicting annotations. So you can very quickly end up with a useless model that makes less and less useful suggestions (and leads to less useful annotations).
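Once each annotator has their own dataset, you can also compute the agreement offline. A very rough sketch (the dataset names are placeholders, and it assumes both annotators were shown the same question for each input):

```python
from prodigy.components.db import connect

db = connect()
# map each input hash to the annotator's accept/reject/ignore answer
alice = {eg["_input_hash"]: eg["answer"] for eg in db.get_dataset("textcat_alice")}
bob = {eg["_input_hash"]: eg["answer"] for eg in db.get_dataset("textcat_bob")}

shared = set(alice) & set(bob)
agreed = sum(alice[h] == bob[h] for h in shared)
print(f"{agreed}/{len(shared)} shared examples with matching answers")
```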
On the first part: Yes, the plan is that all of us annotate the same examples. I will try switching to the multi-instance mode. In the meantime, I guess there is no way for me to control which examples individual annotators receive in multi-session mode, right?
On the second part: I replaced stream = prefer_uncertain(predict(stream)) from the textcat.teach recipe with stream = (example for _, example in model(stream)) to keep the stream order (update is model.update, as in the textcat.teach recipe but without the PatternMatcher), but I agree that this was probably a bad idea.
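For reference, the core of the adapted recipe looks roughly like the sketch below (imports and names follow the linked textcat_teach.py; it's simplified, not the full recipe):

```python
import spacy
from prodigy.components.loaders import JSONL
from prodigy.models.textcat import TextClassifier

def build_stream_and_update(spacy_model, source, label):
    nlp = spacy.load(spacy_model)
    model = TextClassifier(nlp, label)
    stream = JSONL(source)
    # model(stream) yields (score, example) tuples; keeping only the examples
    # preserves the original order instead of sorting by uncertainty
    stream = (example for _, example in model(stream))
    return stream, model.update
```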
Out of curiosity (to rule out that this is the reason for the repeated examples individual annotators see): is the score that the model adds to the individual example dictionaries also used when calculating an example's hash?
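(If it is, I guess I could re-hash after scoring with explicit keys so the score can't influence the hashes — something like this sketch, where the key choice is just an assumption for a plain textcat task:)

```python
from prodigy import set_hashes

def rehash(stream):
    for eg in stream:
        # recompute both hashes from explicit keys only, ignoring "score"
        yield set_hashes(
            eg,
            input_keys=("text",),
            task_keys=("label",),
            overwrite=True,
        )
```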
There is no plan to use the model right now. My goal really is just a set of joint annotations over the same set of examples (and with me and the others annotating in the same sequence, I can check early on whether there are major disagreements that require re-discussing the annotation standards).