Hi Valentijn
To make things clearer, I find it helpful to work through a tangible example so we're sure we're talking about the same thing.
So I bootstrapped a setup. I have this small examples.jsonl dataset:
{"text": "stroopwafels are great"}
{"text": "apples are healthy"}
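If you want to reproduce this setup, the dataset can be written with a few lines of standard-library Python. This is only a convenience sketch (EXAMPLES and write_jsonl are names made up for it); any way of producing the same two JSON lines works.

```python
import json
from pathlib import Path

# The two toy examples from above, stored as one JSON object per line (JSONL).
EXAMPLES = [
    {"text": "stroopwafels are great"},
    {"text": "apples are healthy"},
]


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write each record as a single line of JSON."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


write_jsonl(Path("examples.jsonl"), EXAMPLES)
```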
Next, I annotate it with textcat.manual via:
PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python -m prodigy textcat.manual issue-6044 examples.jsonl --label truthy
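The override is a JSON object passed as a string through the PRODIGY_CONFIG_OVERRIDES environment variable; with feed_overlap set to true, every named session gets sent every example. If you prefer launching from Python, a sketch that builds the same environment and argument list could look like this (overrides, env and argv are just illustrative names):

```python
import json
import os

# Config settings to override for this run. feed_overlap=True makes every
# named session receive every example, so all annotators see both texts.
overrides = {"feed_overlap": True}

# Copy the current environment and add the override as a JSON string.
env = dict(os.environ)
env["PRODIGY_CONFIG_OVERRIDES"] = json.dumps(overrides)

# The same command as above, split into an argument list.
argv = [
    "python", "-m", "prodigy", "textcat.manual",
    "issue-6044", "examples.jsonl", "--label", "truthy",
]

print(env["PRODIGY_CONFIG_OVERRIDES"])
```

Passing these to subprocess.run(argv, env=env, check=True) would start the same server; the sketch stops short of that so it runs without Prodigy installed.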
I pretend that there are three users annotating this data, each via their own URL:
http://localhost:8081/?session=vincent
http://localhost:8081/?session=jimmy
http://localhost:8081/?session=lechuck # this person disagrees
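Each annotator gets their own named session through the session query parameter. A small helper that builds these URLs might look like the sketch below; BASE_URL and session_url are names made up for it, and the host and port are simply the ones used in this setup.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8081/"  # host and port from this setup
ANNOTATORS = ["vincent", "jimmy", "lechuck"]


def session_url(name: str, base_url: str = BASE_URL) -> str:
    """Return the annotation URL for one named session."""
    return f"{base_url}?{urlencode({'session': name})}"


urls = {name: session_url(name) for name in ANNOTATORS}
for name, url in urls.items():
    print(name, url)
```

As the db-out output further down shows, the stored session IDs carry the dataset name as a prefix, e.g. issue-6044-vincent.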
After annotating some examples where vincent and jimmy always agree and lechuck disagrees, I can run the review interface.
prodigy review issue-6044-reviewed issue-6044
That looks like this:
Here two users agree and one disagrees, so what we see is not much of a surprise. You'd also get this same example when you run review with --auto-accept, because one annotator consistently disagrees.
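The idea behind --auto-accept is that only examples without conflicting answers get accepted automatically, so a 2-vs-1 split still lands in the review queue. The sketch below illustrates that agreement check; it is a hypothetical illustration, not Prodigy's actual implementation, and all_agree is a made-up name.

```python
# Hypothetical illustration of the agreement idea behind --auto-accept.
# This is NOT Prodigy's implementation; it only shows why a 2-vs-1 split
# still ends up in the review queue.


def all_agree(answers: dict[str, str]) -> bool:
    """Return True if every session gave the same answer for one example."""
    return len(set(answers.values())) == 1


# Answers per session for one example, mirroring the setup above.
split = {"vincent": "accept", "jimmy": "accept", "lechuck": "reject"}
# A made-up example where all three sessions answer the same way.
unanimous = {"vincent": "reject", "jimmy": "reject", "lechuck": "reject"}

for name, answers in [("split", split), ("unanimous", unanimous)]:
    status = "auto-accept" if all_agree(answers) else "needs review"
    print(name, status)
```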
After annotating and running prodigy db-out issue-6044-reviewed, I get a dataset with only two rows in it.
{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"review","answer":"accept","_timestamp":1666777717,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"versions":[{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777124,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"default":true},{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777142,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck","sessions":["issue-6044-lechuck"],"default":false}],"view_id":"classification"}
{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"review","answer":"accept","_timestamp":1666777718,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"versions":[{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777125,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"default":true},{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777143,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck","sessions":["issue-6044-lechuck"],"default":false}],"view_id":"classification"}
That makes sense, because the 6 original annotations only concern two actual examples. You could say the annotations got "merged": each JSON blob still shows what every annotator did, but a single blob now contains the information from all three people.
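To see how one merged row still holds every annotator's answer, the versions list can be flattened back into a per-session view. The sketch below uses a trimmed copy of the first db-out row above (only the keys it needs); answers_per_session is a name made up for this example.

```python
# Trimmed copy of the first db-out row above, keeping only the keys used here.
ROW = {
    "text": "stroopwafels are great",
    "answer": "accept",
    "versions": [
        {"answer": "accept", "sessions": ["issue-6044-jimmy", "issue-6044-vincent"]},
        {"answer": "reject", "sessions": ["issue-6044-lechuck"]},
    ],
}


def answers_per_session(row: dict) -> dict[str, str]:
    """Map every session that annotated this example to its answer."""
    result = {}
    for version in row.get("versions", []):
        # Sessions that gave identical annotations share one version.
        for session in version["sessions"]:
            result[session] = version["answer"]
    return result


print(answers_per_session(ROW))
```

For the full export, you could redirect the db-out output to a file and call answers_per_session on each line after parsing it with json.loads.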
Everyone Agrees
I've reviewed both examples, and I will now annotate one example where everybody agrees, in this case by rejecting it. I'll add this example to the file:
{"text": "brussel sprouts are amazing"}
All three users will hit x (reject) for this one.
When I now run review without --auto-accept, it looks like this:
When I run review with --auto-accept, it looks like this:
This seems to be the behavior I'd expect. One caveat, though: as a side effect, the data does seem to be saved automatically when the --auto-accept setting is turned on. You can confirm this via db-out, as this row now appears even though I did not annotate it in the review interface.
{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777527,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-lechuck","issue-6044-vincent"],"versions":[{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777527,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-lechuck","issue-6044-vincent"],"default":true}],"view_id":"classification"}
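In the exports above, the two rows reviewed by hand carry "_view_id": "review", while the auto-accepted row kept "_view_id": "classification". That difference is only observed on these three rows, so treat it as an assumption rather than documented behavior; if it holds, a quick filter can separate the two kinds of rows (split_by_view_id is a made-up name):

```python
# Trimmed copies of the three db-out rows above (only the keys used here).
ROWS = [
    {"text": "stroopwafels are great", "_view_id": "review"},
    {"text": "apples are healthy", "_view_id": "review"},
    {"text": "brussel sprouts are amazing", "_view_id": "classification"},
]


def split_by_view_id(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate rows saved from the review UI from all other rows.

    Assumption: auto-accepted rows keep the original view ID instead of
    "review", as observed in this small example only.
    """
    reviewed = [r for r in rows if r.get("_view_id") == "review"]
    other = [r for r in rows if r.get("_view_id") != "review"]
    return reviewed, other


reviewed, auto_saved = split_by_view_id(ROWS)
print([r["text"] for r in auto_saved])
```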
All of this seems to work as expected on my end, so I'm curious where in this process something unusual happens on your end. Could you elaborate a bit on where your setup differs, or on what is still unclear?