Prodigy review recipe not entirely clear to me

Hello!

We have an annotated dataset that holds annotations from multiple annotators, and some of them are duplicates (with different annotated answers). We are trying to use prodigy review to loop through those duplicates and pick the one true annotation, using --auto-accept to automatically store duplicates where the answer is the same. This doesn't seem to work, as we are still seeing duplicates with the same answer. Is my understanding of how --auto-accept works correct?

The main issue however, is that from a dataset of 2000+ annotations, it creates a dataset of only 600 something annotations (prodigy review does, to be clear). I was under the impression that prodigy review would save the entire dataset, but with the reviewed annotations instead of the duplicates. Is that not correct? What are we supposed to do with these 600+ "reviewed" annotations, if not?

Thank you!

Hi Valentijn

To make things clearer, I find it helpful to have a tangible example around so that we're sure we're talking about the same thing.

So I bootstrapped a setup. I have this small examples.jsonl dataset:

{"text": "stroopwafels are great"}
{"text": "apples are healthy"}

Next, I annotate it with textcat.manual via:

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python -m prodigy textcat.manual issue-6044 examples.jsonl --label truthy

I pretend that there are three users who are annotating this by annotating via these three URLs:

http://localhost:8081/?session=vincent 
http://localhost:8081/?session=jimmy
http://localhost:8081/?session=lechuck # this person disagrees

After annotating some examples where vincent and jimmy always agree and lechuck disagrees, I can run the review interface.

prodigy review issue-6044-reviewed issue-6044

That looks like this:

Two users agree and one disagrees, so what we see isn't much of a surprise. You'd get the same view when you run it with --auto-accept, because there is one annotator that consistently disagrees.

After annotating and running prodigy db-out issue-6044-reviewed I get a dataset with only two rows in it.

{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"review","answer":"accept","_timestamp":1666777717,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"versions":[{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777124,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"default":true},{"text":"stroopwafels are great","_input_hash":506862616,"_task_hash":-1495214589,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777142,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck","sessions":["issue-6044-lechuck"],"default":false}],"view_id":"classification"}
{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"review","answer":"accept","_timestamp":1666777718,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"versions":[{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777125,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-vincent"],"default":true},{"text":"apples are healthy","_input_hash":111541500,"_task_hash":1515955516,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777143,"_annotator_id":"issue-6044-lechuck","_session_id":"issue-6044-lechuck","sessions":["issue-6044-lechuck"],"default":false}],"view_id":"classification"}

But that makes sense because the 6 original annotations only concern two real examples. You could say the annotations got "merged". You can still see what each annotator did in the json blob, but each blob contains the information of three people.
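To check the same thing on a larger export, it can help to count how many distinct inputs sit behind the raw annotations. Here's a minimal sketch in plain Python. It assumes the rows come from db-out and carry the `_input_hash` and `_session_id` fields shown above; the rows below are trimmed-down stand-ins for the six annotations in this example, and `group_by_input` is a hypothetical helper, not part of Prodigy.

```python
from collections import defaultdict

# Trimmed-down stand-ins for the six rows in `prodigy db-out issue-6044`.
# Only the fields needed for counting are kept.
annotations = [
    {"_input_hash": 506862616, "_session_id": "issue-6044-vincent", "answer": "accept"},
    {"_input_hash": 506862616, "_session_id": "issue-6044-jimmy", "answer": "accept"},
    {"_input_hash": 506862616, "_session_id": "issue-6044-lechuck", "answer": "reject"},
    {"_input_hash": 111541500, "_session_id": "issue-6044-vincent", "answer": "accept"},
    {"_input_hash": 111541500, "_session_id": "issue-6044-jimmy", "answer": "accept"},
    {"_input_hash": 111541500, "_session_id": "issue-6044-lechuck", "answer": "reject"},
]


def group_by_input(rows):
    """Group annotation rows by the hash of the input they refer to."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["_input_hash"]].append(row)
    return dict(groups)


groups = group_by_input(annotations)
print(len(annotations), "annotations cover", len(groups), "distinct examples")
```

The number of distinct `_input_hash` values is only a rough estimate of how many rows to expect after review, but a big gap between the two counts usually just means a lot of overlap between annotators.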

Everyone Agrees

I've reviewed both examples and I will now annotate one example where everybody agrees. I'll add this example to the file:

{"text": "brussel sprouts are amazing"} 

All three users will hit x for this one.

When I now run without --auto-accept it looks like this:

When I run review with --auto-accept it looks like this:

This seems to be the behavior that I'd expect. One caveat though: as a side effect, the data does seem to be saved automatically when the --auto-accept setting is turned on. This can be confirmed via db-out, as this row now appears even though I did not annotate it.

{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666777527,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-lechuck","issue-6044-vincent"],"versions":[{"text":"brussel sprouts are amazing","_input_hash":564254940,"_task_hash":-321962903,"label":"truthy","_view_id":"classification","answer":"reject","_timestamp":1666777527,"_annotator_id":"issue-6044-vincent","_session_id":"issue-6044-vincent","sessions":["issue-6044-jimmy","issue-6044-lechuck","issue-6044-vincent"],"default":true}],"view_id":"classification"}

All of this seems to work as expected on my end, so I'm curious where in this process something unusual is happening on your end. Could you perhaps elaborate a bit if something is still unclear?

Hi Vincent,
Thanks so much for your detailed reply! What I think is happening on our end, is that our different annotation sets have different lengths. We have been annotating a dataset of about 2000 rows - but not every annotator has labeled the entire dataset. We have also randomized the dataset - so it's possible annotator "Guybrush" annotated 600 rows, and annotator "LeChuck" annotated, say, 400 totally different rows. But, there is still some overlap - which is why I wanted to review these. I was thinking that (perhaps using --auto-accept) review would also include rows with only 1 annotation into the output. Is this not the case?

By the way, I think at some point we used db-merge to merge those multiple annotation sets into one, so our 600 annotations from "Guybrush" and our 400 from "LeChuck" became one set of, say, 800 annotations (where some overlapping ones got merged). I think we still kept some duplicates in the merged set because they had different answers, but it also looks like there are duplicates in there with the same answer, which is probably our own fault. Our annotation team is new at this and we might have accidentally combined multiple sets, thinking it would merge into one.

In any case, it could be that our understanding of what review does isn't quite right, in that it expects the input to be of the same length, and it won't just merge in annotations that have no conflict/duplicate.

Thanks again for your help!

So let's extend the example.

I'll be annotating this on-theme example:

{"text": "a wood chuck could chuck a lot of wood if a wood chuck could chuck wood"} 

Again, I'll run:

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python -m prodigy textcat.manual issue-6044 examples.jsonl --label truthy

And now I'll annotate this with the guybrush user. This user did not appear before. And for good measure, I'll show the annotation from db-out:

> python -m prodigy db-out issue-6044 | grep guybrush
{"text":"a wood chuck could chuck a lot of wood if a wood chuck could chuck wood","_input_hash":-1690856185,"_task_hash":1885086500,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666878830,"_annotator_id":"issue-6044-guybrush","_session_id":"issue-6044-guybrush"}

Let's now see what happens when we review this item.

Without auto accept

prodigy review issue-6044-reviewed issue-6044

I don't make an annotation, but the interface shows the single annotator just fine. Note that db-out, as expected, doesn't have anything from Guybrush.

python -m prodigy db-out issue-6044-reviewed | grep guybrush
# EMPTY! 

With auto accept.

prodigy review issue-6044-reviewed issue-6044 --auto-accept

It doesn't show the annotation now!

But! Does it appear in the reviewed dataset automatically, like before?

python -m prodigy db-out issue-6044-reviewed | grep guybrush
# STILL EMPTY! 

The example with "wood chucks" doesn't appear in db-out because it's never been annotated by more than one person.

Back to Your Issue

It could be that there are hard duplicates in your data because the data got merged in a wrong way earlier. If that's the case, you might be able to alleviate the pain if you try out the --rehash flag in the db-merge recipe and re-run.

Another thing you can consider is to just do some analysis in a Jupyter notebook. If you're savvy with Pandas, you should be able to load in the jsonl file via:

import pandas as pd 

pd.read_json("path.jsonl", lines=True)
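Once the data is loaded, a few lines of pandas can show where the duplicates come from. The sketch below is only an illustration: the rows are made up (hash values and session names included), and it assumes your export has the `_input_hash`, `_session_id` and `answer` columns that db-out produced in the examples earlier.

```python
import io

import pandas as pd

# Made-up rows standing in for a file written by `prodigy db-out`.
raw = io.StringIO(
    '{"text": "a", "_input_hash": 1, "_session_id": "ds-guybrush", "answer": "accept"}\n'
    '{"text": "a", "_input_hash": 1, "_session_id": "ds-lechuck", "answer": "reject"}\n'
    '{"text": "b", "_input_hash": 2, "_session_id": "ds-guybrush", "answer": "accept"}\n'
    '{"text": "b", "_input_hash": 2, "_session_id": "ds-guybrush", "answer": "accept"}\n'
    '{"text": "c", "_input_hash": 3, "_session_id": "ds-lechuck", "answer": "accept"}\n'
)
df = pd.read_json(raw, lines=True)

# Hard duplicates: the same input with the same answer stored more than once.
hard_dupes = df[df.duplicated(subset=["_input_hash", "answer"], keep=False)]

# Inputs where the stored answers disagree, i.e. the ones worth reviewing.
answers_per_input = df.groupby("_input_hash")["answer"].nunique()
conflicts = answers_per_input[answers_per_input > 1].index.tolist()

# Inputs that only a single session ever annotated.
sessions_per_input = df.groupby("_input_hash")["_session_id"].nunique()
single = sessions_per_input[sessions_per_input == 1].index.tolist()

print(hard_dupes[["text", "_session_id", "answer"]])
print("conflicting inputs:", conflicts)
print("single-session inputs:", single)
```

If most of your 2000+ rows end up in `hard_dupes`, that would point at the merge step rather than at review.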

Alternatively, you might enjoy my clumper util library. It's a lot slower than pandas, but it's typically more expressive for nested lists of dictionaries.

Hi Vincent,

Thanks again for your detailed reply! OK, so review doesn't add in annotations by only one user (where there are no conflicts) to the outputted issue-6044-reviewed dataset, correct? So then, how would you ultimately combine these datasets into one for training?

Let me try to give an example:

Say you have a dataset of...

{"text": "a wood chuck could chuck a lot of wood if a wood chuck could chuck wood", "answer":"accept", "accept": ["true"]} 
{"text": "a wood chuck could chuck a lot of wood if a wood chuck could chuck wood", "answer":"accept", "accept": ["false"]}
{"text": "That's the second biggest monkey head I've ever seen!", "answer":"accept", "accept": ["true"]} 

When reviewing, you'd end up with basically just the one

{"text": "a wood chuck could chuck a lot of wood if a wood chuck could chuck wood", "answer":"accept", "accept": ["true"]} 

or whatever you reviewed as the best answer. The annotation of

{"text": "That's the second biggest monkey head I've ever seen!", "answer":"accept", "accept": ["true"]} 

would not be copied to the review dataset? How would you then get a dataset that's the reviewed annotation + the singular "monkey head" one?

I think I can answer part of my own question here by thinking about what you said about our data being merged wrongly. I'm guessing that, before you review, your database holds only one row per annotation (so not a duplicate one like in my example for the woodchuck quote) - is that right? If so, I wonder how we ended up with all these duplicates... I suspect we haven't used the ?session param, maybe that's it? I think one of our annotators also used prodigy's correct recipe, which I'm also not quite sure about how to combine that with our earlier annotations (but that's perhaps a separate question).

I'm still not quite sure though how it should work if we review a set and want to combine that with singular annotations from the input set, that weren't copied during review (if that makes sense).

Thanks again for all your help, I really appreciate it!

Correct, assuming the --auto-accept flag is given. As we saw earlier, without that flag the example will simply pop up in the interface with a single annotator attached.

You wrote: "I'm still not quite sure though how it should work if we review a set and want to combine that with singular annotations from the input set, that weren't copied during review (if that makes sense)."

You can do two passes over the data, no? One with --auto-accept and one without?
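Alternatively, the combination can be done outside of Prodigy after a single review pass. Here's a rough sketch of that idea, not a built-in feature: `combine` is a hypothetical helper, the hash values are made up, and it assumes both exports carry `_input_hash`. Reviewed rows win, and an original row is only carried over when it is the sole annotation for an input that the review never covered.

```python
def combine(reviewed, original):
    """Merge reviewed rows with single-annotation rows from the original set.

    Returns (combined, pending): `combined` holds one row per input,
    `pending` holds input hashes that have several annotations but no review yet.
    """
    reviewed_inputs = {row["_input_hash"] for row in reviewed}

    # Collect the original annotations per input.
    by_input = {}
    for row in original:
        by_input.setdefault(row["_input_hash"], []).append(row)

    combined = list(reviewed)
    pending = []
    for input_hash, rows in by_input.items():
        if input_hash in reviewed_inputs:
            continue  # the reviewed version replaces all originals
        if len(rows) == 1:
            combined.append(rows[0])  # nothing to disagree with, keep as-is
        else:
            pending.append(input_hash)  # still needs a review decision
    return combined, pending


# Made-up rows: one reviewed example plus the original annotations.
reviewed = [{"text": "a wood chuck could chuck ...", "_input_hash": 1, "answer": "accept"}]
original = [
    {"text": "a wood chuck could chuck ...", "_input_hash": 1, "answer": "accept"},
    {"text": "a wood chuck could chuck ...", "_input_hash": 1, "answer": "reject"},
    {"text": "That's the second biggest monkey head ...", "_input_hash": 2, "answer": "accept"},
]

combined, pending = combine(reviewed, original)
print(len(combined), "rows for training,", len(pending), "inputs still to review")
```

Because every input ends up in the result at most once, this avoids the duplicates you might otherwise get from stacking two exports on top of each other.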

Ah, so, ok, I didn't realize that without using --auto-accept it WILL show single annotations. So... Correct me if I'm wrong, but wouldn't it make more sense if --auto-accept won't show the annotations in the Prodigy web app, but will automatically add them to the new dataset? I'm not sure I get why you would essentially lose that data - shouldn't auto-accept mean we automatically accept those answers and keep them?

I'll try running both options though, thanks for the suggestion! I'm worried that will lead to duplicates however, or will Prodigy know to check the output issue-6044-reviewed when running the second review to see what annotations are already in that set? That would be nice :)

Thanks again for all your replies, and hard work on Prodigy!
