So let's extend the example.
I'll be annotating this on-theme example:
```json
{"text": "a wood chuck could chuck a lot of wood if a wood chuck could chuck wood"}
```
Again, I'll run:
```
PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python -m prodigy textcat.manual issue-6044 examples.jsonl --label truthy
```
And now I'll annotate this with the guybrush user. This user did not appear before. And for good measure, I'll show the annotation from `db-out`:
```
> python -m prodigy db-out issue-6044 | grep guybrush
{"text":"a wood chuck could chuck a lot of wood if a wood chuck could chuck wood","_input_hash":-1690856185,"_task_hash":1885086500,"label":"truthy","_view_id":"classification","answer":"accept","_timestamp":1666878830,"_annotator_id":"issue-6044-guybrush","_session_id":"issue-6044-guybrush"}
```
Let's now see what happens when we review this item.
Without auto-accept
```
prodigy review issue-6044-reviewed issue-6044
```
I don't make an annotation, but the interface shows the single annotator just fine. Note that `db-out`, as expected, doesn't have anything from Guybrush.
```
python -m prodigy db-out issue-6044-reviewed | grep guybrush
# EMPTY!
```
With auto-accept
```
prodigy review issue-6044-reviewed issue-6044 --auto-accept
```
It doesn't show the annotation now!
But! Does it appear in the reviewed dataset automatically, like before?
```
python -m prodigy db-out issue-6044-reviewed | grep guybrush
# STILL EMPTY!
```
The example with "wood chucks" doesn't appear in `db-out` because it's never been annotated by more than one person.
Back to Your Issue
It could be that there are hard duplicates in your data because the data got merged the wrong way earlier. If that's the case, you might be able to alleviate the pain by trying the `--rehash` flag in the `db-merge` recipe and re-running.
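For instance, something like this (the dataset names here are placeholders; swap in your own):

```
# merge into a fresh dataset, recomputing the hashes along the way
python -m prodigy db-merge dataset_a,dataset_b merged-rehashed --rehash
```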
Another thing you can consider is to do some analysis in a Jupyter notebook. If you're savvy with pandas, you should be able to load the JSONL file via:
```python
import pandas as pd

pd.read_json("path.jsonl", lines=True)
```
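From there, a quick way to check for hard duplicates is to count how often each hash pair occurs. A minimal sketch, assuming the export has the `_input_hash` and `_task_hash` columns shown in the `db-out` output above:

```python
import pandas as pd

df = pd.read_json("path.jsonl", lines=True)

# an (input hash, task hash) pair that occurs more than once
# is a hard duplicate of the same task
counts = df.groupby(["_input_hash", "_task_hash"]).size()
print(counts[counts > 1])
```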
Alternatively, you might enjoy my clumper util library. It's a lot slower than pandas, but it's typically more expressive for nested lists of dictionaries.
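For the same duplicate check, something along these lines should work (a sketch; the exact aggregation verbs are worth checking against the clumper docs):

```python
from clumper import Clumper

# group annotations by input hash, count them, and keep the hashes
# that occur more than once; collect() returns a plain list of dicts
(
    Clumper.read_jsonl("path.jsonl")
    .group_by("_input_hash")
    .agg(n=("_input_hash", "count"))
    .keep(lambda d: d["n"] > 1)
    .collect()
)
```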