How to verify multi-annotated gold-standard labels

First, thank you for creating this awesome annotation tool.

I would like to annotate an NER dataset with the ner.correct recipe, using a separate session for each annotator.

Suppose I have 3 sessions that operate on the same dataset, and each annotator saves their tasks along the way.

What recipe can I use to view the accepted tasks of each annotator session?

Can each annotator export their accepted tasks per session into an output jsonl file?


Hi and thanks! :blush: There are different ways you could do this, and it depends a bit on how you're dividing up the data. If all annotators are labelling the same examples, you can use Prodigy's review recipe to compare all annotations on the same data and resolve potential conflicts.

If you're using named multi-user sessions, the examples will be saved to the main dataset, as well as the session dataset named {dataset}-{session}. So if you're annotating with ?session=rindra and saving to the dataset my_cool_dataset, the session dataset will be my_cool_dataset-rindra and you can export it using db-out or load it via the Python API. The individual examples in the dataset will also expose a _session_id.
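As an illustrative sketch (the records below are made up and much smaller than real tasks), here's how you could group a `db-out` JSONL export of the main dataset by that `_session_id` field, using only the standard library:

```python
import json
from collections import defaultdict

# Toy records standing in for lines of a `prodigy db-out` JSONL export.
# Real exports contain more fields (tokens, spans, hashes, etc.).
exported = [
    '{"text": "Apple is nice", "answer": "accept", "_session_id": "my_cool_dataset-rindra"}',
    '{"text": "Berlin is big", "answer": "accept", "_session_id": "my_cool_dataset-ines"}',
]

by_session = defaultdict(list)
for line in exported:
    eg = json.loads(line)
    by_session[eg["_session_id"]].append(eg)

for session, examples in by_session.items():
    print(session, len(examples))
```

Each per-session list here is equivalent to what you'd get by exporting the corresponding {dataset}-{session} dataset directly.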

Alternatively, if you know how you want to split up your data, you could also have all your annotators save their annotations to separate datasets. This is a lot cleaner, which is why we often recommend it during development: all sessions are separate, so if there's a problem (e.g. one annotator misunderstanding the scheme), you can just delete the dataset and start over. Merging datasets later is always easy – recipes like review, train and data-to-spacy all support multiple datasets as input :slightly_smiling_face:
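For the recipes, you don't need to do anything yourself – but if you ever want to combine per-annotator exports manually, it's just a matter of concatenating the records. A minimal sketch with made-up data and a hypothetical `_annotator` bookkeeping key:

```python
import json

# Made-up per-annotator JSONL exports; real db-out files have
# one JSON object per line with many more fields.
rindra_lines = ['{"text": "Apple is nice", "answer": "accept"}']
ines_lines = ['{"text": "Berlin is big", "answer": "accept"}']

merged = []
for annotator, lines in [("rindra", rindra_lines), ("ines", ines_lines)]:
    for line in lines:
        eg = json.loads(line)
        eg["_annotator"] = annotator  # hypothetical field to remember the origin
        merged.append(eg)

print(len(merged))  # 2
```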

Thanks @ines for the clear explanations. I appreciate your help.

The two approaches you suggested would work in my case. I did not know that a session dataset is also accessible for export.

So if I export a dataset under my session with db-out, will it export only the examples I accepted in the manual recipe, or also the ignored and rejected ones?

If I want to use the review recipe later, I want to focus on comparing the accepted examples that appear in both datasets.
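For reference, here's a rough stdlib sketch of what I mean, assuming the exported JSONL lines carry Prodigy's `_input_hash` field and an `"answer"` of accept/reject/ignore:

```python
import json

def accepted_in_both(lines_a, lines_b):
    """Return examples accepted in both JSONL exports, matched on _input_hash."""
    def accepted(lines):
        egs = (json.loads(line) for line in lines)
        return {eg["_input_hash"]: eg for eg in egs if eg.get("answer") == "accept"}
    a, b = accepted(lines_a), accepted(lines_b)
    return [eg for h, eg in a.items() if h in b]

# Toy data: annotator A accepted both texts, annotator B rejected one.
a = ['{"text": "Apple", "_input_hash": 1, "answer": "accept"}',
     '{"text": "Berlin", "_input_hash": 2, "answer": "accept"}']
b = ['{"text": "Apple", "_input_hash": 1, "answer": "accept"}',
     '{"text": "Berlin", "_input_hash": 2, "answer": "reject"}']
print([eg["text"] for eg in accepted_in_both(a, b)])  # ['Apple']
```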

Thanks for your support.