How to verify Multi-annotated gold-standard labels

First, thank you for creating this awesome annotation tool.

I would like to annotate an NER dataset with the ner.correct recipe, using multiple named sessions for different annotators.

Suppose I have 3 sessions that operate on the same dataset. Each annotator saves their tasks along the way.

What recipe can I use to view the accepted tasks of each annotator session?

Can each annotator export their accepted tasks per session into an output JSONL file?

Thanks.

Hi and thanks! :blush: There are different ways you could do this and it depends a bit on how you're dividing up the data. If all annotators are labelling the same examples, you can use Prodigy's review recipe to compare all annotations on the same data and resolve potential conflicts: https://prodi.gy/docs/recipes#review

If you're using named multi-user sessions, the examples will be saved to the main dataset, as well as the session dataset named {dataset}-{session}. So if you're annotating with ?session=rindra and saving to the dataset my_cool_dataset, the session dataset will be my_cool_dataset-rindra and you can export it using db-out or load it via the Python API. The individual examples in the dataset will also expose a _session_id.
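
For example, exporting a session dataset with the Python API could look something like this – just a quick sketch, with the dataset and session names from the example above as placeholders:

```python
from prodigy.components.db import connect
import srsly

# Connect to the database Prodigy is configured to use
db = connect()

# Option 1: load the session dataset directly
session_examples = db.get_dataset("my_cool_dataset-rindra")

# Option 2: load the main dataset and filter on "_session_id"
all_examples = db.get_dataset("my_cool_dataset")
rindra_examples = [
    eg for eg in all_examples if eg.get("_session_id") == "my_cool_dataset-rindra"
]

# Write the examples out as JSONL, similar to what db-out gives you
srsly.write_jsonl("my_cool_dataset_rindra.jsonl", session_examples)
```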

Alternatively, if you know how you want to split up your data, you could also have all your annotators save their annotations to separate datasets. This is a lot cleaner, so we often recommend it during development: all sessions are separate, so if there's a problem (e.g. one annotator misunderstanding the annotation scheme), you can just delete that dataset and start over. Merging datasets later is always easy – recipes like review, train and data-to-spacy all support multiple datasets as input :slightly_smiling_face:
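
If you go with separate datasets, pulling everything back together in Python is straightforward too. Here's a rough sketch with made-up dataset names, just to illustrate:

```python
from prodigy.components.db import connect

db = connect()

# One dataset per annotator – these names are just examples
annotator_datasets = ["ner_rindra", "ner_alex", "ner_sam"]

# Collect all examples across the datasets, e.g. to inspect or post-process them
merged = []
for name in annotator_datasets:
    merged.extend(db.get_dataset(name))

print(f"Loaded {len(merged)} examples from {len(annotator_datasets)} datasets")
```

(For the built-in recipes like review or train you wouldn't need this – you can pass the dataset names directly on the command line.)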

Thanks @ines for the clear explanations. I appreciate your help.

The two approaches you suggested would work in my case. I did not know that a session dataset is also accessible for export.

So if I export a dataset under my session with db-out, will it only export the accepted examples from the manual recipe? Or will it also export the ignored or rejected examples?

If I want to use the review recipe later, I'd like to focus on comparing the accepted examples that are in both datasets.
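
If it exports everything, I guess I could filter on the "answer" field myself – something like this (just a sketch, the dataset names are placeholders):

```python
from prodigy.components.db import connect

db = connect()

# Keep only the examples each annotator accepted
for name in ["my_cool_dataset-rindra", "my_cool_dataset-other"]:
    examples = db.get_dataset(name)
    accepted = [eg for eg in examples if eg.get("answer") == "accept"]
    print(name, ":", len(accepted), "accepted out of", len(examples))
```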

Thanks for your support.