Loading pre-annotated data

cheyanneb · September 20, 2021, 3:53pm

Hi. I would like to load a dataset that was pre-annotated outside of Prodigy. Is it possible to see the label already selected and then view the changelog?

ines · September 21, 2021, 6:49am

Prodigy's input and output formats are identical, so you can always load in already annotated data, or generate the JSONL format programmatically. You can see an overview of the expected formats for the different interfaces here: https://prodi.gy/docs/api-interfaces

For example, if you're using the choice UI with multiple options, you can provide the IDs of the selected options as "accept": ["LABEL_A", "LABEL_B"] to pre-select them in the UI. This list will then be updated if you make changes in the UI. If you want to preserve the original annotations, you can just add them as a separate key to the JSON you send out, e.g. "orig_selection": [...]. This is passed through with the data, so to find out whether the annotations have changed, you just need to compare accept and orig_selection.
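As a rough illustration of the approach described above, here is a minimal sketch that builds choice tasks with pre-selected options, serializes them to JSONL, and later checks which examples were changed. The helper names (make_choice_task, to_jsonl, selection_changed) and the example texts are made up for this sketch, and the "orig_selection" key is simply the custom key suggested above. The task layout ("text", "options", "accept") is assumed to follow the choice interface format in the docs linked earlier.

```python
import json


def make_choice_task(text, labels, selected):
    """Build one choice task with pre-selected options (hypothetical helper)."""
    return {
        "text": text,
        "options": [{"id": label, "text": label} for label in labels],
        "accept": list(selected),          # option IDs pre-selected in the UI
        "orig_selection": list(selected),  # custom key, kept for later comparison
    }


def to_jsonl(tasks):
    """Serialize tasks as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(task) for task in tasks)


def selection_changed(example):
    """Return True if the final selection differs from the original one."""
    return set(example.get("accept", [])) != set(example.get("orig_selection", []))


labels = ["LABEL_A", "LABEL_B"]
tasks = [
    make_choice_task("First example text", labels, ["LABEL_A"]),
    make_choice_task("Second example text", labels, ["LABEL_A", "LABEL_B"]),
]
jsonl_text = to_jsonl(tasks)

# Simulate an annotator changing the selection of the first example.
reviewed = [json.loads(line) for line in jsonl_text.splitlines()]
reviewed[0]["accept"] = ["LABEL_B"]

changed = [eg["text"] for eg in reviewed if selection_changed(eg)]
```

The comparison uses sets so that the order in which options were clicked does not count as a change.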

cheyanneb · October 17, 2022, 12:32pm

@ines I have a follow-up question here. I loaded some pre-annotated data that I want to re-annotate and run some error analysis on. I used this command, which seemed to load the data:

(prodigy) cheyannebaird@Cheyannes-MacBook-Pro:~/.prodigy$ prodigy db-in hbm_error_analysis /path_to_data/data.jsonl
✔ Imported 170 annotations to 'hbm_error_analysis' (session 2022-10-17_08-23-40) in database PostgreSQL

How do I then load this into the UI with the highlighted annotated label?

How I normally load the data locally:

PRODIGY_ALLOWED_SESSIONS=cheyanne PRODIGY_LOGGING=verbose prodigy recipe-name hbm_error_analysis /path_to_data/data.jsonl -F /path_to_recipe/my_recipe.py


koaning · October 24, 2022, 9:11am

A small detail: there's no need to @ a forum member for support. We all round robin the issues and cannot guarantee that the same person picks up the ticket/thread. Also: apologies for the delay! Life with a newborn sure is hectic.

Am I correct to see that you're using a custom recipe here?

If so, your recipe-name recipe currently takes the path to a /path_to_data/data.jsonl file. Inside your recipe, I'm assuming it uses something like the following to load in the examples:

from prodigy.components.loaders import JSONL

stream = JSONL(data_path)  # data_path: the file path passed to the recipe

But if you want to pass in the name of a dataset that already exists, you could do something like:

from prodigy.components.db import connect

dataset_name = "hbm_error_analysis"
db = connect()                         # uses settings from prodigy.json
stream = db.get_dataset(dataset_name)  # retrieve the examples in the dataset

Would this work?
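If the examples are streamed from an existing dataset as suggested above, the original selection can be copied into a separate key before the examples reach the UI, so it survives re-annotation. The following is only a sketch under assumptions: with_original_selection is a hypothetical helper, "orig_selection" is the custom key mentioned earlier in the thread, and the stored list is a stand-in for whatever the database call returns (assumed here to be a list of dicts).

```python
import copy


def with_original_selection(examples):
    """Yield copies of the examples with the current "accept" list preserved.

    Hypothetical helper: an existing "orig_selection" is left untouched,
    otherwise the current "accept" list is copied into it.
    """
    for eg in examples:
        task = copy.deepcopy(eg)  # avoid mutating the stored examples
        task.setdefault("orig_selection", list(task.get("accept", [])))
        yield task


# Stand-in for the examples retrieved from the database (assumed shape).
stored = [
    {"text": "Example text", "accept": ["LABEL_A"], "answer": "accept"},
]
stream = list(with_original_selection(stored))
```

In a custom recipe, the generator would wrap the stream before it is returned, so each task carries both the pre-selected "accept" list and the untouched "orig_selection" list.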
