Editing datasets

ines · October 25, 2017, 1:05pm

Currently, the following commands let you interact with the datasets in the database:

prodigy db-in: Import annotations to a dataset.
prodigy db-out: Export annotations from a dataset or session.
prodigy drop: Remove a dataset or session from the database.

To view the datasets and sessions (each individual annotation session, named after the timestamp), you can use the prodigy stats command:

prodigy stats -l    # view stats and list all datasets
prodigy stats -ls  # view stats and list all datasets and sessions

We didn’t want to add too many arbitrary, Prodigy-specific commands for interacting with the database at this point, because that easily gets messy and we weren’t sure how much of it users would actually need. So for now, if you want to rename an existing dataset or change its description, you’ll have to export it and re-import it:

prodigy db-out my_set /tmp
prodigy db-in my_new_set /tmp/my_set.jsonl "Some description"
prodigy drop my_set  # optional: delete dataset
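If you rename datasets often, the three steps can be wrapped in a small shell function. This is a hypothetical helper (rename_dataset is not a Prodigy command), a minimal sketch assuming bash and assuming that prodigy exits with a non-zero status when a command fails. It exports to a fresh temporary directory instead of /tmp directly and only drops the original dataset once the re-import has succeeded:

```shell
# Hypothetical helper, not part of Prodigy:
#   rename_dataset OLD_NAME NEW_NAME "DESCRIPTION"
# Assumes prodigy returns a non-zero exit code on failure.
rename_dataset() {
  local old="$1" new="$2" description="$3"
  local tmpdir
  # Fresh export directory so we never pick up a stale file from /tmp
  tmpdir="$(mktemp -d)" || return 1
  # Export: writes "$tmpdir/$old.jsonl"
  prodigy db-out "$old" "$tmpdir" || return 1
  # Re-import under the new name with the new description.
  # On failure we stop here: the export stays in $tmpdir and
  # the original dataset is left untouched.
  prodigy db-in "$new" "$tmpdir/$old.jsonl" "$description" || return 1
  # Only reached if the re-import succeeded
  prodigy drop "$old" || return 1
  rm -rf "$tmpdir"
}
```

Usage: rename_dataset my_set my_new_set "Some description". Since deleting the old dataset is optional, remove the prodigy drop line if you want to keep the original around.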

You can also preview an existing dataset on the command line using ner.print-dataset and textcat.print-dataset. If the dataset is large, I’d recommend piping the output into less so you can navigate through it (with the -r flag to make sure the colors are displayed correctly):

prodigy ner.print-dataset news_headlines | less -r
