Editing datasets

The following commands currently let you interact with the datasets in the database:

  • prodigy db-in: Import annotations to a dataset.
  • prodigy db-out: Export annotations from a dataset or session.
  • prodigy drop: Remove a dataset or session from the database.

To view the datasets and sessions (each individual annotation session, named after the timestamp), you can use the prodigy stats command:

prodigy stats -l   # view stats and list all datasets
prodigy stats -ls  # view stats and list all datasets and sessions

We didn’t want to add too many arbitrary, Prodigy-specific commands for interacting with the database at this point, because that gets messy quickly and we weren’t sure how many of them users would actually need. So for now, if you want to rename an existing dataset or change its description, you have to export it and re-add it under the new name:

prodigy db-out my_set /tmp
prodigy db-in my_new_set /tmp/my_set.jsonl "Some description"
prodigy drop my_set  # optional: delete dataset
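
If you do this often, the three steps above could be wrapped in a small shell function. This is only a rough sketch: rename_dataset and the PRODIGY_CMD variable are names made up for this example and are not part of Prodigy. It assumes that db-out writes a file named after the dataset into the output directory, as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: "rename" a dataset by exporting it, importing it under a new name,
# and dropping the original only if both earlier steps succeeded.
# PRODIGY_CMD is a hypothetical override (e.g. PRODIGY_CMD=echo for a dry run).

rename_dataset() {
    local old_name="$1" new_name="$2"
    local cmd="${PRODIGY_CMD:-prodigy}"
    local tmp_dir
    tmp_dir="$(mktemp -d)" || return 1

    # Export: assumed to create "$tmp_dir/$old_name.jsonl"
    $cmd db-out "$old_name" "$tmp_dir" || return 1

    # Import under the new name; any extra arguments (such as a description)
    # are passed through to db-in unchanged
    $cmd db-in "$new_name" "$tmp_dir/$old_name.jsonl" "${@:3}" || return 1

    # The exported file stays in $tmp_dir as a backup
    $cmd drop "$old_name"
}
```

Running it with PRODIGY_CMD=echo first prints the commands instead of executing them, which is a cheap way to check the names before anything gets dropped:

PRODIGY_CMD=echo rename_dataset my_set my_new_set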

You can also preview an existing dataset on the command line using ner.print-dataset and textcat.print-dataset. If the dataset is large, I’d recommend piping the output to less so you can navigate through it (with the -r flag to make sure the colors are displayed correctly):

prodigy ner.print-dataset news_headlines | less -r