While experimenting with Prodigy I find myself creating and dropping lots of datasets. Every once in a while I go through and clean up every dataset. This is a little tedious because prodigy drop only works on one dataset at a time.
It would be nice to have ways to make this faster. For example, prodigy drop could take multiple arguments, or maybe patterns to match the names of datasets.
You could use the drop function in Prodigy’s __main__.py and apply it over a list with something like this (WARNING, untested):
from prodigy.components.db import connect
DB = connect()
# could take as a plac input
to_drop = "db1,temp1,test4"
to_drop = [i.strip() for i in to_drop.split(",")]
# [Copy this function from __main__.py, line 132]
for db in to_drop:
    drop(db)
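To make the "multiple arguments" idea a bit more concrete, here's a minimal sketch of a name parser that accepts both space-separated and comma-separated dataset names. It uses argparse from the standard library rather than plac, and parse_dataset_names is a hypothetical helper name, not something that ships with Prodigy.

```python
import argparse


def parse_dataset_names(argv):
    """Turn command-line arguments into a flat list of dataset names.

    Accepts names separated by spaces, commas, or a mix of both,
    e.g. ["db1,temp1", "test4"] -> ["db1", "temp1", "test4"].
    Hypothetical helper for illustration; not part of Prodigy.
    """
    parser = argparse.ArgumentParser(
        description="Collect the names of several datasets to drop."
    )
    parser.add_argument(
        "datasets",
        nargs="+",
        help="dataset names, separated by spaces and/or commas",
    )
    args = parser.parse_args(argv)

    names = []
    for chunk in args.datasets:
        # Split each argument on commas and ignore empty pieces
        names.extend(part.strip() for part in chunk.split(",") if part.strip())
    return names
```

The resulting list could then be fed to the loop above in place of the hard-coded to_drop string.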
Alternatively, Prodigy’s Database also has a drop_dataset method – this is a little more direct, but won’t give you any warnings or print any results.
from prodigy.components.db import connect
db = connect()
for dataset in ['db1', 'temp1', 'test4']:
    db.drop_dataset(dataset)
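For the pattern-matching idea, here's a rough, untested-against-Prodigy sketch. It assumes the database object exposes a datasets list of names alongside the drop_dataset method mentioned above. select_datasets and drop_matching are hypothetical helper names, and dry_run defaults to True so nothing gets deleted until you've looked at what matched.

```python
from fnmatch import fnmatchcase


def select_datasets(names, patterns):
    """Return the names matching at least one shell-style pattern.

    Patterns use fnmatch syntax ("temp*", "test?"), and the original
    order of `names` is preserved.
    """
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


def drop_matching(db, patterns, dry_run=True):
    """Drop every dataset in `db` whose name matches one of `patterns`.

    Assumption: `db` has a `datasets` attribute (list of names) and a
    `drop_dataset(name)` method. With dry_run=True, nothing is dropped
    and the matched names are only returned for inspection.
    """
    matched = select_datasets(db.datasets, patterns)
    if not dry_run:
        for name in matched:
            db.drop_dataset(name)
    return matched


# Possible usage (assumes Prodigy is installed):
# from prodigy.components.db import connect
# print(drop_matching(connect(), ["temp*", "test?"]))          # preview only
# drop_matching(connect(), ["temp*", "test?"], dry_run=False)  # actually drop
```

Like the drop_dataset loop, this won't print any warnings, so running it once with dry_run=True first is probably a good idea.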
I’ve also been thinking about building a little app that lets you view (and potentially manage) datasets in the browser. Like, a “Prodigy Dataset Explorer”. We wouldn’t necessarily ship this with the library, but it could be a nice open-source addon that users could install and contribute to.
I’ve got something like that as an Electron app, if you’re using sqlite. I’ve been using it for reviewing and updating annotations on my project, and as a scratch pad for making programmatic changes to my db. That’s what the beautiful “custom” button does on the top right.
@justindujardin Oh wow, this is amazing!!! It makes me so happy to see all the cool stuff you and others are building with and for Prodigy (We should probably start compiling a list of all addons, scripts and custom recipes soon. Luckily, the prodigy topic on GitHub is unoccupied, which is pretty nice!)
Btw, in terms of the database connection: Not sure how easy it would be to integrate something like this, but in theory, the app could also communicate with Prodigy’s database via a REST API. For example: