While experimenting with Prodigy I find myself creating and dropping lots of datasets. Every once in a while I go through and clean up every dataset. This is a little tedious because prodigy drop only works on one dataset at a time.
It would be nice to have a way to make this faster. For example, prodigy drop could take multiple arguments, or maybe patterns to match the names of datasets.
You could use the drop function in Prodigy’s __main__.py and apply it over a list with something like this (WARNING, untested):
from prodigy.components.db import connect

DB = connect()
# could take as a plac input
to_drop = "db1,temp1,test4"
to_drop = [i.strip() for i in to_drop.split(",")]

# [Copy this function from __main__.py, line 132]
def drop(set_id):
    """
    ...
    """

for db in to_drop:
    drop(db)
Alternatively, Prodigy’s Database also has a drop_dataset method – this is a little more direct, but won’t give you any warnings or print any results.
from prodigy.components.db import connect

db = connect()
for dataset in ['db1', 'temp1', 'test4']:
    db.drop_dataset(dataset)
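If you'd rather match dataset names by pattern, as suggested in the original question, something like the sketch below might work. It assumes the database object exposes a `datasets` list of names alongside `drop_dataset`; the helper name `drop_matching` and its `dry_run` flag are made up for illustration and are not part of Prodigy.

```python
from fnmatch import fnmatchcase


def drop_matching(db, pattern, dry_run=True):
    """Drop every dataset whose name matches a shell-style pattern.

    Assumption: `db` has a `datasets` attribute (list of dataset names)
    and a `drop_dataset(name)` method, like the object returned by
    Prodigy's `connect()`. Returns the list of matching names.
    """
    # Collect matches first so the list isn't modified while iterating.
    matches = [name for name in db.datasets if fnmatchcase(name, pattern)]
    for name in matches:
        if dry_run:
            print(f"Would drop: {name}")
        else:
            db.drop_dataset(name)
            print(f"Dropped: {name}")
    return matches


# Example usage (requires Prodigy to be installed):
# from prodigy.components.db import connect
# drop_matching(connect(), "temp*")                 # preview only
# drop_matching(connect(), "temp*", dry_run=False)  # actually drop
```

The default is a dry run on purpose, so you can check which names a pattern catches before anything is deleted.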
I’ve also been thinking about building a little app that lets you view (and potentially manage) datasets in the browser. Like, a “Prodigy Dataset Explorer”. We wouldn’t necessarily ship this with the library, but it could be a nice open-source addon that users could install and contribute to.
I've got something like that as an Electron app, if you're using sqlite. I've been using it for reviewing and updating annotations on my project, and as a scratch pad for making programmatic changes to my db. That's what the beautiful "custom" button does on the top right.
@justindujardin Oh wow, this is amazing!!! It makes me so happy to see all the cool stuff you and others are building with and for Prodigy (We should probably start compiling a list of all addons, scripts and custom recipes soon. Luckily, the prodigy topic on GitHub is unoccupied, which is pretty nice!)
Btw, in terms of the database connection: Not sure how easy it would be to integrate something like this, but in theory, the app could also communicate with Prodigy’s database via a REST API. For example:
The public API of the sqlite service class is probably a good reference for that kind of stuff. You may be right that it could all boil down to an update_example endpoint.
I'm a little wary of supporting a drop --all here, mainly because it can lead to a database losing all of its data. Unless everyone has proper backups, supporting this might cause serious accidents.
If you really want to delete everything, you can also manually delete the SQLite database file locally. It's not something I recommend, because there's a risk of losing data, but that would really delete it all.
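Since the worry here is missing backups, a small sketch like this could copy the database file before you delete anything. The helper name `backup_file` is hypothetical, and the path in the usage comment is only an assumption about where your SQLite file lives, so adjust it to your setup. It's probably safest to run this while Prodigy isn't writing to the database.

```python
import shutil
import time
from pathlib import Path


def backup_file(db_path):
    """Copy a database file next to itself with a timestamp suffix.

    Returns the path of the copy. Raises FileNotFoundError if the
    source file does not exist.
    """
    src = Path(db_path).expanduser()
    if not src.is_file():
        raise FileNotFoundError(f"No database file at {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dest)  # copy2 also preserves file metadata
    return dest


# Example usage (the path is an assumption, not a guaranteed location):
# backup_file("~/.prodigy/prodigy.db")
```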
Another option could be to use bash directly. This would allow you to drop several datasets at once by listing their names in a file.
Suppose that you have this file called names.txt:
name-a
name-b
name-c
Then you could run:
cat names.txt | while read line; do prodigy drop "$line"; done