Feature Request: Bulk Dataset Drop

enhancement
database

(W.P. McNeill) #1

While experimenting with Prodigy I find myself creating and dropping lots of datasets. Every once in a while I go through and clean up every dataset. This is a little tedious because prodigy drop only works on one dataset at a time.

It would be nice to have ways to make this faster. For example, prodigy drop could take multiple arguments, or maybe patterns that match the names of datasets.


(Andy Halterman) #2

You could use the drop function in Prodigy’s __main__.py and apply it over a list with something like this (WARNING, untested):

from prodigy.components.db import connect
DB = connect()
# could take as a plac input
to_drop = "db1,temp1,test4"
to_drop = [i.strip() for i in to_drop.split(",")]

# [Copy this function from __main__.py, line 132]
def drop(set_id):
    """
    ...
    """

for db in to_drop:
    drop(db)
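
If you'd rather pass the names on the command line than hard-code the string, the parsing step could look roughly like this. It's only a sketch: it uses the standard library's argparse instead of plac, `parse_dataset_names` is a hypothetical helper name, and it accepts space-separated and comma-separated names alike.

```python
import argparse


def parse_dataset_names(argv=None):
    """Collect dataset names from CLI arguments, e.g. `db1,temp1 test4`."""
    parser = argparse.ArgumentParser(description="Drop several datasets at once")
    parser.add_argument(
        "datasets",
        nargs="+",
        help="dataset names, space- or comma-separated",
    )
    args = parser.parse_args(argv)
    names = []
    for arg in args.datasets:
        # Split comma-separated groups and discard empty pieces
        names.extend(part.strip() for part in arg.split(",") if part.strip())
    return names
```

The returned list can stand in for to_drop in the loop above.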

(Ines Montani) #3

@andy That’s a nice idea actually!

Alternatively, Prodigy’s Database also has a drop_dataset method – this is a little more direct, but won’t give you any warnings or print any results.

from prodigy.components.db import connect

db = connect()  # uses the database settings from your prodigy.json
for dataset in ['db1', 'temp1', 'test4']:
    db.drop_dataset(dataset)
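
To cover the pattern-matching idea from the original request, here is a minimal sketch that has not been run against a real Prodigy database. It assumes the database object exposes a `datasets` list of names alongside `drop_dataset`. The `drop_matching` function and its `dry_run` flag are hypothetical names made up for illustration, and the shell-style wildcards come from Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase


def drop_matching(db, patterns, dry_run=True):
    """Drop every dataset whose name matches one of the shell-style patterns.

    `db` is assumed to expose a `datasets` list of names and a
    `drop_dataset(name)` method. Returns the sorted list of matched names.
    """
    # Collect the matches first so dropping never mutates what we iterate over
    matched = sorted(
        name for name in db.datasets
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )
    if not dry_run:
        for name in matched:
            db.drop_dataset(name)
    return matched
```

With `db = connect()`, calling `drop_matching(db, ['temp*', 'test?'])` only reports which datasets would be removed; pass `dry_run=False` to actually drop them. Defaulting to a dry run seems like the safer choice, since a dropped dataset is gone for good.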

I’ve also been thinking about building a little app that lets you view (and potentially manage) datasets in the browser. Like, a “Prodigy Dataset Explorer”. We wouldn’t necessarily ship this with the library, but it could be a nice open-source addon that users could install and contribute to :slightly_smiling_face:


(Justin Du Jardin) #4

I’ve got something like that as an Electron app, if you’re using SQLite. I’ve been using it for reviewing and updating annotations on my project, and as a scratch pad for making programmatic changes to my db. That’s what the beautiful “custom” button does on the top right.

I pushed it to GitHub, in case it’s useful for other people: https://github.com/justindujardin/prodigy-viewer


(Ines Montani) #5

@justindujardin Oh wow, this is amazing!!! It makes me so happy to see all the cool stuff you and others are building with and for Prodigy :yellow_heart: (We should probably start compiling a list of all addons, scripts and custom recipes soon. Luckily, the prodigy topic on GitHub is unoccupied, which is pretty nice!)

Btw, in terms of the database connection: Not sure how easy it would be to integrate something like this, but in theory, the app could also communicate with Prodigy’s database via a REST API. For example:

import hug
from prodigy.components.db import connect

DB = connect()

@hug.get('/dataset/{dataset_id}')
def get_dataset(dataset_id):
    # Return all annotated examples stored in the given dataset
    examples = DB.get_dataset(dataset_id)
    return {'examples': examples}

To make this work more smoothly, we probably need a few additional database methods, though (like update_example etc).


(Justin Du Jardin) #6

Yeah, that’d be much better! I’ve been putting all the SQLite-specific stuff in an Angular service, so it should be pretty easy to swap out.

The public API of the SQLite service class is probably a good reference for that kind of stuff. You may be right that it could all boil down to an update_example endpoint. :sweat_smile: