Is there a faster way to add records to a prodigy db than "add_examples"?

araykhel · March 20, 2019, 7:25pm

I’m trying to add records to my prodigy db using add_examples. It takes like 20 mins to add 300 “records” (already in the prodigy format, which is why I’m using add_examples), and I have one list of records that is 30k long, soooo that’s going to take a while. I’m at a loss how to add it faster; does someone with more knowledge about the prodigy database structure have an idea?

andy · March 22, 2019, 1:16pm

I was just doing this this morning. If you have your examples in a Prodigy-formatted JSONL, you can use the db-in command to add them to a dataset. You’ll need to make sure the dataset already exists in prodigy (prodigy dataset etc), and then you can run

prodigy db-in [dataset] [in_file]

There’s more documentation in the README.
I added 3,000 annotations in about 2 seconds, so it should work just fine with your 30k. Weird that add_examples is so slow. I’m not sure why that’s the case.
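
If the records are still sitting in a Python list rather than a file, here is a minimal sketch for dumping them to JSONL so db-in can read them. It only uses the standard library; the `write_jsonl` helper, the `records.jsonl` file name, and the two sample records are made up for illustration, not taken from Prodigy.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line and return how many were written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical records standing in for the real 30k list
records = [
    {"text": "First example", "answer": "accept"},
    {"text": "Second example", "answer": "reject"},
]
n_written = write_jsonl(records, "records.jsonl")
```

The resulting file can then be passed as [in_file] to the db-in command above.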

araykhel · March 25, 2019, 3:29pm

Thanks, Andy! I’ll use that. I know part of my problem was that I wasn’t specifying a list of datasets, just a string name (“dataset” instead of [“dataset”]), which gets interpreted as each letter of the string being the name of a dataset. eyeroll But even once I fixed that, it was faster but not as fast as you’re saying! So yeah, I’ll try that.
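
For anyone else who hits this, a small sketch of why the string gets split up. `names_seen_by_loop` is a hypothetical stand-in for any code that loops over a `datasets` argument; it is not Prodigy's actual method.

```python
def names_seen_by_loop(datasets):
    """Hypothetical stand-in: collect each item a loop over `datasets` yields."""
    return [name for name in datasets]


# A bare string is iterable, so the loop sees one "dataset" per character
from_string = names_seen_by_loop("dataset")

# A one-element list yields the full name once, which is what was intended
from_list = names_seen_by_loop(["dataset"])

print(from_string)
print(from_list)
```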

ines · March 25, 2019, 3:59pm

Ah, damn – but good thing you caught it! Internally, the method iterates over the value of datasets and only fails if it's not iterable (but a string obviously is, too). We can add a check for this in Prodigy though so it raises an error if the value isn't a list or tuple!
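
Roughly the kind of check this could be, as a sketch only: `validate_datasets` and its error message are assumptions for illustration, not the code that is or will be in Prodigy.

```python
def validate_datasets(datasets):
    """Return `datasets` as a list, rejecting anything that isn't a list or tuple.

    Hypothetical helper: a bare string fails here instead of being
    silently iterated character by character.
    """
    if not isinstance(datasets, (list, tuple)):
        raise ValueError(
            f"Expected a list or tuple of dataset names, got {type(datasets).__name__}"
        )
    return list(datasets)


ok = validate_datasets(["dataset"])

try:
    validate_datasets("dataset")
    rejected = False
except ValueError:
    rejected = True
```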

araykhel · March 25, 2019, 4:38pm

Yeah, that’d be a nice check! I figured it out finally when I was double checking which datasets were in my database and saw “d”, “a”, “t”, “a”…
