Hi, I've noticed that the `db-in`, `db-merge` and `drop` operations are running slowly for me. Importing a data set of 20000 annotations (25 MB) takes more than 20 minutes. In contrast, `db-out` takes just seconds.
I'm using PostgreSQL 13.3 on an r6g.large instance on AWS, which does not receive any other queries during these operations, so it's likely not a hardware constraint.
Looking at the currently running queries, it seems that each example is added/deleted individually, meaning the database has to work through 20k requests.
- Are there any database-related settings that I have overlooked? I have checked the prodigy.json docs and the psycopg2 docs.
- Is it possible to do a bulk operation with Python rather than the CLI?
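
To illustrate what I mean by a bulk operation, here is a minimal sketch of the two patterns. It is not Prodigy's actual code: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, and the `example` table and the function names are hypothetical.

```python
import json
import sqlite3


def make_db():
    # In-memory SQLite database with a hypothetical "example" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT)")
    return conn


def insert_one_by_one(conn, examples):
    # One INSERT and one commit per example: this resembles what I see
    # in the list of running queries (20k separate requests).
    for eg in examples:
        conn.execute("INSERT INTO example (content) VALUES (?)", (json.dumps(eg),))
        conn.commit()


def insert_bulk(conn, examples):
    # All rows handed over in a single executemany call, wrapped in
    # one transaction that is committed once at the end.
    rows = [(json.dumps(eg),) for eg in examples]
    with conn:
        conn.executemany("INSERT INTO example (content) VALUES (?)", rows)


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM example").fetchone()[0]
```

Both functions end up with the same rows in the table; the difference is only in how many statements and commits are issued. With psycopg2, I assume the closest equivalent to the second pattern would be something like `psycopg2.extras.execute_values`, but I don't know whether Prodigy exposes a way to use it.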