I imported a dataset with 4M annotations into Prodigy. It is stored on an AWS database instance and accessed from an AWS EC2 instance with 16GB of memory. Other, smaller datasets work fine, but even simple operations on this one tend to cripple the machine, even when nothing else is running: the process runs for many hours without completing, the terminal stops responding to Ctrl-C, and a second SSH connection cannot be established.
Is it conceivable that `prodigy stats the-big-data-set` could run the instance out of memory? Or could the database connection be timing out? Is it worth trying the Python database API to probe deeper? I'd be happy to just delete the dataset, but `prodigy drop the-big-data-set` shows the same problem.
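
To make the "probe deeper" idea concrete, this is roughly what I have in mind: list the datasets through Prodigy's Python API (which I assume is cheap) and then count the big dataset's rows with a plain SQL query instead of loading them. The table and column names are just my guess at the default MySQL schema, so corrections are welcome:

```python
from prodigy.components.db import connect
import pymysql

# Cheap sanity check through Prodigy's own API: listing dataset names
# should not need to load any examples.
db = connect()  # picks up the same DB settings as the CLI (prodigy.json / env)
print(db.datasets)

# Count the big dataset's annotations with plain SQL instead of pulling 4M rows.
# The table and column names (dataset, link, link.dataset_id) are my guess at
# Prodigy's default MySQL schema -- verify with SHOW TABLES / DESCRIBE first.
conn = pymysql.connect(host="my-db-host", user="prodigy",
                       password="...", database="prodigy")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT COUNT(*) FROM link "
            "JOIN dataset ON dataset.id = link.dataset_id "
            "WHERE dataset.name = %s",
            ("the-big-data-set",),
        )
        print("annotations:", cur.fetchone()[0])
finally:
    conn.close()
```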
It would be valuable to have some insight into these practical limits. Could people share the size of their largest datasets, please?
```
============================== Prodigy Stats ==============================

Prodigy Home     /prodigy
Python Version   3.8.1
Database Name    MySQL
Database Id      mysql
Total Datasets   15
Total Sessions   51
```
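
For completeness, if `prodigy drop` keeps hanging, the fallback I've been considering is deleting the dataset with raw SQL along these lines, again assuming the default dataset/link/example tables. I haven't run this; it bypasses Prodigy entirely and presumably leaves the example rows orphaned, so I'd much rather hear about a supported route first:

```python
# Last-resort sketch: remove the dataset's link rows and the dataset row itself
# directly in MySQL, without asking Prodigy to load anything. Schema names are
# assumptions about the default Prodigy tables; back up the database before trying.
import pymysql

conn = pymysql.connect(host="my-db-host", user="prodigy",
                       password="...", database="prodigy")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id FROM dataset WHERE name = %s", ("the-big-data-set",))
        row = cur.fetchone()
        if row:
            (dataset_id,) = row
            cur.execute("DELETE FROM link WHERE dataset_id = %s", (dataset_id,))
            cur.execute("DELETE FROM dataset WHERE id = %s", (dataset_id,))
    conn.commit()
finally:
    conn.close()
```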