My dataset is fairly large, so loading it each time takes quite a bit of memory. Is there a recipe setting to NOT load the existing dataset and only add/append to it?
Hi! Only appending to the dataset is the default behaviour (via Database.add_examples) – however, when the server starts, the dataset is loaded once, mostly to set the count of already annotated examples. I'll see if we can replace this with a more efficient query. There really shouldn't be a need to load any of the actual examples.
Ooh, I get it, it's for the count. Makes sense. I'll keep an eye on the changelog.
Just released v1.9.8, which includes a small adjustment to the startup query so Prodigy doesn't load the individual examples anymore. It still makes a database request, but startup with a large dataset should hopefully be more efficient now.
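For anyone curious about why this kind of change helps, here is a rough sketch of the general idea using Python's built-in sqlite3 module. It is not Prodigy's actual code or schema: the `example` table, its `content` column, and the helper names are all made up for illustration. It only contrasts fetching every row to count it with asking the database for the count directly.

```python
import sqlite3


def build_demo_db(n_examples):
    """Create an in-memory database with a hypothetical 'example' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO example (content) VALUES (?)",
        (('{"text": "example %d"}' % i,) for i in range(n_examples)),
    )
    conn.commit()
    return conn


def count_by_loading(conn):
    """Fetch every row into memory just to count them (memory grows with the dataset)."""
    rows = conn.execute("SELECT content FROM example").fetchall()
    return len(rows)


def count_by_query(conn):
    """Let the database do the counting; only a single integer is returned."""
    (count,) = conn.execute("SELECT COUNT(*) FROM example").fetchone()
    return count


if __name__ == "__main__":
    conn = build_demo_db(1000)
    # Both approaches agree on the count, but only the first one
    # pulls the example contents into Python.
    print(count_by_loading(conn), count_by_query(conn))
```

Both functions still make a database request, which matches the note above, but the second one never materialises the individual examples, so its memory use should not depend on the size of the dataset.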