Is it possible to use ner_manual without loading the dataset that you are adding to?

My dataset is fairly large, so loading it each time takes quite a bit of memory. Is there a recipe setting to NOT load the existing dataset and only add/append to it?

Hi! Appending to the dataset is already the default behaviour (via Database.add_examples). However, when the server starts, the dataset is loaded once, mostly to set the count of already annotated examples. I'll see if we can replace this with a more efficient query. There really shouldn't be a need to load any of the actual examples.
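To illustrate the difference between loading every example just to count them and asking the database for the count directly, here is a minimal sketch using Python's built-in sqlite3 module. This is not Prodigy's actual schema or implementation: the table name `example`, its columns, and the helper functions are all hypothetical and exist only for this demonstration.

```python
import json
import sqlite3


def make_demo_db(n_examples):
    """Build a throwaway in-memory database (hypothetical schema, not Prodigy's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example (id INTEGER PRIMARY KEY, dataset TEXT, content TEXT)"
    )
    rows = [
        ("my_dataset", json.dumps({"text": f"Example {i}"}))
        for i in range(n_examples)
    ]
    conn.executemany("INSERT INTO example (dataset, content) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def count_by_loading(conn, dataset):
    """Fetch and deserialize every example, then count them in Python.

    Memory use grows with the size of the dataset.
    """
    rows = conn.execute(
        "SELECT content FROM example WHERE dataset = ?", (dataset,)
    ).fetchall()
    examples = [json.loads(content) for (content,) in rows]
    return len(examples)


def count_by_query(conn, dataset):
    """Let the database do the counting; only a single integer is returned."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM example WHERE dataset = ?", (dataset,)
    ).fetchone()
    return count


conn = make_demo_db(1000)
print(count_by_loading(conn, "my_dataset"))
print(count_by_query(conn, "my_dataset"))
```

Both functions return the same number, but the second one never materializes the examples in Python, which is the kind of saving that matters for a large dataset.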


Ooh, I get it: the dataset is only loaded to get the count. That makes sense. I'll keep an eye on the changelog.

Just released v1.9.8, which includes a small adjustment to the startup query so Prodigy doesn't load the individual examples anymore. It still makes a database request, but startup with a large dataset should hopefully be more efficient now.
