Is it possible to use ner_manual without loading the dataset that you are adding to?

araykhel · February 26, 2020, 6:29pm

My dataset is fairly large, so loading it each time takes quite a bit of memory. Is there a recipe setting to NOT load the existing dataset and only add/append to it?

ines · February 27, 2020, 2:27pm

Hi! Only appending to the dataset is the default behaviour (via Database.add_examples). However, when the server starts, the dataset is loaded once, mostly to set the count of already annotated examples. I'll see if we can replace this with a more efficient query. There really shouldn't be a need to load any of the actual examples.
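
To make the difference concrete, here is a rough sketch of counting by loading every example versus counting with a single query. It uses an in-memory SQLite database with a made-up `example` table; the schema and the helper names (`add_examples`, `count_by_loading`, `count_by_query`) are assumptions for illustration and are not Prodigy's actual tables or internals.

```python
import json
import sqlite3

# Hypothetical schema for illustration only; not Prodigy's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER PRIMARY KEY, dataset TEXT, content TEXT)"
)


def add_examples(conn, dataset, examples):
    """Append examples without reading anything that is already stored."""
    rows = [(dataset, json.dumps(eg)) for eg in examples]
    conn.executemany("INSERT INTO example (dataset, content) VALUES (?, ?)", rows)
    conn.commit()


def count_by_loading(conn, dataset):
    """Expensive: deserialize every stored example just to count them."""
    cur = conn.execute("SELECT content FROM example WHERE dataset = ?", (dataset,))
    return len([json.loads(row[0]) for row in cur])


def count_by_query(conn, dataset):
    """Cheap: let the database count the rows; no example is loaded."""
    cur = conn.execute("SELECT COUNT(*) FROM example WHERE dataset = ?", (dataset,))
    return cur.fetchone()[0]


# Appending twice never touches the rows that are already there.
add_examples(conn, "ner_data", [{"text": f"Example {i}"} for i in range(1000)])
add_examples(conn, "ner_data", [{"text": "One more"}])
```

Both counting functions return the same number, but only the first one has to hold every example in memory, which is where the cost comes from with a large dataset.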

araykhel · February 27, 2020, 4:47pm

Ooh I get it, that's the count, makes sense. I'll keep an eye on the changelog.

ines · March 14, 2020, 6:53pm

Just released v1.9.8, which includes a small adjustment to the startup query so Prodigy doesn't load the individual examples anymore. It still makes a database request, but the startup with a large dataset should hopefully be more efficient now.
