One option could be to split your JSONL into smaller files, yes. Once you're done with file 1, you can start at file 2, and if you ever restart the server, it'll only have to go through the examples in that file again. The hackier version of this would be to just remove all lines from the top of your JSONL that you know are already annotated in the data.
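If you go the hacky route, you don't have to trim the file by hand: Prodigy stores an `_input_hash` for every annotated example, so a small script can drop the already-annotated lines for you. Here's a rough sketch, assuming your dataset is called `your_dataset` and the hashing settings haven't changed between runs (both are placeholders to adapt):

```python
import srsly
from prodigy import set_hashes
from prodigy.components.db import connect

db = connect()  # uses the database settings from your prodigy.json
annotated = set(db.get_input_hashes("your_dataset"))

def unseen(stream):
    for eg in stream:
        # Recompute the same input hash Prodigy would assign on load
        eg = set_hashes(eg)
        if eg["_input_hash"] not in annotated:
            yield eg

stream = srsly.read_jsonl("input.jsonl")
srsly.write_jsonl("remaining.jsonl", unseen(stream))
```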
Also, if you want to really optimize for processing performance, another option could be to do the pre-processing separately, e.g. have a script that runs your spaCy model over your input texts and adds the `"tokens"` and `"spans"` properties. You can then run that on a remote machine, potentially even with a GPU, parallelize it etc., and output a static JSONL file that already has everything you need. You can then use that with `ner.manual` or a custom recipe that only streams in exactly what's in the data and doesn't do any pre-processing.
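As a minimal sketch of what that pre-processing script could look like (the model name, file paths and `n_process` value are placeholders; with a GPU you'd drop `n_process` and tune `batch_size` instead):

```python
import spacy
import srsly

# Placeholder model name -- swap in the pipeline you actually use
nlp = spacy.load("en_core_web_lg")

def add_tokens_and_spans(stream):
    texts = ((eg["text"], eg) for eg in stream)
    # n_process parallelizes the pipeline across CPU cores
    for doc, eg in nlp.pipe(texts, as_tuples=True, n_process=4):
        eg["tokens"] = [
            {
                "text": token.text,
                "start": token.idx,
                "end": token.idx + len(token.text),
                "id": token.i,
                "ws": bool(token.whitespace_),
            }
            for token in doc
        ]
        eg["spans"] = [
            {
                "start": ent.start_char,
                "end": ent.end_char,
                "label": ent.label_,
                # token_end is inclusive in Prodigy's span format
                "token_start": ent.start,
                "token_end": ent.end - 1,
            }
            for ent in doc.ents
        ]
        yield eg

stream = srsly.read_jsonl("input.jsonl")
srsly.write_jsonl("preprocessed.jsonl", add_tokens_and_spans(stream))
```

You can then serve the result with something like `prodigy ner.manual your_dataset blank:en preprocessed.jsonl --label PERSON,ORG`, since `ner.manual` will respect the pre-set `"tokens"` and `"spans"` instead of recomputing them.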
Prodigy doesn't set `n_process` on `nlp.pipe` by default, but you could try adding that in the recipe to see if it makes a difference. (You can run `prodigy stats` to find the location of your local installation and then just hack it into the `ner.correct` function.) But I think the absolute fastest solution would be the pre-processing approach I described above.
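If you'd rather not edit the installed source, a custom recipe gives you the same effect in a safer place. To be clear, this is only a rough approximation of what `ner.correct` does, not its actual code; the recipe name, the `n_process` value and the label filtering are all assumptions to adapt:

```python
import copy

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens

@prodigy.recipe(
    "ner.correct.parallel",
    dataset=("Dataset to save annotations to", "positional", None, str),
    spacy_model=("Loadable spaCy pipeline", "positional", None, str),
    source=("Path to input JSONL", "positional", None, str),
    label=("Comma-separated labels to accept", "option", "l", str),
)
def ner_correct_parallel(dataset, spacy_model, source, label=""):
    labels = label.split(",") if label else []
    nlp = spacy.load(spacy_model)

    def make_tasks(stream):
        texts = ((eg["text"], eg) for eg in stream)
        # The experimental part: n_process runs the pipeline in multiple
        # worker processes instead of a single one
        for doc, eg in nlp.pipe(texts, as_tuples=True, n_process=2):
            task = copy.deepcopy(eg)
            task["spans"] = [
                {
                    "start": ent.start_char,
                    "end": ent.end_char,
                    "label": ent.label_,
                    "token_start": ent.start,
                    "token_end": ent.end - 1,
                }
                for ent in doc.ents
                if not labels or ent.label_ in labels
            ]
            yield task

    stream = add_tokens(nlp, make_tasks(JSONL(source)))

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",
        "config": {"labels": labels},
    }
```

You'd run it with something like `prodigy ner.correct.parallel your_dataset en_core_web_lg input.jsonl --label PERSON,ORG -F recipe.py`.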