Increase the maximum text length for NER training

alvaro.marlo · August 3, 2021, 8:27am

Hi everyone,

I'm training a model with the command prodigy train ner and I get this error:

ValueError: [E088] Text of length 2227606 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the 'nlp.max_length' limit. The limit is in number of characters, so you can check whether your inputs are too long by checking 'len(text)'.

My question is: how can I increase this maximum length for training? Or is it better to remove the texts that are longer than 1,000,000 characters?
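As the error message suggests, you can find the offending examples by checking len(text). Below is a minimal sketch that assumes the examples sit in a Prodigy-style JSONL file with a "text" field on each line (for instance, an export created with prodigy db-out). The file name in the usage line is hypothetical, and 1,000,000 is simply the default limit quoted in the E088 error.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Default nlp.max_length, as quoted in the E088 error message
MAX_LENGTH = 1_000_000


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one example dict for every non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def split_by_length(
    examples: Iterable[Dict], max_length: int = MAX_LENGTH
) -> Tuple[List[Dict], List[Dict]]:
    """Partition examples into (ok, too_long) based on len(example["text"])."""
    ok, too_long = [], []
    for eg in examples:
        if len(eg["text"]) > max_length:
            too_long.append(eg)
        else:
            ok.append(eg)
    return ok, too_long


if __name__ == "__main__":
    # "my_dataset.jsonl" is a hypothetical export of the annotated dataset
    ok, too_long = split_by_length(read_jsonl("my_dataset.jsonl"))
    print(f"{len(too_long)} examples exceed {MAX_LENGTH:,} characters")
```

Dropping the long examples throws their annotations away, so splitting them into smaller pieces (as suggested in the replies) is usually the better use of that data.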

lnatprodigy · August 3, 2021, 10:13am

I'm not sure if someone who isn't part of the Prodigy/spaCy team is supposed to answer in this forum, but here it goes...

You usually don't want such long examples for training. It's my understanding that the model only considers local context anyway, so providing all that text at once does you no good.
Even if you want to have the finished model annotate longer texts, you should probably try to keep your training samples to some reasonable length, like sentences or short paragraphs.

This is also hinted at in the Prodigy docs: https://prodi.gy/docs/named-entity-recognition#long-text

For how to change the nlp object used for training, see https://spacy.io/usage/training#custom-code
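If you really do want to raise the limit, a minimal sketch of the custom-code approach is below. It assumes spaCy v3 (which Prodigy v1.11+ trains with) and uses a registered callback that the [nlp.after_creation] config block can point to. The registry name "set_max_length.v1" and the value 3,000,000 are hypothetical choices, not anything Prodigy or spaCy ships. Keep the memory warning from the E088 message in mind: for NER this can allocate a lot of temporary memory.

```python
import spacy

# Hypothetical registry name; reference it from the training config like this:
#
#   [nlp.after_creation]
#   @callbacks = "set_max_length.v1"
#   max_length = 3000000
#
# The file has to be importable at training time (spaCy's own `spacy train`
# takes it via --code; check how your Prodigy version loads custom code).


@spacy.registry.callbacks("set_max_length.v1")
def create_max_length_callback(max_length: int = 3_000_000):
    """Return a callback that raises nlp.max_length once the nlp object exists."""

    def set_max_length(nlp):
        # nlp.max_length is the character limit that triggers E088
        nlp.max_length = max_length
        return nlp

    return set_max_length
```

This only moves the limit; it does not reduce the memory the NER model needs per character, which is why splitting the texts is generally preferred.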

I don't think there is an option to raise this particular limit in the config file.

adriane · August 3, 2021, 10:29am

Yes, it would be best to break your texts up into smaller units for training. For NER, we'd normally recommend paragraph-sized texts up to maybe a page or two long, like a document section. Usually context beyond the current paragraph is not useful for the NER predictions.
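A minimal sketch of splitting an already-annotated example into paragraph-sized examples is below. It assumes Prodigy-style dicts with a "text" field and "spans" entries carrying character "start"/"end" offsets and a "label", and it treats a blank line as a paragraph break. Spans are re-offset relative to their paragraph; any span that crosses a paragraph boundary is dropped. Token indices (token_start/token_end), if present, are not carried over, since they would no longer be valid.

```python
import re
from typing import Dict, List

# A blank line (optionally containing whitespace) counts as a paragraph break
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_example(eg: Dict) -> List[Dict]:
    """Split one example into paragraph-sized examples with shifted span offsets."""
    text = eg["text"]
    spans = eg.get("spans", [])

    # Character boundaries of each paragraph in the original text
    bounds = []
    start = 0
    for match in PARAGRAPH_BREAK.finditer(text):
        bounds.append((start, match.start()))
        start = match.end()
    bounds.append((start, len(text)))

    chunks = []
    for p_start, p_end in bounds:
        para = text[p_start:p_end]
        if not para.strip():
            continue  # skip empty paragraphs
        # Keep only spans fully inside this paragraph, shifted to local offsets
        new_spans = [
            {
                "start": span["start"] - p_start,
                "end": span["end"] - p_start,
                "label": span["label"],
            }
            for span in spans
            if span["start"] >= p_start and span["end"] <= p_end
        ]
        chunks.append({"text": para, "spans": new_spans})
    return chunks
```

Very long paragraphs would still need a further split (for example by sentence), but for most documents this already brings each example far below the limit.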

With smaller texts, the memory usage is a lot lower and it's easier to batch and shuffle while training, which can also improve the results.
