I am about to train an NER model in Prodigy, for which I have 6 datasets available (there could be more later), all obtained through ner.manual. Some aspects of this data:
Each file has 1000 samples.
The text in each sample is "lengthy": ~3500 characters or ~450 words on average (I know that shorter texts would be better, but for my application, they need to remain this long).
4 labels are being recognized.
Then I use the train command to start training, which suddenly stops with the following (not very informative) message:
E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------ -------- ------ ------ ------ ------
Killed
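For completeness, the command was roughly of this shape (the dataset names are just placeholders for my six ner.manual sets):

prodigy train ./output_dir --ner dataset1,dataset2,dataset3,dataset4,dataset5,dataset6 --eval-split 0.2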
BTW, when I reduce the training data to 5 datasets or fewer, training proceeds normally. I was guessing it was some memory issue, and this post seems to confirm it; however, that post does not clearly explain what to do to diagnose and confirm such a problem (Ines' suggestion only extends a list, while Guillaume briefly mentions the psutil library to confirm a memory issue, without showing any code snippet).
data-to-spacy managed to build the .spacy files required for training in spaCy.
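Roughly, the export and the subsequent spaCy training looked like this (paths and dataset names are placeholders):

prodigy data-to-spacy ./corpus --ner dataset1,dataset2,dataset3,dataset4,dataset5,dataset6 --eval-split 0.2
python -m spacy train ./corpus/config.cfg --output ./model --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy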
When training the model, however, I ran into that Killed message again.
Having already generated the .spacy files, however, I decided to move to another cloud-based VM, this time with more memory... and the training completed successfully. That indirectly confirms the root cause of my issue.
Still, it would be awesome to have some updated code snippet to diagnose this problem (i.e., a snippet that can tell you whether you are actually running short of memory for your training dataset[s]), and some suggestions for avoiding this problem with "big" training datasets.
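Something along these lines is what I have in mind, just as a rough sketch (the expansion factor below is an arbitrary guess on my part, not a measured value):

# Compare available memory against a rough estimate of what the training
# data will need once loaded, and report how much the current process
# is already using.
import os
import psutil

EXPANSION_FACTOR = 4  # guess: in-memory size vs. on-disk size of the corpus

def memory_check(paths):
    data_bytes = sum(os.path.getsize(p) for p in paths)
    available = psutil.virtual_memory().available
    rss = psutil.Process().memory_info().rss
    estimate = data_bytes * EXPANSION_FACTOR
    print(f"available memory: {available / 1e9:.2f} GB")
    print(f"current process RSS: {rss / 1e9:.2f} GB")
    print(f"rough estimate needed for the corpus: {estimate / 1e9:.2f} GB")
    return available > estimate

# e.g. memory_check(["corpus/train.spacy", "corpus/dev.spacy"])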
When you're training locally, you can pass a custom config.cfg file to train a spaCy model. This file has a few parameters that might be worth exploring further. It allows you to pick smaller weights, which could help, but one setting in particular might be the most useful.
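As a sketch, here is an excerpt with the sections I would look at (the values are only illustrative, not a recommendation). For long texts like yours, max_length under [corpora.train] is probably the most relevant one, since documents longer than that are split into sentences during training, where sentence boundaries are available:

[nlp]
# number of texts buffered when processing; not the training batch size
batch_size = 1000

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
# smaller width/depth means smaller weights and a smaller memory footprint
width = 96
depth = 4
window_size = 1
maxout_pieces = 3

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
# documents longer than this (in words) are split during training;
# 0 disables the limit
max_length = 500

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
# compounding schedule for how many words end up in each training batch
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001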
Thank you, I'll keep an eye on it. In general, is there any part of the documentation where all the parameters in the config.cfg file are detailed / explained? (e.g., in your code segment, which variable controls the batch size?). The only pieces of information I have found so far are this, this and, more informally, this, but even though they are nicely documented for aspects closely related to the spaCy architecture, they miss some other aspects more related to the modeling itself.
It would be awesome to know if I am missing some other section in the documentation that could add more info on that.
I understand where you're coming from. The config.cfg can be a bit intimidating, just because there are so many settings in ML models these days.
I usually rely on the Model Architectures section of the spaCy docs to understand the hyperparameters a bit better. There are some ideas for better educational content in this domain, but for now that part of the docs is the best reference for understanding all the settings.
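For example, for the NER component the hyperparameters sit under its registered architecture in the config, and each of them is described on that page (the values below are just what a default generated config contains, shown for illustration):

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null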
My company uses products related to Google Cloud Platform; specifically for VMs, we usually use either Compute Engine (my current choice for Prodigy) or Vertex AI Workbench. I think Google grants new users USD 300 in credits, which in my own experience is enough for roughly 3 months of experiments (it's important to note that Google charges by the hour, so my estimate can fluctuate greatly depending on the intensity of your own experiments).
It worked! Thanks for the help. I used Compute Engine with 64 GB of RAM and an Ubuntu boot disk. Inside the VM instance over SSH, installing Prodigy is the same as on any local Ubuntu laptop.
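For anyone repeating this, the setup inside the VM was essentially the standard one (the XXXX key below is a placeholder for the personal license key):

sudo apt update && sudo apt install -y python3-pip python3-venv
python3 -m venv prodigy-env && source prodigy-env/bin/activate
pip install prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy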