Training NER does not make any progress

Hi all,

I just started using Prodigy for my research and I love it. I am updating the NER component for some labels (mainly products) to better fit my dataset.
For that, I have extracted a few random sentences from my text and annotated them, just to check whether my workflow will ultimately work.

Step 1 was successful:

python3 -m prodigy ner.correct ner_news_test en_core_web_trf ./df_test_RandomSentence.jsonl --label DATE,ORG,PRODUCT
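
For context, df_test_RandomSentence.jsonl just has one JSON object per line with a "text" field, roughly like this (the sentences below are only placeholders, not my actual data):

{"text": "The company announced a new product line in March 2021."}
{"text": "Sales of the device doubled compared to last year."}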

Unfortunately, step 2 was not successful:

python3 -m prodigy train --ner ner_news_test --eval-split 0.2 --base-model en_core_web_trf --lang en --label-stats TRUE

Here I get results up to this point:

============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 12 | Evaluation: 3 (20% split)
 Labels: ner (3)
 ℹ Pipeline: ['transformer', 'tagger', 'parser', 'attribute_ruler',
 'lemmatizer', 'ner']
 ℹ Frozen components: ['tagger', 'parser', 'attribute_ruler',
 'lemmatizer']
 ℹ Initial learn rate: 0.0
 E    #       LOSS TRANS...  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
 ---  ------  -------------  --------  ------  ------  ------  ------
  0       0          85.51     43.87   33.33   33.33   33.33    0.33

Afterwards nothing happens: the run doesn't stop, but there is also no progress, and my CPU is pretty much maxed out, even though there are only a few sentences in here. I'm running it on an M1 MacBook Pro, so it should work?

Any help would be highly appreciated!

Hi @dope, welcome to Prodigy!

This might actually be your machine running out of memory. For a transformer model like en_core_web_trf, I suggest using a GPU; that should take the pressure off. You might also want to play around with your training.batch_size configuration, just in case. :slight_smile:
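
If you do get access to a CUDA GPU, you can point training at it from the CLI. If I remember correctly, the train recipe accepts a --gpu-id argument, so something along these lines should work (keeping the rest of your command unchanged):

python3 -m prodigy train --ner ner_news_test --eval-split 0.2 --base-model en_core_web_trf --gpu-id 0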

Hi, thank you for your reply! I believe GPU training does not work on an M1 Mac yet (since spaCy uses CUDA, which only targets NVIDIA cards?)...

I tried modifying my command to add the batch size (I set it to 128; I am not sure which values are reasonable to try here?) but got an error message:

✘ Config validation error
training -> batch_size   extra fields not permitted

Any idea how I could proceed?

Since this is part of my PhD research, I would like to have the most accurate model possible - hence I use en_core_web_trf as a basis.
If I use en_core_web_lg the training seems to make progress but I would really like to get en_core_web_trf to work.

Hmm, yeah, that would mean a separate GPU machine. Transformers are usually large and may not be a good fit for a laptop (I even tried on a gaming laptop). If you can set it up via a cloud service (e.g. a virtual machine with a GPU on GCP/AWS/Azure), you should be able to train one.

I believe you should be able to do this via the Prodigy CLI itself. You can override the values for training.batcher.size.start and training.batcher.size.stop. Something like this:

prodigy train output ... --training.batcher.size.start=35 --training.batcher.size.stop=50
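
If you'd rather set this in the config itself, the relevant section for a compounding batch-size schedule looks roughly like this (just a sketch; the exact batcher and values depend on your base model's default config):

[training.batcher.size]
@schedules = "compounding.v1"
start = 35
stop = 50
compound = 1.001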

I highly suggest checking out this thread (Flag --batch-size not recognized by prodigy train - #2 by SofieVL) for more info on the config!
