Training NER does not make any progress

Hi all,

I just started using Prodigy for my research and I love it. I am updating the NER component for some labels (mainly products) to better fit my dataset.
For that, I have extracted a few random sentences from my text and annotated them, just to check whether my workflow will ultimately work.

Step 1 was successful:

python3 -m prodigy ner.correct ner_news_test en_core_web_trf ./df_test_RandomSentence.jsonl --label DATE,ORG,PRODUCT
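
For context, df_test_RandomSentence.jsonl just has one JSON object per line with a "text" field, roughly like this (the sentences below are only placeholders, not my actual data):

{"text": "The company announced a new product line in March 2021."}
{"text": "Sales of the device doubled compared to last year."}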

Unfortunately, step 2 was not successful:

python3 -m prodigy train --ner ner_news_test --eval-split 0.2 --base-model en_core_web_trf --lang en --label-stats TRUE

Here I get results up to this point:

============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 12 | Evaluation: 3 (20% split)
 Labels: ner (3)
 ℹ Pipeline: ['transformer', 'tagger', 'parser', 'attribute_ruler',
 'lemmatizer', 'ner']
 ℹ Frozen components: ['tagger', 'parser', 'attribute_ruler',
 'lemmatizer']
 ℹ Initial learn rate: 0.0
 E    #       LOSS TRANS...  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
 ---  ------  -------------  --------  ------  ------  ------  ------
  0       0          85.51     43.87   33.33   33.33   33.33    0.33

Afterwards nothing happens: the run doesn't stop, but there is also no progress, and my CPU is pretty much maxed out, even though there are only a few sentences in here. I'm running it on an M1 MacBook Pro, so it should work?

Any help would be highly appreciated!

Hi @dope, welcome to Prodigy!

This might actually be your machine running out of memory. For a transformer model like en_core_web_trf, I suggest using a GPU; that should take the pressure off. You might also want to play around with your training.batch_size configuration, just in case. :slight_smile:
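
If you do get access to a CUDA GPU, you can point training at it from the CLI. If I remember correctly, the train recipe accepts a --gpu-id argument, so something along these lines should work (keeping the rest of your command unchanged):

python3 -m prodigy train --ner ner_news_test --eval-split 0.2 --base-model en_core_web_trf --gpu-id 0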

Hi, thank you for your reply! I believe GPU training does not work on an M1 Mac yet (since spaCy uses CUDA, which only targets NVIDIA cards?)...

I tried modifying my command to add the batch size (I set it to 128; I am not sure which values are reasonable to try here?) but got an error message:

✘ Config validation error
training -> batch_size   extra fields not permitted

Any idea how I could proceed?

Since this is part of my PhD research, I would like to have the most accurate model possible - hence I use en_core_web_trf as a basis.
If I use en_core_web_lg the training seems to make progress but I would really like to get en_core_web_trf to work.

Hmm, yeah, that would mean a separate GPU machine. Transformers are usually large and may not be a good fit for a laptop (I even tried on a gaming laptop). If you can set it up via a cloud service (e.g. a virtual machine with a GPU on GCP/AWS/Azure), you should be able to train one.

I believe you should be able to do this via the Prodigy CLI itself. You can override the values for training.batcher.size.start and training.batcher.size.stop. Something like this:

prodigy train output ... --training.batcher.size.start=35 --training.batcher.size.stop=50
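
If you'd rather set this in the config itself, the relevant section for a compounding batch-size schedule looks roughly like this (just a sketch; the exact batcher and values depend on your base model's default config):

[training.batcher.size]
@schedules = "compounding.v1"
start = 35
stop = 50
compound = 1.001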

I highly suggest checking out this thread (Flag --batch-size not recognized by prodigy train - #2 by SofieVL) for more info on the config!
