Training BERT on prodigy

Mohammad · February 2, 2023, 12:35pm

After annotating in Prodigy, how can I run training for BERT in Prodigy?

python3 -m prodigy bert.ner.manual correctd_drsv1 ./model_drsv1 ./CV_json_English_Format.jsonl --label Skill,MaritalStatus,University,Designation,Degree,Experience --tokenizer-vocab ./bert-base-uncased-vocab.txt --lowercase --hide-wp-prefix -F transformers_tokenizers.py

ryanwesslen · February 2, 2023, 1:55pm

Hi @Mohammad!

Technically, you can train however you want (e.g., write a script in PyTorch, TensorFlow, etc.). The annotated data is stored in a SQLite database (by default), and you can simply export it. Many Prodigy users who use Prodigy's BERT recipe already have their own training workflows.
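As a rough illustration of the "export and train yourself" route: the annotations can be dumped to JSONL (for example with `prodigy db-out correctd_drsv1 > annotations.jsonl`, where the output file name is just a placeholder) and then converted into token-level BIO tags, which is the input most custom BERT token-classification scripts expect. The sketch below assumes each exported record carries a `tokens` list and a `spans` list with inclusive `token_start`/`token_end` indices, plus an `answer` field. The helper names `spans_to_bio` and `load_accepted` are made up for this example and are not part of Prodigy.

```python
import json


def spans_to_bio(example):
    """Return (token_texts, bio_tags) for one Prodigy-style annotation dict.

    Assumes span["token_start"] and span["token_end"] are token indices
    and that token_end is inclusive.
    """
    tokens = example["tokens"]
    tags = ["O"] * len(tokens)
    for span in example.get("spans", []):
        start, end = span["token_start"], span["token_end"]
        tags[start] = "B-" + span["label"]
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + span["label"]
    return [tok["text"] for tok in tokens], tags


def load_accepted(path):
    """Yield accepted examples from a JSONL export (hypothetical helper)."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record.get("answer") == "accept":
                yield record


# Hand-written record in the assumed format, not real data from the thread.
example = {
    "text": "senior developer with python",
    "tokens": [
        {"text": "senior", "start": 0, "end": 6, "id": 0},
        {"text": "developer", "start": 7, "end": 16, "id": 1},
        {"text": "with", "start": 17, "end": 21, "id": 2},
        {"text": "python", "start": 22, "end": 28, "id": 3},
    ],
    "spans": [
        {"start": 0, "end": 16, "token_start": 0, "token_end": 1, "label": "Designation"},
        {"start": 22, "end": 28, "token_start": 3, "token_end": 3, "label": "Skill"},
    ],
    "answer": "accept",
}

words, tags = spans_to_bio(example)
for word, tag in zip(words, tags):
    print(word, tag)
```

From here, the `(words, tags)` pairs could be fed into whatever training loop you already have. Since bert.ner.manual tokenizes with the BERT vocabulary, the exported tokens should already be wordpieces, which is convenient if your training script uses the same tokenizer.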

I suspect what you're looking for is a simple way to train with Prodigy. Prodigy's default training recipe, train, is really a wrapper for spacy train with simple defaults. The problem is that training transformers (BERT) needs a lot of customization. So if you want to train with spaCy, you'll need to familiarize yourself with spaCy training (e.g., config, GPU).

I would recommend reading up on spaCy's documentation on transformer training. You'll likely need to already have knowledge about spaCy training.

Also, you'll likely find more resources on spaCy's GitHub discussions forum.

Please post there if you have spaCy specific questions as the spaCy core team answers questions there.

This FAQ may also be helpful for familiarizing yourself with handling GPUs, which is another major thing to learn before training transformers (with spaCy or any other framework).

Ultimately, it's important you take a step back and consider: do you really need transformers (BERT)?

For many projects (especially in industry) the answer is that the performance gains may not justify the additional work that's required, especially if you're starting out from scratch. However, I do think if you spend enough time and have the right expectations, it could be a valuable skill set to develop. But it'll take time!

Mohammad · February 2, 2023, 2:56pm

The task I want to do is extracting relationships in CVs between skills and employment. From what I understand, spaCy is not suited to this task without help from BERT, as spaCy can perform relation extraction.

ryanwesslen · February 2, 2023, 3:09pm

Thanks for the background!

Not necessarily. You can train with CPU (non-transformers) or with GPU (transformers, aka BERT-like models).

Sofie has a great video where she outlines both approaches:

She also has the repo as a spaCy project, which includes both a command for cpu_train and one for gpu_train:

I would recommend watching her video carefully and following her project and modifying it as you need.
