BERT support for prodigy train ner

I'm trying to train an NER model after manually annotating 2k chunks. I'm using spaCy v2.2 with Prodigy 1.10, so with the prodigy train ner command I can only train spaCy pipelines (en_core_web_lg/sm). If I want to train BERT, I would have to write a new tokenization script (or convert the Prodigy-produced JSONL to a BERT-compatible format) and use external tools like Hugging Face transformers. Is there no BERT version of prodigy train ner?

Hi! The prodigy train command is designed for quick training experiments with spaCy but you can always export your data and then train using a different library, e.g. PyTorch directly. There's no single "BERT-compatible format" – it really just depends on the model you want to train on top of the transformer weights and what it needs to predict. The JSONL will give you the annotations, including the text and the spans, which you can then use to update your model.
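To make that concrete, here's a minimal sketch of one possible conversion, assuming you exported your dataset with `prodigy db-out your_dataset > annotations.jsonl` and want token-level BIO tags for a Hugging Face model. The file name and the BIO scheme are just illustrative choices, not an official format:

```python
import json
from transformers import AutoTokenizer

# Assumes a fast tokenizer (needed for offset mappings); bert-base-cased has one
TOKENIZER = AutoTokenizer.from_pretrained("bert-base-cased")

def jsonl_to_bio(path):
    """Convert Prodigy's character-offset spans to token-level BIO tags."""
    examples = []
    with open(path, encoding="utf8") as f:
        for line in f:
            eg = json.loads(line)
            if eg.get("answer") != "accept":
                continue  # skip rejected/ignored annotations
            spans = eg.get("spans", [])
            enc = TOKENIZER(eg["text"], return_offsets_mapping=True)
            labels = []
            for start, end in enc["offset_mapping"]:
                if start == end:  # special tokens like [CLS]/[SEP]
                    labels.append("O")
                    continue
                label = "O"
                for span in spans:
                    # wordpiece falls fully inside an annotated span
                    if start >= span["start"] and end <= span["end"]:
                        prefix = "B" if start == span["start"] else "I"
                        label = f"{prefix}-{span['label']}"
                        break
                labels.append(label)
            examples.append({"input_ids": enc["input_ids"], "labels": labels})
    return examples

train_data = jsonl_to_bio("annotations.jsonl")
```

From there, the examples can be fed into whatever training loop your model of choice expects.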

With spaCy v3 and the upcoming Prodigy (currently available as a nightly pre-release), you can also train spaCy pipelines initialised with transformer weights like BERT. That said, training with a transformer needs a GPU, so you typically want to export your annotations with data-to-spacy and then train with a transformer-based config on a separate GPU machine.
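For reference, that workflow looks roughly like this. This is a sketch based on the nightly: exact arguments can differ between versions, and your_dataset is a placeholder:

```
# on any machine: export annotations, a generated config and a train/dev split
prodigy data-to-spacy ./corpus --ner your_dataset --eval-split 0.2

# on the GPU machine: train with a transformer-based config
# (spacy init config with the --gpu flag can generate one)
python -m spacy train ./corpus/config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --gpu-id 0
```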

Running a quick experiment without a transformer can still give you the same useful insights, though, so it's often a good idea to try that first: if there's a problem with your data and your model isn't learning anything, you usually want to fix that first – even with better embeddings, your model will never be as good as it could be. If your model is doing well and you get good results, you know that initialising it with transformer embeddings will likely give you a good boost in accuracy.
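With the setup you already have, that baseline experiment is a one-liner (the dataset name is a placeholder):

```
# CPU baseline with Prodigy 1.10 / spaCy v2, no transformer needed
prodigy train ner your_dataset en_core_web_lg --output ./baseline_model --eval-split 0.2
```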

Thanks a lot, Ines!