Training BERT with Prodigy

Hi @Mohammad!

Technically, you can train however you want (e.g., write a script in PyTorch, TensorFlow, etc.). The annotated data is stored in a SQLite database by default, and you can simply export it. Many Prodigy users who use Prodigy's BERT recipe already have their own training workflows.
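For example, `prodigy db-out my_dataset > annotations.jsonl` exports a dataset as JSONL (one JSON object per line), which you can then feed into any framework. Here's a minimal sketch of parsing such an export into `(text, entities)` pairs; the example records are made up, but they follow the shape of Prodigy's NER output (`text`, `spans`, `answer`):

```python
import json

# Two (hypothetical) lines as they might appear in an exported JSONL file:
sample = [
    '{"text": "Apple hired Tim", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}',
    '{"text": "Nothing here", "spans": [], "answer": "reject"}',
]

examples = []
for line in sample:
    eg = json.loads(line)
    if eg.get("answer") != "accept":  # keep only accepted annotations
        continue
    ents = [(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
    examples.append((eg["text"], ents))

print(examples)  # [('Apple hired Tim', [(0, 5, 'ORG')])]
```

From here, converting to whatever input format your trainer expects (Hugging Face datasets, spaCy `DocBin`, etc.) is straightforward.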

I suspect what you're looking for is a simple way to train with Prodigy. Prodigy's default training recipe, `train`, is really a wrapper around `spacy train` with sensible defaults. The problem is that training transformers (like BERT) needs a lot of customization, so if you want to train with spaCy, you'll need to familiarize yourself with spaCy training (e.g., the config system, GPU setup).
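To make the relationship concrete, here's a rough sketch (dataset and file names are hypothetical):

```shell
# Train an NER model from a Prodigy dataset with sensible defaults:
prodigy train ./output --ner my_dataset

# Under the hood this is essentially preparing data and a config, then calling:
python -m spacy train config.cfg --output ./output \
    --paths.train ./train.spacy --paths.dev ./dev.spacy
```

The second form is what you'll need to run (and customize) yourself once transformers are involved.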

I would recommend reading up on spaCy's documentation on transformer training. It assumes some prior knowledge of spaCy training.
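The main customization happens in the training config. As a rough sketch (model name and settings are just illustrative, not a recommendation), a transformer-backed pipeline swaps in a `transformer` component along these lines:

```ini
[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "bert-base-uncased"
tokenizer_config = {"use_fast": true}
```

In practice you'd generate a full config with `spacy init config` and edit it, rather than writing one from scratch; this requires the `spacy-transformers` package.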

Also, you'll likely find more resources on spaCy's GitHub Discussions forum.

Please post there if you have spaCy-specific questions, as the spaCy core team answers questions there.

This FAQ may be helpful too for getting familiar with handling GPUs, which is another major thing to learn before training transformers (in spaCy or any other framework).
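Two commands worth knowing once you get there (assuming you've installed spaCy with the appropriate CUDA extras, e.g. `spacy[cuda12x]`):

```shell
# Quick check: can spaCy actually see a usable GPU? Prints True or False.
python -c "import spacy; print(spacy.prefer_gpu())"

# Train on the first GPU; --gpu-id -1 (the default) trains on CPU.
python -m spacy train config.cfg --output ./output --gpu-id 0
```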

Ultimately, it's important you take a step back and consider: do you really need transformers (BERT)?

For many projects (especially in industry) the answer is that the performance gains may not justify the additional work that's required, especially if you're starting out from scratch. However, I do think if you spend enough time and have the right expectations, it could be a valuable skill set to develop. But it'll take time!