I'm doing hyperparameter tuning on a NER model whose dataset was annotated with Prodigy. I was training with the prodigy train recipe in order to make models for ner.correct (what a useful feature, thank you a lot). Now that I have a good amount of annotations (maybe), I am transferring the work to spaCy, which I think has more fine-tuning options. I would like to know which parameters Prodigy uses for training NER models. I am trying to reproduce the Prodigy training from spaCy in order to have a baseline performance, which I would then try to improve by hyperparameter tuning.
For example, at https://spacy.io/usage/training#tips-batch-size there are a lot of amazing tips for optimizing the training. Does Prodigy use any of them? I know it uses batch-size compounding (as far as I can infer from the documentation), but what are the start, stop and compounding factor values? What about dropout decay, does Prodigy use it? Any special parameters for learning rate, regularization or gradient clipping?
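For reference, the kind of loop I'm trying to reproduce is the one from that tips page, roughly like this in spaCy v2 (the data, label and values here are just placeholders taken from the docs, not what Prodigy actually uses internally):

```python
import random
import spacy
from spacy.util import minibatch, compounding, decaying

# Placeholder training data in spaCy v2's simple training style
TRAIN_DATA = [
    ("Apple is looking at buying a U.K. startup", {"entities": [(0, 5, "ORG")]}),
    # ... more examples ...
]

nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("ORG")

optimizer = nlp.begin_training()
# Batch size compounds from 4 up to 32 by a factor of 1.001 per batch,
# and dropout decays from 0.6 down to 0.2 (the values from the tips page)
batch_sizes = compounding(4.0, 32.0, 1.001)
dropout = decaying(0.6, 0.2, 1e-4)

for epoch in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=batch_sizes):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=next(dropout), losses=losses)
    print(epoch, losses)
```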
Is there a way to activate GPU training for Prodigy? Does it make any difference in accuracy?
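In plain spaCy I know I can request the GPU like this (assuming spaCy v2 with a CUDA-enabled install; if I read the CLI docs right, spacy train also has a --use-gpu / -g device-id option), but I haven't found the equivalent for Prodigy:

```python
import spacy

# prefer_gpu() activates the GPU if one is available and returns True,
# otherwise it silently falls back to the CPU
uses_gpu = spacy.prefer_gpu()
print("GPU active:", uses_gpu)
nlp = spacy.load("en_core_web_sm")
```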
When training a NER model with spaCy, it reports the best F-score. Does the final saved model reflect this best F1 score, or is it the one produced by the latest epoch/iteration?
Hi! Prodigy's train command is basically a thin wrapper around spaCy's training API that's optimised for quick experiments. It takes care of loading the data you created with Prodigy, merging annotations from multiple datasets and across different examples, and evaluating on a separate dataset or a random split. It uses the default configuration and lets you customise some settings, like the dropout.
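For instance, a quick experiment could look like this (assuming a recent Prodigy v1.x; the dataset and model names are made up, and the exact flags can differ slightly between versions, so check prodigy train --help):

```bash
# Train an NER model from a Prodigy dataset, holding out 20% for evaluation
prodigy train ner my_ner_dataset en_core_web_sm --output ./tmp_model --eval-split 0.2 --n-iter 10 --dropout 0.2
```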
Once you get serious about training your final model, you probably want to train in spaCy directly so you have more control over the details of the training workflow. You can use Prodigy's data-to-spacy command to export your annotations to spaCy's format. It even supports annotations for multiple tasks, so you can combine annotations for NER and text classification and train both components together. (The upcoming spaCy v3 will also make it much easier to work with and customise fine-grained details like hyperparameters and model settings.)
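Assuming Prodigy v1.10+ and spaCy v2, that handover might look roughly like this (paths and dataset names are placeholders, and the exact arguments can vary by version, so check the --help output of both commands):

```bash
# Export Prodigy annotations to spaCy's training format,
# holding out 20% of the examples as an evaluation set
prodigy data-to-spacy ./train.json ./eval.json --lang en --ner my_ner_dataset --eval-split 0.2

# Train an NER pipeline directly with spaCy v2's CLI
python -m spacy train en ./model_output ./train.json ./eval.json --pipeline ner --n-iter 30
```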
But if you're in the middle of developing your data and annotation scheme, are trying out different things and want to see if you're on the right track, Prodigy's train command can be useful to give you a quick and simple overview. Typically, the questions you want to answer at this stage are things like "Can my model learn this?" or "Is this approach working significantly better than the other one?". For that, a quick and easy experiment is often enough to point you in the right direction.