I'm doing hyperparameter tuning on a NER model whose dataset has been annotated with Prodigy. I was training with the prodigy train recipe in order to make models for ner.correct (what a useful feature, thank you a lot). Now that I have a good amount of annotations (maybe), I am transferring the work to spaCy, which I think has more fine-tuning options. I would like to know which parameters Prodigy uses for training NER models. I am trying to reproduce the Prodigy training from spaCy, in order to have a baseline performance, which I would then try to improve by hyperparameter tuning.
For example, at https://spacy.io/usage/training#tips-batch-size there are a lot of amazing tips for optimizing the training. Does Prodigy use any of them? I know it uses batch-size compounding (as far as I can infer from reading the documentation), but what are the start, stop and compound values? What about dropout decay — does Prodigy use it? Any special parameters for learning rate, regularization, or gradient clipping?
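To make sure I'm asking about the right thing: this is roughly what I understand by batch-size compounding and dropout decay. The sketch below is my own re-implementation of what I believe spaCy's `spacy.util.compounding` and `spacy.util.decaying` helpers do (the exact semantics are an assumption on my part, not copied from the spaCy source):

```python
import itertools

def compounding(start, stop, compound):
    """Yield start, then multiply by `compound` each step, capped at `stop`."""
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound

def decaying(start, stop, decay):
    """Yield start, then subtract `decay` each step, floored at `stop`."""
    curr = float(start)
    while True:
        yield max(curr, stop)
        curr -= decay

# e.g. batch sizes growing from 4 towards 32, dropout shrinking from 0.6 to 0.2
batch_sizes = compounding(4.0, 32.0, 1.001)
dropouts = decaying(0.6, 0.2, 1e-4)
print(list(itertools.islice(batch_sizes, 3)))  # slowly growing batch sizes
print(next(dropouts))                          # initial dropout rate
```

So my question is really: if Prodigy uses generators like these, which start/stop/compound and start/stop/decay values does it pass in?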
Is there a way to enable GPU training in Prodigy? Does it make any difference in accuracy?
Thank you for this amazing tool.