It's always difficult to decide where to limit the scope of a tool. On the one hand, it's useful to do things in one place rather than assembling a workflow out of many pieces, and so it's tempting to put in features that a high percentage of users will use in their workflows. But on the other hand, it's good for tools to stay more limited, as no one tool can be the best at everything.
We see hyper-parameter optimisation as a topic that's still developing, and one that requires integration with a remote execution environment, because you want multiple machines to run the hyper-parameter search in parallel. We therefore haven't implemented any hyper-parameter search in Prodigy. We recommend exploring Polyaxon and Ray as two different approaches to hyper-parameter tuning and experiment management: Polyaxon is more of a full-featured environment, while Ray is a smaller tool that gives you primitives to code solutions yourself.
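To make the "primitives you code yourself" idea concrete, here's a minimal random-search sketch in plain Python. It's not Ray or Polyaxon code: the `evaluate` function, the parameter ranges, and the thread pool are all illustrative stand-ins. In a real setup, each trial would be a full training run, typically dispatched to a separate machine.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evaluate(params):
    # Stand-in for a real training run: in practice this would train a
    # model with the given settings and return its evaluation score.
    # Here we just compute a mock score that peaks near dropout=0.2, lr=1e-3.
    return 1.0 / (1.0 + abs(params["dropout"] - 0.2) + abs(params["lr"] - 1e-3) * 100)

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    # Sample hyper-parameter candidates at random from hand-picked ranges.
    trials = [
        {"dropout": rng.uniform(0.0, 0.5), "lr": rng.choice([1e-4, 5e-4, 1e-3, 5e-3])}
        for _ in range(n_trials)
    ]
    # Run trials in parallel. A thread pool stands in for the cluster of
    # machines you'd use for real training runs.
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = list(pool.map(evaluate, trials))
    # Return the best (score, params) pair.
    return max(zip(scores, trials), key=lambda pair: pair[0])

if __name__ == "__main__":
    score, params = random_search()
    print(f"best score {score:.3f} with {params}")
```

Tools like Ray wrap this same loop with distributed scheduling, early stopping, and smarter search strategies than uniform random sampling.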
The `prodigy train` command was shaped by similar considerations. We did decide it was worth the convenience to have a simple train command that trains directly from the database. But we haven't tried to cover every use case, and you can easily replace the command with your own scripts, or export your data using `data-to-spacy` and run `spacy train` directly. We recommend the latter for many situations, for example when running training tasks under automation, which is usually the right process for production deployments.