Model registry

After training a new model or updating a model through active learning, we would like to store the output in a model registry. We are currently looking into the Azure ML and MLflow registries. Is it possible to publish a model to either of these services, or does anyone have another suggestion?
We would also like to store metadata about each model: its performance, which dataset (and which version of that dataset) it was trained on, which model was used as the input if active learning was used, and which user has been working on the model.
Then we would also like to fetch models from this registry back into Prodigy for new annotations.

Hi! This is a nice idea and there are two features of spaCy models that hopefully make this easier:

  • When you save a spaCy model to disk, nlp.meta (which includes details about the model and pipeline) is serialized to disk as meta.json. nlp.meta is writable, so you can add any custom properties to it and they will be saved with the model. You can also write to a model's meta.json directly to store any meta information with your models.

  • spaCy models can be packaged as regular versioned Python packages using the spacy package command. Packaging a model gives you a .tar.gz archive that you can pip install like any other Python package. In spaCy / Prodigy, you can then load an installed model by its name (which is also how spaCy's pretrained models work under the hood, btw). You could even run your own company-internal PyPI index for your model packages and download/install models like this: pip install en_your_custom_model --index-url http://your-secret-index.com.
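Here's a rough sketch of the "write to meta.json directly" route. It assumes a model directory that was already saved with nlp.to_disk(). The helper name add_training_metadata, the "training_info" key and its field names are all hypothetical. They're just one way to mirror the metadata from the question (dataset and version, input model, user, scores). spaCy itself doesn't define them; it simply keeps whatever extra keys you add. You could set the same dict on nlp.meta before calling nlp.to_disk() instead.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def add_training_metadata(model_dir, *, dataset, dataset_version, user,
                          base_model=None, scores=None, version=None):
    """Merge custom training/lineage fields into a saved model's meta.json.

    Hypothetical helper: the "training_info" key and its fields are an
    illustrative convention, not part of spaCy's meta.json schema.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    if version is not None:
        # "version" is a standard meta.json field, so bump it before packaging
        meta["version"] = version
    meta["training_info"] = {
        "dataset": dataset,
        "dataset_version": dataset_version,
        "base_model": base_model,  # input model, if active learning was used
        "user": user,
        "scores": scores or {},
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    # Write the merged meta back; existing fields (lang, name, ...) are kept
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return meta


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a directory written by nlp.to_disk(); a real
        # meta.json contains more fields than this minimal example
        Path(tmp, "meta.json").write_text(
            json.dumps({"lang": "en", "name": "custom_model", "version": "0.0.0"}),
            encoding="utf8",
        )
        meta = add_training_metadata(
            tmp,
            dataset="my_ner_dataset",  # hypothetical dataset name
            dataset_version="3",
            user="alice",
            base_model="en_core_web_sm",
            version="0.1.0",
        )
        print(meta["version"], meta["training_info"]["dataset"])
```

spacy package takes the package version from the model's meta, so bumping "version" in the same step should keep the pip-installable archive and its recorded training metadata in sync.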
