Model registry

Aleiny · December 19, 2019, 10:23am

After training a new model or updating a model through active learning, we would like to store the output in a model registry. Currently we are looking into Azure ML or the MLflow registry. Is it possible to publish a model to either of these services, or does someone have a suggestion?
We would also like to store metadata about the model, such as its performance, the dataset and dataset version it was trained on, which model was used as input if active learning was used, and which user has been working on the model.
Then we would also like to fetch models from this registry back into Prodigy for new annotations.

ines · December 19, 2019, 10:52am

Hi! This is a nice idea and there are two features of spaCy models that hopefully make this easier:

When you save a spaCy model to disk, the nlp.meta (including details about the model and pipeline) will be serialized to disk as the meta.json. nlp.meta is writable, so you can add any custom properties to it that will be saved with the model. You can also write to a model's meta.json directly to store any meta information with your models.
spaCy models can be packaged as regular versioned Python packages using the spacy package command. Packaging a model gives you a .tar.gz archive that you can pip install like any other Python package. In spaCy / Prodigy, you can then load an installed model using its name (which is also how spaCy's pretrained models work under the hood, btw). You could even run your own company-internal PyPI index for your model packages and download/install models like this: pip install en_your_custom_model --index-url http://your-secret-index.com.
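
To make the meta.json idea a bit more concrete, here is a minimal sketch that writes registry-style metadata into a saved model's meta.json directly, using only the standard library. The function names, the "registry" key and the example field names are assumptions made up for illustration, not anything spaCy or Prodigy define. The only thing taken from the above is that a saved model directory contains a meta.json you can write to.

```python
import json
from pathlib import Path


def add_registry_meta(model_dir, extra):
    """Merge custom fields into a saved model's meta.json and return the full meta."""
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    # Assumption: keep custom fields under one hypothetical "registry" key
    # so they cannot clash with the fields spaCy writes itself.
    registry = meta.setdefault("registry", {})
    registry.update(extra)
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return meta


def read_registry_meta(model_dir):
    """Return only the custom fields stored by add_registry_meta."""
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return meta.get("registry", {})


# Example call (path and field names are placeholders):
# add_registry_meta("/path/to/model", {
#     "dataset": "my_dataset",
#     "dataset_version": "3",
#     "base_model": "en_your_custom_model-0.0.1",
#     "trained_by": "some_user",
# })
```

If you do this before running spacy package, the extra fields should travel with the packaged model, since the package is built from the model directory and its meta.json. It's worth checking the generated package once to confirm that's the case for your spaCy version.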
