train recipe: how to pass the version parameter for the meta.json file

We would like to pass the version value for the meta.json file programmatically. By default, it looks like it is always 0.0.0.
We can always modify it after training, but it would be nice to pass it when calling train.

hi @nvasil!

Thanks for your question.

You can't pass the version (or related meta values) through the prodigy train CLI. If you want to do it programmatically, you could write a script that reads in the meta.json and modifies those values.
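A minimal sketch of such a script, assuming the trained pipeline directory (for example my-model/model-best) has a meta.json at its top level, as spaCy v3 pipelines saved during training do. The helper name set_meta_version is hypothetical:

```python
import json
from pathlib import Path


def set_meta_version(pipeline_dir, version):
    """Overwrite "version" in <pipeline_dir>/meta.json and return the old value.

    Hypothetical helper: assumes meta.json sits directly inside pipeline_dir.
    """
    meta_path = Path(pipeline_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    previous = meta.get("version")
    meta["version"] = version
    # Write the updated meta back, indented so it stays human-readable
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return previous
```

You would call it right after training, e.g. set_meta_version("my-model/model-best", "0.0.1"). All other keys in meta.json are left untouched.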

Another approach is to keep the meta.json from training and override the version when you run spacy package.

Let's say you ran prodigy train:

python -m prodigy train --ner train_dataset my-model

my-model will then be a folder containing the model-best and model-last pipelines.

You can then run:

# assume you want to use model-best
python -m spacy package my-model/model-best my-new-model --version 0.0.1

Note that the output directory (my-new-model) needs to exist before you run this. spacy package also has other options that override values from meta.json, such as --name.

This creates a new package in my-new-model with the version set, so you can load the pipeline in spaCy as:

import spacy

nlp = spacy.load("my-new-model/en_pipeline-0.0.1/en_pipeline/en_pipeline-0.0.1")
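Since the goal was to set the version programmatically, here is a minimal Python sketch that wraps the same spacy package call with subprocess. It assumes spaCy v3 is installed in the current Python environment; the helper names build_package_command and package_pipeline are hypothetical, and only the --version and --name options of spacy package are used:

```python
import subprocess
import sys
from pathlib import Path


def build_package_command(input_dir, output_dir, version, name=None):
    """Return the argv for `python -m spacy package` with an explicit version."""
    cmd = [sys.executable, "-m", "spacy", "package",
           str(input_dir), str(output_dir), "--version", version]
    if name is not None:
        # --name overrides the "name" value from meta.json
        cmd += ["--name", name]
    return cmd


def package_pipeline(input_dir, output_dir, version, name=None):
    """Create output_dir if needed, then run spacy package (requires spaCy)."""
    # spacy package expects its output directory to exist already
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_package_command(input_dir, output_dir, version, name)
    subprocess.run(cmd, check=True)
```

For example, package_pipeline("my-model/model-best", "my-new-model", "0.0.1") after training. If you'd rather load the result by name, you should also be able to pip install the generated package directory and then call spacy.load("en_pipeline").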

Here en_pipeline combines the language (en) with the default name pipeline, which is what you get when you don't pass --name to spacy package.
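The nested path in the spacy.load call follows the pattern <lang>_<name>-<version>/<lang>_<name>/<lang>_<name>-<version> inside the output directory. A small sketch that rebuilds that path from the meta values, assuming this layout (the helper name packaged_pipeline_path is hypothetical):

```python
from pathlib import Path


def packaged_pipeline_path(output_dir, lang="en", name="pipeline", version="0.0.0"):
    """Build <output_dir>/<lang>_<name>-<version>/<lang>_<name>/<lang>_<name>-<version>.

    Assumes the directory layout that spacy package produced in the example above.
    """
    package = f"{lang}_{name}"          # e.g. "en_pipeline"
    versioned = f"{package}-{version}"  # e.g. "en_pipeline-0.0.1"
    return Path(output_dir) / versioned / package / versioned
```

This makes it easy to keep the load path in sync when you change --name or --version.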

You can find more details about this process in the spacy package docs.

prodigy train is helpful because it's simple, but if you're thinking about more advanced training, I'd recommend learning more about spaCy pipelines, including how to work with config files and develop pipelines. prodigy train abstracts away many of these steps, and I suspect that as you dig deeper, you'll find many of your questions about prodigy train are really spaCy questions.

Prodigy has a very helpful data-to-spacy recipe that takes a Prodigy dataset and creates spaCy binary data files (including a train/dev split) along with an initial spaCy config file. You can then run spacy train on them. This becomes even more powerful when you build your workflow as a spaCy project. For examples, see the spaCy projects repo, including a project that shows a Prodigy-integrated spaCy workflow and demonstrates both training routes: prodigy train, and data-to-spacy -> spacy train.
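As a rough sketch of the data-to-spacy -> spacy train route, the script below chains data-to-spacy, spacy train and spacy package with subprocess so the version is set in one place. It assumes the Prodigy v1.11+ and spaCy v3 command lines; the dataset name, the corpus/output directory names and the helper names build_workflow and run_workflow are placeholders, not part of either library:

```python
import subprocess
import sys
from pathlib import Path


def build_workflow(dataset, corpus_dir, output_dir, package_dir, version):
    """Return argv lists for data-to-spacy -> spacy train -> spacy package."""
    py = sys.executable
    corpus = Path(corpus_dir)
    return [
        # 1. Export the Prodigy NER dataset to train.spacy, dev.spacy and config.cfg
        [py, "-m", "prodigy", "data-to-spacy", str(corpus), "--ner", dataset],
        # 2. Train with spaCy directly, using the generated config and data files
        [py, "-m", "spacy", "train", str(corpus / "config.cfg"),
         "--output", str(output_dir),
         "--paths.train", str(corpus / "train.spacy"),
         "--paths.dev", str(corpus / "dev.spacy")],
        # 3. Package model-best with an explicit version
        [py, "-m", "spacy", "package", str(Path(output_dir) / "model-best"),
         str(package_dir), "--version", version],
    ]


def run_workflow(commands, package_dir):
    """Run each step in order, stopping on the first failure."""
    # spacy package expects its output directory to exist already
    Path(package_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

In a spaCy project, the same three steps would typically live as commands in project.yml instead of a standalone script.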

Hope this helps!