We would like to programmatically pass the version value in the meta.json file. By default it looks like it is always 0.0.0.
We can always modify it after training, but it would be nice to pass it when calling train.
Hi @nvasil!
Thanks for your question.
You can't pass the version (or related meta values) through the prodigy train CLI. If you want to do it programmatically, you could write a script that reads in the meta.json and modifies those values.
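A minimal sketch of such a script, using only the standard library. It assumes the trained pipeline was saved to my-model/model-best and that you want to overwrite the top-level "version" key (plus, optionally, other top-level keys such as "author" or "description"). The function name set_meta_version is hypothetical:

```python
import json
from pathlib import Path


def set_meta_version(model_dir, version, **extra):
    """Overwrite "version" (and optionally other top-level keys) in meta.json.

    Hypothetical helper: assumes model_dir is a saved spaCy pipeline directory
    that contains a meta.json file, e.g. my-model/model-best from prodigy train.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    meta["version"] = version
    meta.update(extra)  # e.g. author="...", description="..."
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return meta


# Usage (path assumed):
# set_meta_version("my-model/model-best", "0.0.1")
```

Because it edits the file in place, run it once after each training run, before you load or package the pipeline.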
However, as mentioned above, another approach would be to use your meta.json from training and overwrite the versioning with spacy package.
Let's say you ran prodigy train:
python -m prodigy train --ner train_dataset my-model
So my-model would be a folder with model-best and model-last.
You can then run:
# assume you want to use model-best
# spacy package expects the output directory to exist already
mkdir my-new-model
python -m spacy package ./my-model/model-best ./my-new-model --version 0.0.1
Note that spacy package has other options that override values in the meta.json, such as --name.
This creates a new package in my-new-model with the updated version, so you can load the pipeline in spaCy as:
import spacy
nlp = spacy.load("./my-new-model/en_pipeline-0.0.1/en_pipeline/en_pipeline-0.0.1")
Here "pipeline" (as in en_pipeline) is the default name used when you don't specify --name during packaging.
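If you want to drive this from Python, for example to set the version from a build script, one option is to shell out to the same spacy package command with subprocess. This is a sketch, not an official API: the helper names are hypothetical. It assumes the --version and --name options used above, reads lang and name from the input meta.json, and assumes the generated folder follows the <lang>_<name>-<version>/<lang>_<name>/<lang>_<name>-<version> layout seen in the load path above.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_package_command(input_dir, output_dir, version, name=None):
    """Argument list for `python -m spacy package` with an explicit version (hypothetical helper)."""
    cmd = [sys.executable, "-m", "spacy", "package",
           str(input_dir), str(output_dir), "--version", version]
    if name is not None:
        cmd += ["--name", name]  # overrides "name" from meta.json
    return cmd


def packaged_data_path(output_dir, lang, name, version):
    """Directory to pass to spacy.load() after packaging (assumed layout, see lead-in)."""
    pkg = f"{lang}_{name}"
    return Path(output_dir) / f"{pkg}-{version}" / pkg / f"{pkg}-{version}"


def package_with_version(input_dir, output_dir, version, name=None):
    """Run spacy package with the given version and return the loadable data directory."""
    meta = json.loads((Path(input_dir) / "meta.json").read_text(encoding="utf8"))
    Path(output_dir).mkdir(parents=True, exist_ok=True)  # output dir must exist
    subprocess.run(build_package_command(input_dir, output_dir, version, name), check=True)
    return packaged_data_path(output_dir, meta["lang"], name or meta["name"], version)


# Usage (paths assumed, requires spaCy):
# path = package_with_version("my-model/model-best", "my-new-model", "0.0.1")
# nlp = spacy.load(path)
```

Calling the CLI rather than reimplementing it keeps the sketch tied to the documented spacy package behavior.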
You can find more details about this process in the spacy package docs.
While prodigy train is helpful because it's simple, if you're thinking about more advanced training, I would recommend learning more about spaCy pipelines, including how to handle config files and develop pipelines. prodigy train abstracts away many of these steps. I suspect that as you dig deeper, you'll find many of your questions about prodigy train are really just spaCy questions.
Prodigy has a very helpful data-to-spacy recipe that takes a Prodigy dataset and creates spaCy binary data files (including partitioning into training and evaluation sets) along with an initial spaCy config file. You can then run spacy train on it. This becomes even more powerful when you build your workflow as a spaCy project. For examples, see the spaCy projects repo, including one that shows a Prodigy-integrated spaCy project. It demonstrates both training routes: prodigy train and data-to-spacy -> spacy train.
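As a rough sketch of the second route, the two steps can be built as argument lists and run from a script (or adapted into a spaCy project's commands). The dataset name train_dataset and the corpus/ and my-model/ directories are assumptions carried over from the example above. The flags used (--ner and --eval-split for data-to-spacy; --output, --paths.train and --paths.dev for spacy train) and the output file names train.spacy, dev.spacy and config.cfg should be checked against the Prodigy and spaCy versions you have installed. The helper name is hypothetical.

```python
import sys


def training_route_commands(dataset, corpus_dir="corpus", output_dir="my-model", eval_split=0.2):
    """Commands for the data-to-spacy -> spacy train route (hypothetical helper)."""
    py = sys.executable
    # Export the Prodigy dataset to spaCy binary files plus a starter config.
    to_spacy = [py, "-m", "prodigy", "data-to-spacy", corpus_dir,
                "--ner", dataset, "--eval-split", str(eval_split)]
    # Train with spaCy directly, pointing the config at the exported files.
    train = [py, "-m", "spacy", "train", f"{corpus_dir}/config.cfg",
             "--output", output_dir,
             "--paths.train", f"{corpus_dir}/train.spacy",
             "--paths.dev", f"{corpus_dir}/dev.spacy"]
    return [to_spacy, train]


# Usage (requires Prodigy and spaCy):
# import subprocess
# for cmd in training_route_commands("train_dataset"):
#     subprocess.run(cmd, check=True)
```

spacy train writes model-best and model-last into the output directory, so the same spacy package step can be used afterwards to set the version.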
Hope this helps!