Hi, I'm using both prodigy train and spacy train to train a textcat model, but my issue is to do with "loggers" in spaCy.
In spaCy I'm able to successfully use the handy MLflow integration provided by the spacy-loggers package. Unrelated, but in case anyone else tries to get this working: I just had to set an output path to save the trained model and make sure my config.cfg includes the loggers block with MLflow settings like below (ref: github:
[training.logger]
@loggers = "spacy.MLflowLogger.v1"
...
This seems to work well with the spacy train command. However, it is sometimes convenient to train my models straight from the Prodigy database using prodigy train, just to avoid the prodigy data-to-spacy step of exporting the dataset to .spacy format etc. It seems I can do everything the same way; I just specify the --config config.cfg option so that Prodigy uses the exact same settings. However, prodigy train is not logging anything to MLflow.
Is this expected? I can't tell, but maybe Prodigy is overriding the [training.logger] settings? In general, is it true that I should be able to use prodigy train and spacy train interchangeably as long as I use the same config.cfg, or are there subtle differences I should keep in mind?
I see in spacy-loggers that the MLflow integration is a recent addition, but there is a W&B integration that looks a little more mature. I haven't tried tracking to W&B, to a custom logger, or to anything else, so I can't tell whether this is an MLflow-specific issue or whether it affects [training.logger] settings in general.
Maybe-relevant observation: because I am using a remote MLflow server, I need to make sure MLflow is set up with its environment variables (MLFLOW_TRACKING_URI etc.) beforehand. If I forget to do this, spacy train throws an MLflow-related error message, which is expected. However, prodigy train continues to train the model without any error (and nothing is logged to MLflow). Does this suggest Prodigy is skipping over my logger settings?
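In case it helps anyone rule out a plain misconfiguration first, here is a minimal pre-flight sketch I would run before either command. It only checks that the config text contains a [training.logger] block and that MLFLOW_TRACKING_URI is set; it says nothing about what prodigy train does with the config internally. The function names describe_logger and preflight are my own, and the standard-library configparser is used as a rough stand-in for spaCy's own config loader, so treat it as an approximation.

```python
import configparser
import os


def describe_logger(config_text: str) -> str:
    """Return the logger name from a spaCy-style config, or '' if absent."""
    # interpolation=None so ${paths.train}-style values are left untouched
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    if not parser.has_section("training.logger"):
        return ""
    # values are quoted in the config, e.g. "spacy.MLflowLogger.v1"
    return parser.get("training.logger", "@loggers", fallback="").strip('"')


def preflight(config_text: str, env=os.environ) -> list:
    """List obvious setup problems; an empty list means none were found."""
    problems = []
    name = describe_logger(config_text)
    if not name:
        problems.append("no [training.logger] block found")
    elif "MLflow" in name and not env.get("MLFLOW_TRACKING_URI"):
        problems.append("MLFLOW_TRACKING_URI is not set")
    return problems
```

Calling preflight(open("config.cfg").read()) before training should return an empty list if both the logger block and the tracking URI are in place.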
Thanks