prodigy train does not appear to support spacy-loggers

Hi, I'm using both prodigy train and spacy train to train a textcat model, but my issue is to do with "loggers" in spacy.

With spaCy I can successfully use the handy MLflow integration provided by the spacy-loggers package. Unrelated, but in case anyone else tries to get this working: I just had to set an output path for the trained model and make sure my config.cfg includes the logger block with the MLflow settings, like below (ref: github:

@loggers = "spacy.MLflowLogger.v1"
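For anyone who wants to double-check what a given config.cfg actually declares, here is a minimal sketch that reads the logger setting with Python's standard configparser. It assumes the `@loggers` line sits under the `[training.logger]` block (the block named later in this post); `SAMPLE_CONFIG` and `declared_logger` are made-up names for illustration. Note that it only reports what the file says, not what a training command ends up using.

```python
import configparser

# Hypothetical sample; in practice, read the text of your own config.cfg.
SAMPLE_CONFIG = """
[training.logger]
@loggers = "spacy.MLflowLogger.v1"
"""


def declared_logger(config_text):
    """Return the logger name declared under [training.logger], or None."""
    # spaCy configs use ${...} references, so turn off configparser's own
    # %-style interpolation and keep the keys exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(config_text)
    if not parser.has_section("training.logger"):
        return None
    value = parser.get("training.logger", "@loggers", fallback=None)
    # Values are stored with their surrounding double quotes; strip them.
    return value.strip('"') if value is not None else None


print(declared_logger(SAMPLE_CONFIG))
```

To check a real file, pass in `open("config.cfg").read()` instead of the sample string.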

This works well with the spacy train command. However, it is sometimes convenient to train my models straight from the Prodigy database using prodigy train, which avoids the prodigy data-to-spacy step of exporting the dataset to .spacy format. It seems I can do everything the same way: I just pass the --config config.cfg option so that Prodigy uses the exact same settings. However, prodigy train is not logging anything to MLflow.

Is this expected? I can't tell but maybe prodigy is overriding the [training.logger] settings? In general, is it true that I should be able to use prodigy train and spacy train interchangeably as long as I use the same config.cfg, or are there subtle differences I should keep in mind?

I see in spacy-loggers that the MLflow integration is a recent addition, while the W&B integration looks a little more mature. I haven't tried tracking to W&B, a custom logger, or anything else, so I can't tell whether this is MLflow-specific or whether it affects [training.logger] settings in general.

Maybe-relevant observation: because I am using a remote MLflow server, I need to set up MLflow's environment variables (MLFLOW_TRACKING_URI etc) beforehand. If I forget to do this, spacy train throws an MLflow-related error message, which is expected. However, prodigy train continues to train the model without any error (and nothing is logged to MLflow). Does this suggest prodigy is skipping over my logger settings?
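Since prodigy train does not complain when the MLflow variables are missing, a small pre-flight check can make that failure visible before any training starts. This is only a sketch: `missing_mlflow_vars` and `REQUIRED_VARS` are hypothetical names, and MLFLOW_TRACKING_URI is the only variable listed because it is the only one named above; add whatever else your server needs.

```python
import os

# Only the variable named in this thread; extend for your own setup.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI",)


def missing_mlflow_vars(environ=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Demo with explicit dictionaries instead of the real environment.
print(missing_mlflow_vars({}))
print(missing_mlflow_vars({"MLFLOW_TRACKING_URI": "http://localhost:5000"}))
```

Calling `missing_mlflow_vars()` with no arguments checks the real environment, so a training script could stop early whenever the returned list is non-empty.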


hi @adin786!

Thanks for your post!

I've written a note internally about this issue. I haven't done much with spacy-loggers or MLflow so it's hard for me to diagnose but I'll get back to you if I'm able to provide more details.

Hi Azam!

Your observation is correct: the prodigy train recipe overrides the logger settings in your config. The MLflow logger is a recent addition, and while I can't completely rule out bugs in it, what you are describing sounds like the expected behavior for prodigy train. We need to log some data specific to Prodigy, hence the override.

If running multiple steps is a hassle for you, I'd recommend automating this by setting up a spaCy project file so you can run multiple steps at once (e.g. prodigy data-to-spacy followed by spacy train).
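A spaCy project file is the suggestion here; as a rough stand-in, the same two steps can also be chained from a short Python script. The sketch below rests on several assumptions: the dataset name, the directory names, and the helper names `build_commands` and `run_pipeline` are made up, the 0.2 evaluation split is arbitrary, and the flags follow my understanding of recent Prodigy and spaCy versions (`--textcat` is for mutually exclusive categories), so please confirm them with `prodigy data-to-spacy --help` and `spacy train --help`.

```python
import subprocess
import sys


def build_commands(dataset, corpus_dir="corpus", config_path="config.cfg",
                   output_dir="output"):
    """Build the export and train commands as argument lists."""
    # Step 1: export the Prodigy dataset to .spacy files in corpus_dir.
    export = [
        sys.executable, "-m", "prodigy", "data-to-spacy", corpus_dir,
        "--textcat", dataset, "--eval-split", "0.2",
    ]
    # Step 2: train with spaCy, so the logger in config.cfg is left alone.
    train = [
        sys.executable, "-m", "spacy", "train", config_path,
        "--output", output_dir,
        "--paths.train", f"{corpus_dir}/train.spacy",
        "--paths.dev", f"{corpus_dir}/dev.spacy",
    ]
    return [export, train]


def run_pipeline(dataset, **kwargs):
    """Run both steps in order and stop at the first failure."""
    for command in build_commands(dataset, **kwargs):
        subprocess.run(command, check=True)
```

Calling `run_pipeline("my_textcat_dataset")` would run the export and then the training, with the MLflow logger handled by spacy train as usual.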

In general - with the exception of the overriding of the logger - I can't think of any "subtle differences" between prodigy train and spacy train right now.