Hi, I'm using both prodigy train and spacy train to train a textcat model, but my issue is to do with "loggers" in spacy.
With spacy train I'm able to successfully use the handy MLflow integration provided by the spacy-loggers package. Unrelated, but in case anyone else tries to get this working: I just had to set an output path for the trained model and make sure my config.cfg includes the logger block with the MLflow settings, like below (ref: github:
[training.logger]
@loggers = "spacy.MLflowLogger.v1"
...
This seems to work well with the spacy train command. However, sometimes it is convenient to train my models straight from the Prodigy database using prodigy train, just to avoid the prodigy data-to-spacy step of exporting the dataset to .spacy format etc. It seems I can do everything the same way; I just specify the --config config.cfg option so that prodigy uses the exact same settings. However, prodigy train is not logging anything to MLflow.
Is this expected? I can't tell but maybe prodigy is overriding the [training.logger] settings? In general, is it true that I should be able to use prodigy train and spacy train interchangeably as long as I use the same config.cfg, or are there subtle differences I should keep in mind?
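For anyone wanting to rule out the config file itself, here is a minimal sketch of how I'd check that config.cfg really contains the [training.logger] block. It uses Python's standard configparser as a rough stand-in for spaCy's own config loader, so it only inspects the file's contents; it says nothing about what prodigy train resolves at runtime. The helper name logger_settings and the SAMPLE text are made up for illustration.

```python
import configparser


def logger_settings(config_text):
    """Return the [training.logger] block as a dict, or {} if it is missing."""
    # interpolation=None so that spaCy-style "${paths.train}" values are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as "@loggers" exactly as written
    parser.read_string(config_text)
    if not parser.has_section("training.logger"):
        return {}
    # values are stored with their quotes, so strip them for readability
    return {key: value.strip('"') for key, value in parser.items("training.logger")}


# Hypothetical config text; in practice this would be the contents of config.cfg
SAMPLE = """
[training]
seed = 0

[training.logger]
@loggers = "spacy.MLflowLogger.v1"
"""

print(logger_settings(SAMPLE))
```

If this returns an empty dict for the real file, the logger block is missing from the config; if it returns the MLflow logger name, the settings are there and the question is whether prodigy train honours them.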
I see in spacy-loggers that the MLflow integration is a recent addition, while the W&B integration looks a little more mature. I haven't tried tracking to W&B, to a custom logger, or to anything else, so I can't infer whether this is an MLflow-specific issue or whether it affects [training.logger] settings in general.
Maybe-relevant observation: because I am using a remote MLflow server, I need to make sure MLflow is set up with its environment variables (MLFLOW_TRACKING_URI etc.) beforehand. If I forget to do this, spacy train throws an MLflow-related error message, which is expected. However, prodigy train continues to train the model without any error (and nothing is logged to MLflow). Does this suggest prodigy is skipping over my logger settings?
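Since prodigy train gives no error when the variables are missing, a small pre-flight check run before either command would at least make the environment setup explicit. This is only a sketch: the helper name missing_mlflow_vars is made up, the URL is a placeholder, and I only list MLFLOW_TRACKING_URI because that is the one variable I named above; a given server may need others.

```python
import os

# Only the variable mentioned above; extend this tuple for your own setup
REQUIRED_VARS = ("MLFLOW_TRACKING_URI",)


def missing_mlflow_vars(environ=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Placeholder environments for illustration; call missing_mlflow_vars()
# with no arguments to check the real process environment instead.
print(missing_mlflow_vars({"MLFLOW_TRACKING_URI": "http://localhost:5000"}))  # []
print(missing_mlflow_vars({}))  # ['MLFLOW_TRACKING_URI']
```

An empty list means the environment looks ready, so if nothing reaches MLflow after that, the missing variables are not the cause.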
Thanks