Pretraining and exported textcat models

The new spacy pretrain feature is great. However, it's a little unclear from the documentation (or maybe I'm misreading): when I run textcat.batch-train with the -t2v option and tell it to output the model to disk, will it package the language model weights in with that classifier, or will I have to make sure they're loaded manually next time?

The -t2v argument changes how the textcat model is initialised. The textcat.batch-train command will load in those pretrained weights and then start modifying them during training. The modified weights will be saved out with the model.
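For concreteness, a rough sketch of the workflow (assuming textcat.batch-train here is the Prodigy recipe; the paths, dataset name and weights filename are placeholders, and the exact flags may differ between versions, so check the --help output for each command):

```bash
# 1. Pretrain tok2vec weights on raw text (spaCy v2.1+).
python -m spacy pretrain raw_texts.jsonl en_vectors_web_lg ./pretrained

# 2. Train the text classifier, initialising its tok2vec layer from one of
#    the weights files written by pretrain, and save the model to disk.
prodigy textcat.batch-train my_dataset en_vectors_web_lg \
    -t2v ./pretrained/model999.bin \
    --output ./textcat_model
```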

So the short answer is no: you won't have to do anything to load the weights back in. What you'll load back in are the fine-tuned weights, adapted to your task.
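A minimal sketch of loading the exported model again (the directory name is whatever you passed as the output path, and the category labels depend on your dataset):

```python
import spacy

# Load the exported model package. The fine-tuned tok2vec weights are
# stored inside it, so nothing extra needs to be loaded by hand.
nlp = spacy.load("./textcat_model")

doc = nlp("This is a text to classify.")
print(doc.cats)  # e.g. {"POSITIVE": 0.92, "NEGATIVE": 0.08}
```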

On the other hand, you also won't have access to the original language model weights through the saved model: it only contains the fine-tuned versions. So if you want to train another model later, you'll still need access to the original pretrained weights file.