I’m trying to figure out how best to save space. We want to do periodic batch training and persist the models to S3. I notice that when I run textcat.batch_train, it creates a new spaCy model whose textcat directory sits next to the vocabulary directory that comes with the en_vectors_web_lg model. Since training doesn’t change the vocabulary, it would be nice to omit it from the persisted model and then add it back in when running textcat.teach.
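Here is a rough sketch of the workflow I have in mind, using only plain file operations. I haven't verified that spaCy is happy with a model assembled this way. It assumes the saved model directory keeps the vocabulary in a top-level vocab subdirectory alongside textcat and meta.json. strip_vocab and restore_vocab are hypothetical helper names, the S3 upload/download step is left out, and vocab_source would be the vocab directory of the installed en_vectors_web_lg package.

```python
import shutil
from pathlib import Path


def strip_vocab(model_dir, out_dir):
    """Copy a saved model directory, leaving out its top-level vocab/ folder.

    Hypothetical helper: the result is what would be uploaded to S3.
    """
    model_dir = Path(model_dir)

    def skip_top_level_vocab(directory, names):
        # Only drop "vocab" directly under the model root, not nested
        # entries that happen to share the name.
        return ["vocab"] if Path(directory) == model_dir else []

    shutil.copytree(model_dir, out_dir, ignore=skip_top_level_vocab)
    return Path(out_dir)


def restore_vocab(stripped_dir, vocab_source):
    """Copy a shared vocab/ folder back into a stripped model directory.

    Hypothetical helper: vocab_source is assumed to be the (unchanged)
    vocab directory of the base vectors model the textcat was trained on.
    """
    stripped_dir = Path(stripped_dir)
    shutil.copytree(vocab_source, stripped_dir / "vocab")
    return stripped_dir
```

The restored directory would then be passed to textcat.teach as the model path. This only works if the vocab copied back in is exactly the one the model was trained with, i.e. the same en_vectors_web_lg version.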
Is there a good way to mix and match the model sub-components like this?