Use spaCy textcat weights in a Prodigy TextClassifier

Hi,

I've saved a spaCy model from a batch_train run, and now I'd like to use its weights in a textcat.teach workflow.

If I initialize the TextClassifier with a spaCy model that already has a textcat pipe, will it use the previously trained weights? If not, is there some way to make this happen?

OK, I think I might have the answer to my own question (but would appreciate confirmation).

As far as I can tell, when the Prodigy TextClassifier model is initialized, it adds sentencizer and textcat pipes to the spaCy model if they don't already exist.

If they do exist, though, it looks like it just uses the ones that are present.

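    from prodigy.models.textcat import TextClassifier

    # nlp, labels, long_text, examples, init_tok2vec and exclusive are
    # assumed to be defined earlier in the surrounding recipe code.

    # Tag the existing textcat pipe so it can be recognized later
    textcat = nlp.get_pipe('textcat')
    textcat.foo = 'bar'

    model = TextClassifier(
        nlp,
        labels,
        long_text=long_text,
        low_data=len(examples) < 1000,
        init_tok2vec=init_tok2vec,
        exclusive_classes=exclusive,
    )

    # The tagged pipe is still in place, so it was not replaced
    assert model.nlp.get_pipe('textcat').foo == 'bar'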

Is this right?

Thanks for the update, and yes, that's correct. What Prodigy calls the TextClassifier class (which could probably have been named better) is the "annotation model": it takes the nlp object, makes sure everything is set up correctly, and handles updating the model in the loop.

One small thing to note about pre-defined text classifiers: spaCy currently doesn't support resizing an existing pre-trained text classifier, so you can't add more labels to it. If you're using an existing text classifier, all the labels in your data need to be in the model already.
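If it helps, here is a minimal sketch of how you could check for that before starting the loop. The helper name `missing_labels` and the example labels are made up for illustration; in a real setup the model's labels would probably come from something like `nlp.get_pipe("textcat").labels`, which I'm assuming rather than showing here.

```python
def missing_labels(model_labels, data_labels):
    """Return the labels used in the data that the model doesn't know, sorted."""
    return sorted(set(data_labels) - set(model_labels))


# Hypothetical values: model_labels stands in for the labels of the
# pre-trained textcat pipe, data_labels for the labels found in your examples.
model_labels = ("POSITIVE", "NEGATIVE")
data_labels = ["POSITIVE", "NEUTRAL", "NEGATIVE", "NEUTRAL"]

unknown = missing_labels(model_labels, data_labels)
if unknown:
    # These labels can't be added to the existing classifier
    print(f"Labels not in the existing text classifier: {unknown}")
```

If the list comes back non-empty, you'd either need to drop those labels from the data or train a new text classifier that includes them.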

Thanks for the clarification and for the note about adding new labels.