Pretraining support

Hi,

I pre-trained a model using the spaCy command-line interface and got the weights saved as a binary file.

I would like to know whether I could use code like the following to build a text categorizer that I can then call in prodigy textcat.batch-train:

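import spacy

# Blank English pipeline with a CNN text classifier
nlp = spacy.blank("en")
textcat = nlp.create_pipe(
    "textcat",
    config={
        "exclusive_classes": True,
        "architecture": "simple_cnn",
    },
)
nlp.add_pipe(textcat, last=True)
# Note: the category labels (textcat.add_label(...)) would probably need
# to be added here, before begin_training() is called

path_weight = r"\model197.bin"
output_dir = r"\model_path"

# Initialize the model first, then overwrite the tok2vec layer
# with the pre-trained weights
optimizer = nlp.begin_training()
with open(path_weight, "rb") as f:
    textcat.model.tok2vec.from_bytes(f.read())

nlp.to_disk(output_dir)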

Thank you
Kind regards

Claudio Nespoli

Yes, the upcoming version 1.8 of Prodigy will introduce support for spaCy v2.1 (see this thread for details), including an --init-tok2vec argument on the training commands that lets you pass in your pre-trained artifact. We’re just finishing up the last fixes and will hopefully have the version published later today! :crossed_fingers:
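
As a rough sketch of how that argument might be passed, the snippet below assembles a textcat.batch-train call as an argument list in Python. Only the command name and the --init-tok2vec flag come from this thread; the helper name build_train_command, the dataset name and the base model are placeholders for illustration, and the positional argument order is an assumption, so check the command's --help output for the actual signature.

```python
def build_train_command(dataset, base_model, tok2vec_path):
    """Build the argument list for a textcat.batch-train call.

    Hypothetical helper: it only assembles the arguments and does not
    run Prodigy or check that the weights file exists.
    """
    return [
        "prodigy",
        "textcat.batch-train",
        dataset,            # assumed: Prodigy dataset holding the annotations
        base_model,         # assumed: base model used during pretraining
        "--init-tok2vec",   # flag named in the reply above
        tok2vec_path,       # weights file written by the pretraining run
    ]


if __name__ == "__main__":
    # Placeholder names; replace them with your own dataset and model.
    cmd = build_train_command("my_dataset", "en_vectors_web_lg", "model197.bin")
    print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd) or simply typed into the terminal in the printed form.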

Thank you very much, that will be nice.