Pretraining support

claudio84destri · May 20, 2019, 4:30pm

Hi,

I pre-trained a model using the spaCy command-line interface (spacy pretrain) and got the weights saved as a binary file.

I would like to know whether I could use code like the following to create a text categorizer that I can then use with Prodigy's textcat.batch-train:

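```python
import spacy

nlp = spacy.blank("en")

# Text classifier with mutually exclusive classes and the "simple_cnn" architecture
textcat = nlp.create_pipe(
    "textcat",
    config={
        "exclusive_classes": True,
        "architecture": "simple_cnn",
    }
)
nlp.add_pipe(textcat, last=True)

# Initialize the weights first, so that textcat.model (and its tok2vec
# layer) exists before the pretrained weights are loaded into it.
nlp.begin_training()

# Binary weights file written by `spacy pretrain`; `with` closes the file afterwards
path_weight = r"\model197.bin"
with open(path_weight, "rb") as f:
    textcat.model.tok2vec.from_bytes(f.read())

output_dir = r"\model_path"
nlp.to_disk(output_dir)
```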
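The file name model197.bin presumably refers to the checkpoint from epoch 197: spacy pretrain writes one modelN.bin file per epoch into its output directory. If you would rather pick up the most recent checkpoint automatically than hard-code its name, a small helper like the one below could work. It is only a sketch: `latest_checkpoint` is a hypothetical name, and it assumes the modelN.bin naming that the file above follows. It compares epoch numbers as integers, so model197.bin correctly wins over model20.bin, which a plain string sort would get wrong.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helper. Matches checkpoint names such as "model197.bin";
# names like "model5.temp.bin" deliberately do not match.
_CHECKPOINT_RE = re.compile(r"^model(\d+)\.bin$")


def latest_checkpoint(pretrain_dir) -> Optional[Path]:
    """Return the modelN.bin file with the highest N, or None if there is none."""
    best_epoch = -1
    best_path = None
    for path in Path(pretrain_dir).iterdir():
        match = _CHECKPOINT_RE.match(path.name)
        # Compare epochs numerically, not as strings
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = path
    return best_path
```

For example, latest_checkpoint("pretrain_output") would return the highest-numbered checkpoint in a (hypothetical) pretrain_output directory, or None if the directory contains no checkpoints.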

Thank you!
Kind regards,

Claudio Nespoli

ines · May 20, 2019, 6:15pm

Yes, the upcoming version 1.8 of Prodigy will introduce support for spaCy v2.1 (see this thread for details), including an --init-tok2vec argument on the training commands that lets you pass in your pre-trained artifact. We’re just finishing up the last fixes and will hopefully have the version published later today!
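Prodigy's implementation of --init-tok2vec isn't shown in this thread. As a rough idea of what such an option typically does, here is a sketch modeled loosely on the approach spaCy v2.1's own training command takes for its --init-tok2vec option: read the pretrained file once and pass the bytes to `from_bytes` on the tok2vec layer of every pipeline component that has one. `load_pretrained_tok2vec` is a hypothetical name, not a Prodigy or spaCy API.

```python
from pathlib import Path


def load_pretrained_tok2vec(nlp, weights_path):
    """Copy pretrained tok2vec weights into every pipeline component that has one.

    Hypothetical sketch; not Prodigy's actual implementation.
    Returns the names of the components that received the weights.
    """
    # Read the binary artifact once; `with` closes the file afterwards.
    with Path(weights_path).open("rb") as file_:
        weights_data = file_.read()
    loaded = []
    for name, component in nlp.pipeline:
        model = getattr(component, "model", None)
        # Skip components without a model, or whose model has no tok2vec
        # layer (e.g. an uninitialized placeholder).
        if model is not None and hasattr(model, "tok2vec"):
            model.tok2vec.from_bytes(weights_data)
            loaded.append(name)
    return loaded
```

As with the code in the question, the components' weights have to be initialized (for example with nlp.begin_training()) before loading, and the tok2vec settings used for pretraining must match those of the component receiving the weights.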

claudio84destri · May 21, 2019, 9:52am

Thank you very much, that would be great!
