strings.json can't be read for text cat

Hi,

I used Prodigy to train a text classification model and am now trying to test it on a sample phrase.

import spacy
from spacy.lang.en import English
from spacy.pipeline import TextCategorizer

nlp = English()
textcat = TextCategorizer(nlp.vocab)
textcat.from_disk("text_cat_model/textcat")
doc = nlp("I love cats")
processed = textcat(doc)

My problem is that when I try to load the model output from Prodigy, I get the following error:

File "C:\Python37\lib\site-packages\srsly\util.py", line 21, in force_path
raise ValueError("Can't read file: {}".format(location))
ValueError: Can't read file: text_cat_model\textcat\vocab\strings.json

My file structure is as follows:

[image: screenshot of the file structure]

Wondering if someone could help me figure out what I am doing wrong?

Thanks.

Hi! The model directory you saved out already has everything spaCy needs to load the model, so you don't have to create a text classifier from scratch and load it from disk yourself. If it's a model saved out via spaCy / Prodigy, you should be able to just pass the path to spacy.load:

nlp = spacy.load("text_cat_model")
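If you're not sure which directory to pass, here's a rough sketch of a sanity check. It assumes the usual layout, where a saved spaCy pipeline keeps a meta.json at its top level while component subdirectories like textcat don't. The helper names looks_like_model_root and load_model are made up for this example and aren't part of spaCy.

```python
from pathlib import Path


def looks_like_model_root(model_dir):
    """Return True if model_dir contains a meta.json file.

    Assumption: a saved spaCy pipeline normally has meta.json at its
    top level, and component subdirectories such as "textcat" do not.
    """
    return (Path(model_dir) / "meta.json").is_file()


def load_model(model_dir):
    """Load a saved pipeline, failing early if the path looks wrong."""
    # Imported here so the directory check above works without spaCy installed.
    import spacy

    if not looks_like_model_root(model_dir):
        raise ValueError(
            f"{model_dir} has no meta.json; pass the top-level model directory"
        )
    return spacy.load(model_dir)
```

With the paths from the question, load_model("text_cat_model") would be the call to try, rather than pointing at text_cat_model/textcat.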

And in order to process the text, there's no need to specifically call the textcat object on the Doc. That all happens under the hood when you process your text with the nlp object. So if you want the categories, you can do:

doc = nlp("I love cats")
print(doc.cats)
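doc.cats is a dict mapping each label to a score, so if you only want the best label, something like this should work. It's a minimal sketch: top_category and classify are hypothetical helper names, and the labels in the usage note are invented for illustration.

```python
def top_category(cats):
    """Return (label, score) for the highest-scoring entry of a
    doc.cats-style dict, or None if the dict is empty."""
    if not cats:
        return None
    label = max(cats, key=cats.get)
    return label, cats[label]


def classify(text, model_dir="text_cat_model"):
    """Load the saved pipeline and return the top category for text."""
    # Imported here so top_category can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_dir)
    doc = nlp(text)  # the textcat component runs as part of the pipeline
    return top_category(doc.cats)
```

For example, top_category({"POSITIVE": 0.9, "NEGATIVE": 0.1}) gives ("POSITIVE", 0.9). Your actual labels will be whatever you annotated in Prodigy.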

Wow I was way off...but you got me back on track!

Thanks @ines #solved
