I'm trying to use a textcat model I trained in Prodigy for inference in spaCy, and I'm a bit lost.
My setup:
Prodigy v1.9.9 on Python 3.8.2
spaCy v3.2.3 on Python 3.9.13
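Prodigy and spaCy are listed under different Python interpreters (3.8.2 vs 3.9.13), which suggests two separate environments, each with its own spaCy and Thinc. A minimal sketch for confirming what each environment actually has installed is to run a script like this with each interpreter. It uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is illustrative.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this once per interpreter to compare the two environments.
    print("Python", sys.version.split()[0])
    for pkg in ("prodigy", "spacy", "thinc"):
        print(pkg, installed_version(pkg))
```

If the Prodigy environment reports a spaCy 2.x version, any model it trains will be saved in the spaCy v2 format.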
What I did:
Ran prodigy textcat.teach and annotated a single label (920 examples total)
Ran prodigy train textcat with -o ./model_textcat_mylabel (output: 0.02 loss and 0.82 F-score)
Then I tried to load it in spaCy:
import spacy # v3.2.3
nlp = spacy.load('/home/model_textcat_mylabel')
doc = nlp("This is a sentence.")
# Language has no predict() method; the textcat scores are stored on doc.cats
print(doc.cats)
This gives me an error:
Could not read config file from /home/model_textcat_mylabel/config.cfg
Which makes sense, since there is no config.cfg file in the folder that Prodigy created. Instead, the folder structure looks like this:
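The missing config.cfg is itself a hint about which spaCy version wrote the folder. A minimal sketch that guesses the layout of a saved pipeline directory, assuming that spaCy v3 pipelines always contain config.cfg while older v2 pipelines only have meta.json next to the vocab and component folders. The function name `pipeline_layout` is hypothetical.

```python
from pathlib import Path


def pipeline_layout(path):
    """Guess whether a saved pipeline directory uses the spaCy v3 or v2 layout.

    Assumption: v3 pipelines include config.cfg; v2 pipelines have meta.json
    but no config.cfg.
    """
    p = Path(path)
    if (p / "config.cfg").is_file():
        return "v3"
    if (p / "meta.json").is_file():
        return "v2"
    return "unknown"


# Example: pipeline_layout("/home/model_textcat_mylabel")
```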
Another approach I tried was this:
import spacy # v3.2.3
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.from_disk("/home/model_textcat_mylabel")
doc = nlp("This is a sentence.")
# predict() expects a batch (an iterable of Docs) and returns the raw scores
scores = textcat.predict([doc])
print(scores)
This gave me this error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/model_textcat_mylabel/model'
Again, this makes sense, given the folder structure that I got from Prodigy.
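The error shows that `from_disk()` on a single component looks for a file named `model` directly inside the directory it is given, not inside a subfolder such as `textcat`. A minimal sketch for checking a directory before calling `from_disk()`, assuming the component reads files named `cfg` and `model` (the `cfg` name is an assumption; only `model` is confirmed by the error above). The helper `missing_component_files` is hypothetical.

```python
from pathlib import Path

# Assumption: a component's from_disk() reads these names directly inside
# the directory it receives.
EXPECTED_COMPONENT_FILES = ("cfg", "model")


def missing_component_files(component_dir, expected=EXPECTED_COMPONENT_FILES):
    """Return the expected component files that are absent from component_dir."""
    p = Path(component_dir)
    return [name for name in expected if not (p / name).exists()]


# Example: compare the pipeline root with the textcat subfolder
# missing_component_files("/home/model_textcat_mylabel")
# missing_component_files("/home/model_textcat_mylabel/textcat")
```

Pointing `from_disk()` at the subfolder only fixes the path; the files inside are still in whatever format the older Prodigy/spaCy wrote them.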
And if I take the files that Prodigy saved in the textcat subfolder and move them up one level, the FileNotFoundError disappears, but I get this error instead:
ValueError: Trying to read a Model that was created with an incompatible version of Thinc
I think this might be a version mismatch. According to the release notes, Prodigy v1.11.0 introduced support for spaCy v3. You're using v1.9.9, so upgrading Prodigy may fix this issue. The error message also makes sense because the config.cfg file was introduced in spaCy v3 as well, and Prodigy versions before v1.11.0 save a spaCy v2 model to disk.
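One way to confirm the mismatch before loading is to compare the spaCy version a pipeline was saved for with the installed one. A minimal sketch, assuming the pipeline's meta.json has a `spacy_version` requirement string such as `">=2.2.4,<2.3.0"` (both spaCy v2 and v3 write this field, but treat the exact format as an assumption). The function names are hypothetical, and only the major version is compared.

```python
import json
import re
from pathlib import Path


def required_spacy_major(spec):
    """Extract the major version from a requirement string like '>=2.2.4,<2.3.0'."""
    match = re.search(r"(\d+)\.\d+", spec)
    if match is None:
        raise ValueError(f"no version number in {spec!r}")
    return int(match.group(1))


def check_compatible(model_dir, installed_version):
    """Return (compatible, required_spec) for the pipeline saved in model_dir.

    Assumption: meta.json contains a "spacy_version" requirement string.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    spec = meta.get("spacy_version", "")
    installed_major = int(installed_version.split(".")[0])
    return required_spacy_major(spec) == installed_major, spec


# Example (run in the spaCy v3 environment):
# import spacy
# print(check_compatible("/home/model_textcat_mylabel", spacy.__version__))
```

If the required major version is 2, the model cannot be loaded by spaCy v3.2.3, and retraining with a Prodigy release that supports spaCy v3 is the way forward.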
If I instead use the path /path/spacy_train_output/model-best, I get the error...
FileNotFoundError: [Errno 2] No such file or directory: '/path/spacy_train_output/model-best/model'
If I move the model files from /path/spacy_train_output/model-best/textcat_multilabel/ up, so that it will be found when spaCy looks for it in /path/spacy_train_output/model-best/model, I get...
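The model-best directory is a full pipeline, so the usual route is to load the whole directory with `spacy.load()` and read `doc.cats`. If you do want to call `from_disk()` on a single component, it needs the component's own subfolder (here `textcat_multilabel`), and the component added with `nlp.add_pipe()` presumably has to be of the same type rather than `"textcat"`. A minimal sketch for finding those subfolders, assuming meta.json lists component names under a `"pipeline"` key and each trainable component is saved in a subfolder of the same name. The helper `component_dirs` is hypothetical.

```python
import json
from pathlib import Path


def component_dirs(pipeline_dir):
    """Map each component listed in meta.json to its subfolder, if it exists.

    Assumption: meta.json has a "pipeline" list of component names, and each
    serialized component lives in a subfolder with the same name.
    """
    root = Path(pipeline_dir)
    meta = json.loads((root / "meta.json").read_text(encoding="utf-8"))
    return {
        name: root / name
        for name in meta.get("pipeline", [])
        if (root / name).is_dir()
    }


# Example:
# component_dirs("/path/spacy_train_output/model-best")
```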