Errors when importing a trained Prodigy textcat model into spaCy

I'm trying to use a textcat model I trained in Prodigy for inference in spaCy, and I'm a bit lost.

My setup:
Prodigy v1.9.9 on Python 3.8.2
spaCy v3.2.3 on Python 3.9.13

What I did:

  1. Ran prodigy textcat.teach and annotated a single label (920 examples total)
  2. Ran prodigy train textcat (output: 0.02 loss and 0.82 f-score) with -o ./model_textcat_mylabel

Then I tried to load this in spaCy:

import spacy # v3.2.3

nlp = spacy.load('/home/model_textcat_mylabel')
doc = nlp(u"This is a sentence.")
processed = nlp.predict(doc)
print(processed)

This gives me an error:

Could not read config file from /home/model_textcat_mylabel/config.cfg

Which makes sense, since there is no "config.cfg" file in the folder that Prodigy created. Rather, the folder structure looks like this:

[Screenshot: contents of the folder created by Prodigy]

Another approach I tried was this:

import spacy # v3.2.3

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.from_disk("/home/model_textcat_mylabel")

doc = nlp(u"This is a sentence.")

r = textcat.predict(doc)

print(r)

This gave me this error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/model_textcat_mylabel/model'

Again, this makes sense, given the folder structure that I got from Prodigy.

And if I take the files that Prodigy saved in the subfolder textcat and move them up one level, the FileNotFoundError disappears. But then I get this error instead:

ValueError: Trying to read a Model that was created with an incompatible version of Thinc

I think I must be doing something terribly wrong.

I think this might be a version mismatch. The release notes show that Prodigy v1.11.0 introduced support for spaCy v3. You're using v1.9.9, so upgrading Prodigy may fix this. The error message also makes sense: the config file was introduced with spaCy v3, and Prodigy versions before v1.11.0 give you a spaCy v2 model on disk.
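As a quick sanity check before calling spacy.load, you could test whether the folder has the config.cfg file that spaCy v3 looks for. This is only a minimal sketch using the standard library; the helper name has_v3_config is made up for illustration and is not part of spaCy or Prodigy.

```python
from pathlib import Path


def has_v3_config(model_dir):
    """Return True if model_dir contains a config.cfg file.

    Hypothetical helper: spaCy v3 reads config.cfg when loading a
    pipeline, so a missing file hints that the folder was not saved
    as a spaCy v3 pipeline.
    """
    return (Path(model_dir) / "config.cfg").is_file()
```

For the folder from the question, has_v3_config("/home/model_textcat_mylabel") would be expected to return False, which matches the "Could not read config file" error.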

Thank you, @koaning. This makes sense.

I ran it again using Prodigy 1.11.7. I'm having the same issue, though.

Workflow:

  1. On the old Prodigy, I ran prodigy db-out to export my labels.
  2. Switched to the new Prodigy (v. 1.11.7)
  3. Ran prodigy db-in to import the labels from step 1.
  4. Ran prodigy data-to-spacy.
  5. Ran spacy train with spaCy v3.2.3 and specified an output path

The resulting folder looks like this:

If I plug that path into textcat.from_disk, I get the error...

ValueError: Can't read file: /path/spacy_train_output/vocab/strings.json

If I instead use the path /path/spacy_train_output/model-best, I get the error...

FileNotFoundError: [Errno 2] No such file or directory: '/path/spacy_train_output/model-best/model'

If I move the model files from /path/spacy_train_output/model-best/textcat_multilabel/ up, so that it will be found when spaCy looks for it in /path/spacy_train_output/model-best/model, I get...

ValueError: Cannot deserialize model: mismatched structure

If I check the meta.json, it says

"spacy_version":">=3.2.3,<3.3.0",

So this appears to be a spaCy 3 model. And I'm trying to load it in spaCy 3.2.3, which should be compatible. And yet, nothing seems to fit.
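For anyone debugging something similar, the same meta.json can be read programmatically. The sketch below pulls out the "spacy_version" and "pipeline" fields using only the standard library; the function name summarize_meta is an assumption for illustration. If the pipeline list names textcat_multilabel (as the subfolder name above suggests) while the code adds a plain "textcat" component, that difference may be one reason for a "mismatched structure" error, though that is a guess rather than something confirmed in this thread.

```python
import json
from pathlib import Path


def summarize_meta(model_dir):
    """Return the spaCy version range and component names from meta.json.

    Hypothetical helper: reads only two fields and does not validate
    anything against the installed spaCy version.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return {
        "spacy_version": meta.get("spacy_version"),
        "pipeline": meta.get("pipeline", []),
    }
```

Calling summarize_meta("/path/spacy_train_output/model-best") would return a small dict that can be compared against spacy.__version__ and the component name passed to nlp.add_pipe.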

Ok, I got it fixed.

It was a silly mistake.

I simply needed to load the model using spacy.load() instead.

This code works beautifully:


import spacy

nlp = spacy.load("/path/spacy_train_output/model-best")

text = u"""
This is a test.
"""

doc = nlp(text)

r = doc.cats

print(r)
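
Since doc.cats is a dict that maps each label to a score, a small helper can turn it into a single decision. This is just a sketch: the function name top_category, the 0.5 threshold, and the label names in the usage note are all made up for illustration.

```python
def top_category(cats, threshold=0.5):
    """Return the highest-scoring label, or None if nothing clears the threshold.

    cats is expected to look like doc.cats: a dict of label -> score.
    The 0.5 default is an arbitrary assumption, not a spaCy setting.
    """
    if not cats:
        return None
    label, score = max(cats.items(), key=lambda item: item[1])
    return label if score >= threshold else None
```

With the code above, top_category(doc.cats) would give the label to act on; for example, top_category({"MYLABEL": 0.91}) returns "MYLABEL" and top_category({"MYLABEL": 0.2}) returns None.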

Sorry for the confusion!
