NER training produces invalid config.cfg?

Hi, I'm not sure whether this question belongs here or with spaCy support, so here goes.

Prodigy version: 1.11.2

Training command: prodigy train ner_20210830 --ner <task_name> --base-model en_core_web_trf --label-stats

The training went great; the model has an F-score around 0.8. But when I try to load it with spacy.load, I get a config error caused by the following block in the auto-generated config.cfg in model-best (the offending line is the one setting grad_factor):

[components.tagger.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v1"
name = "roberta-base"
pooling = {"@layers":"reduce_mean.v1"}
grad_factor = {"@layers":"reduce_mean.v1"}

This is obviously some kind of error, as Tok2VecTransformer.v1 expects a float for grad_factor, and the model won't load. I tried changing the value by hand to 1.0, but that generates the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/30/g5wyp_x905l0_06s76c87yx40000gp/T/ipykernel_75583/445984189.py in <module>
----> 1 nlp = spacy.load('ner_20210830/model-best/')

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/__init__.py in load(name, vocab, disable, exclude, config)
     49     RETURNS (Language): The loaded nlp object.
     50     """
---> 51     return util.load_model(
     52         name, vocab=vocab, disable=disable, exclude=exclude, config=config
     53     )

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/util.py in load_model(name, vocab, disable, exclude, config)
    321             return load_model_from_package(name, **kwargs)
    322         if Path(name).exists():  # path to model data directory
--> 323             return load_model_from_path(Path(name), **kwargs)
    324     elif hasattr(name, "exists"):  # Path or Path-like to model data
    325         return load_model_from_path(name, **kwargs)

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/util.py in load_model_from_path(model_path, meta, vocab, disable, exclude, config)
    388     config = load_config(config_path, overrides=overrides)
    389     nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude)
--> 390     return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
    391 
    392 

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/language.py in from_disk(self, path, exclude, overrides)
   1965             # Convert to list here in case exclude is (default) tuple
   1966             exclude = list(exclude) + ["vocab"]
-> 1967         util.from_disk(path, deserializers, exclude)
   1968         self._path = path
   1969         self._link_components()

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/util.py in from_disk(path, readers, exclude)
   1197         # Split to support file names like meta.json
   1198         if key.split(".")[0] not in exclude:
-> 1199             reader(path / key)
   1200     return path
   1201 

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/language.py in <lambda>(p, proc)
   1959             if not hasattr(proc, "from_disk"):
   1960                 continue
-> 1961             deserializers[name] = lambda p, proc=proc: proc.from_disk(
   1962                 p, exclude=["vocab"]
   1963             )

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/pipeline/trainable_pipe.pyx in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk()

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/util.py in from_disk(path, readers, exclude)
   1197         # Split to support file names like meta.json
   1198         if key.split(".")[0] not in exclude:
-> 1199             reader(path / key)
   1200     return path
   1201 

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/pipeline/trainable_pipe.pyx in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk.load_model()

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/spacy/pipeline/trainable_pipe.pyx in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk.load_model()

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/thinc/model.py in from_bytes(self, bytes_data)
    527         msg = srsly.msgpack_loads(bytes_data)
    528         msg = convert_recursive(is_xp_array, self.ops.asarray, msg)
--> 529         return self.from_dict(msg)
    530 
    531     def from_disk(self, path: Union[Path, str]) -> "Model":

~/dev/proj/spacy_ner/env/lib/python3.8/site-packages/thinc/model.py in from_dict(self, msg)
    544         nodes = list(self.walk())
    545         if len(msg["nodes"]) != len(nodes):
--> 546             raise ValueError("Cannot deserialize model: mismatched structure")
    547         for i, node in enumerate(nodes):
    548             info = msg["nodes"][i]

ValueError: Cannot deserialize model: mismatched structure
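Looking at the last frame, the error comes from thinc's Model.from_dict, which refuses to deserialize when the saved payload lists a different number of nodes than the live model graph. A simplified sketch of that check (my paraphrase, not the actual thinc source):

```python
def from_dict_structure_check(msg: dict, live_node_count: int) -> None:
    """Simplified version of the check in thinc's Model.from_dict:
    the serialized payload must describe exactly as many nodes as
    the live model graph, otherwise deserialization is refused."""
    if len(msg["nodes"]) != live_node_count:
        raise ValueError("Cannot deserialize model: mismatched structure")
```

So somewhere, the model built from the config has a different node structure than the one that was serialized to disk.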

I'm mystified as to why Prodigy would have created an invalid config.cfg, and I'm not sure how to fix it. I'd really like to use this model, especially as it took ~300 CPU hours to train! Any help would be much appreciated. Thanks.

And here's my pip freeze:

aiobotocore 
aiofiles==0.7.0
aiohttp 
aioitertools 
aioredis==2.0.0
async-timeout==3.0.1
attrs 
autovizwidget 
awscli==1.20.27
backports.functools-lru-cache 
bcrypt==3.2.0
beautifulsoup4==4.9.3
bleach 
blessings==1.7
blis==0.7.4
boto3==1.18.27
botocore==1.21.27
brotlipy==0.7.0
cached-property==1.5.2
cachetools==4.2.2
catalogue==2.0.6
certifi==2021.5.30
cffi 
chardet 
charset-normalizer 
click==7.1.2
colorama==0.4.3
colorful==0.5.4
contextvars==2.4
cryptography 
cymem==2.0.5
dataclasses==0.8
decorator 
defusedxml 
dill==0.3.4
distro==1.6.0
docker==5.0.0
docker-compose==1.29.2
dockerpty==0.4.1
docopt==0.6.2
docutils==0.15.2
en-core-web-lg 
en-core-web-trf 
entrypoints 
environment-kernels==1.1.1
fastapi==0.68.1
filelock==3.0.12
fsspec 
gitdb==4.0.7
GitPython==3.1.18
google==3.0.0
google-api-core==1.31.2
google-auth==1.35.0
google-pasta==0.2.0
googleapis-common-protos==1.53.0
gpustat==0.6.0
grpcio==1.39.0
h11==0.12.0
hdijupyterutils 
huggingface-hub==0.0.12
idna 
idna-ssl 
immutables==0.16
importlib-metadata 
ipykernel 
ipython==5.8.0
ipython-genutils==0.2.0
ipywidgets 
Jinja2 
jmespath 
joblib==1.0.1
json5==0.9.6
jsonschema 
jupyter 
jupyter-client 
jupyter-console==5.2.0
jupyter-core 
jupyterlab==1.2.21
jupyterlab-git==0.11.0
jupyterlab-server==1.2.0
jupyterlab-widgets 
MarkupSafe 
mistune 
mock 
msgpack==1.0.2
multidict 
multiprocess==0.70.12.2
murmurhash==1.0.5
nb-conda 
nb-conda-kernels 
nbconvert==5.6.1
nbdime==1.1.0
nbexamples 
nbformat 
nbserverproxy 
nose 
notebook 
numpy 
nvidia-ml-py3==7.352.0
opencensus==0.7.13
opencensus-context==0.1.2
packaging 
pandas==0.22.0
pandocfilters==1.4.2
paramiko==2.7.2
pathos==0.2.8
pathy==0.6.0
peewee==3.14.4
pexpect 
pickleshare 
pid==3.0.4
plac==1.1.3
plotly 
pox==0.3.0
ppft==1.6.6.4
preshed==3.0.5
prodigy 
prometheus-client 
prompt-toolkit==1.0.15
protobuf==3.17.2
protobuf3-to-dict==0.1.5
psutil==5.8.0
psycopg2 
ptyprocess 
py-spy==0.3.8
py4j==0.10.7
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser 
pydantic==1.8.2
pygal==2.4.0
Pygments 
PyJWT==2.1.0
pykerberos 
PyNaCl==1.4.0
pyOpenSSL 
pyparsing==2.4.7
PyQt5==5.12.3
PyQt5_sip==4.19.18
PyQtChart==5.12
PyQtWebEngine==5.12.1
pyrsistent 
PySocks 
pyspark==2.4.0
python-dateutil 
python-dotenv==0.19.0
pytz 
PyYAML==5.4.1
pyzmq 
qtconsole 
QtPy 
ray==0.8.7
redis==3.4.1
regex==2021.8.28
requests 
requests-kerberos 
rsa==4.7.2
s3fs 
s3transfer==0.5.0
sacremoses==0.0.45
sagemaker==2.54.0
sagemaker-experiments==0.1.35
sagemaker-nbi-agent 
sagemaker-pyspark==1.4.2
Send2Trash 
simplegeneric==0.8.1
six 
smart-open==5.2.1
smdebug-rulesconfig==1.0.1
smmap==4.0.0
soupsieve==2.2.1
spacy==3.1.2
spacy-alignments==0.8.3
spacy-legacy==3.0.8
spacy-ray==0.1.4
spacy-transformers==1.0.5
sparkmagic 
srsly==2.4.1
starlette==0.14.2
tenacity 
terminado 
testpath 
texttable==1.6.4
thinc==8.0.8
tokenizers==0.10.3
toolz==0.11.1
torch==1.9.0
tornado 
tqdm==4.62.2
traitlets==4.3.3
transformers==4.9.2
typer==0.3.2
typing-extensions 
urllib3==1.26.6
uvicorn==0.13.4
uvloop==0.14.0
wasabi==0.8.2
wcwidth 
webencodings==0.5.1
websocket-client==0.59.0
widgetsnbextension 
wrapt 
yarl 
zipp 

Hi! Thanks for the report and sorry about that – it turned out this was actually a bug in spacy-transformers, which we just fixed in v1.0.6 (should be on PyPI in a couple of minutes).

The good news is, there should be a relatively easy workaround: first, upgrade spacy-transformers to v1.0.6. Then, instead of loading the tagger from the saved model with the broken config, re-source the component in the config, and repeat this for any other components where the error occurs:

[components.tagger]
source = "en_core_web_trf"
replace_listeners = ["model.tok2vec"]

This should hopefully fix the problem for your model :crossed_fingers:

Thanks for the response! I'm afraid the workaround isn't working; I'm still getting the "mismatched structure" error. Oddly, debugging the config file passes all validations.


============================= Config validation =============================
MSG: 7
NODES: 7
MSG names: ['reduce_mean', 'softmax', 'transformer-listener', 'transformer-listener>>trfs2arrays', 'transformer-listener>>trfs2arrays>>with_array(softmax)', 'trfs2arrays', 'with_array(softmax)']
NODE names: ['reduce_mean', 'softmax', 'transformer-listener', 'transformer-listener>>trfs2arrays', 'transformer-listener>>trfs2arrays>>with_array(softmax)', 'trfs2arrays', 'with_array(softmax)']
MSG: 10
NODES: 10
MSG names: ['linear', 'list2array', 'noop', 'parser_model', 'precomputable_affine', 'reduce_mean', 'transformer-listener', 'transformer-listener>>trfs2arrays', 'transformer-listener>>trfs2arrays>>list2array>>linear', 'trfs2arrays']
NODE names: ['linear', 'list2array', 'noop', 'parser_model', 'precomputable_affine', 'reduce_mean', 'transformer-listener', 'transformer-listener>>trfs2arrays', 'transformer-listener>>trfs2arrays>>list2array>>linear', 'trfs2arrays']
MSG: 10
NODES: 10
MSG names: ['linear', 'list2array', 'noop', 'parser_model', 'precomputable_affine', 'reduce_mean', 'transformer-listener', 'transformer-listener>>trfs2arrays', 'transformer-listener>>trfs2arrays>>list2array>>linear', 'trfs2arrays']
NODE names: ['linear', 'list2array', 'noop', 'parser_model', 'precomputable_affine', 'reduce_mean', 'transformer-listener', 'transformer-listener>>trfs2arrays', 'transformer-listener>>trfs2arrays>>list2array>>linear', 'trfs2arrays']

===================== Config validation for [initialize] =====================

====================== Config validation for [training] ======================
✔ Config is valid
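For reference, the MSG/NODES lines above come from comparing the node names in the serialized msgpack payload against the names produced by walking the live model graph, along these lines (a rough sketch of my debug helper, not spaCy code):

```python
def compare_node_names(msg_names, node_names):
    """Compare node names recovered from the serialized payload
    against the names from walking the live model graph."""
    missing = sorted(set(node_names) - set(msg_names))
    extra = sorted(set(msg_names) - set(node_names))
    return {
        "count_match": len(msg_names) == len(node_names),
        "missing_from_msg": missing,   # nodes the live model has but the payload lacks
        "unexpected_in_msg": extra,    # nodes the payload has but the live model lacks
    }
```

As shown in the output, the counts and names all line up, which is what makes the error so puzzling.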

Giving this a try...

Okay, I've taken a dive into the spaCy and thinc libraries and can't pin down where the mismatched-structure error is coming from. More distressingly, I've trained a model (after updating spacy-transformers) using the same command on a different dataset, and that one won't load either; same error.

ValueError: Cannot deserialize model: mismatched structure

Hi Bob,

Thanks for your patience and for retrying - it seems there are in fact two different bugs at play here. Fortunately, I do have some good news:

You shouldn't have lost the training hours. Loading only goes wrong for the frozen components in your pipeline, i.e. those you didn't retrain. This is the workaround:

import spacy

# Load the trained pipeline without the frozen components that fail to deserialize
nlp = spacy.load(".../ner_20210830/model-best/", exclude="tagger,parser")
# Source the frozen components from the original base model instead
nlp_orig = spacy.load("en_core_web_trf")
nlp.add_pipe("parser", source=nlp_orig, after="transformer")
nlp.add_pipe("tagger", source=nlp_orig, after="parser")
doc = nlp("This should just run.")

(If you really only need the NER, you can just exclude all the other pipes; that's even easier.)
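The add_pipe(..., source=...) trick works between any two pipelines; here's a minimal runnable sketch using blank pipelines in place of the trained models (the component set here is just illustrative):

```python
import spacy

# Build a "source" pipeline holding the component we want to borrow
src = spacy.blank("en")
src.add_pipe("tagger")

# Build a "destination" pipeline and copy the component over,
# positioned after an existing pipe
dst = spacy.blank("en")
dst.add_pipe("ner")
dst.add_pipe("tagger", source=src, after="ner")

print(dst.pipe_names)
```

spaCy copies the sourced component into the destination pipeline, so the ordering arguments (after, before, first, last) control where it runs.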

Hopefully that at least makes sure you can keep working while we fix the (second) issue on our end!

This worked! Thank you!


Is there a similar work-around for a trained model I want to use with the ner.correct recipe?
