Trouble loading `en_core_web_lg` as base model

Hello, I am trying to train a model using `en_core_web_lg` as the `--base-model`.

Here is the command I am running:

```
prodigy train new_model --textcat-multilabel annotated_tweets --eval-split 0.2 --base-model en_core_web_lg
```

I am using an Anaconda environment set up with the following commands:

```
conda create -n prodigy python=3.10.4
conda activate prodigy

pip install prodigy -f https://XXX-XXX-XXXX-XXX@download.prodi.gy

# I have tried both of the following to download the model (both seem to give the same error)
python -m spacy download en_core_web_lg
conda install -c conda-forge spacy-model-en_core_web_lg
```

This is the error that I get:

```
ℹ Using CPU

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
ℹ Using config from base model
✔ Generated training config

=========================== Initializing pipeline ===========================
✘ Config validation error
Bad value substitution: option 'width' in section 'components.textcat_multilabel.model.tok2vec' contains an interpolation key 'components.tok2vec.model.encode.width' which is not a valid option name. Raw value: '${components.tok2vec.model.encode.width}'
```

I couldn't find much online about this error, so I thought I'd ask whether you have any ideas on how to solve it. Any thoughts appreciated, thanks!

Hello @cbjrobertson,

Thank you for your question.
It looks like you are running into the same problem described here: Unable to train textcat model using en_core_web_md as a base model
The workaround my colleague Vincent found was to use a spaCy config file; see: Unable to train textcat model using en_core_web_md as a base model - #4 by koaning
In your case, you would need to change the model from `en_core_web_md` to `en_core_web_lg`.

Can you try this out? Maybe this works for you too.
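For background on the error message itself: spaCy's config system (provided by thinc) builds on Python's `configparser` interpolation, and the "Bad value substitution" wording is the standard `configparser.InterpolationMissingOptionError` message. The minimal sketch below uses only the standard library to reproduce the same failure on a simplified, hypothetical config excerpt: the textcat listener's `width` points at `components.tok2vec.model.encode`, a section that the generated config presumably never defines because `tok2vec` is sourced from the base model. Note that the stdlib writes references as `${section:option}`, whereas spaCy configs use `${section.option}`; the section contents, the helper name `listener_width`, and the width value 96 are illustrative assumptions, not taken from `en_core_web_lg`.

```python
import configparser

# Simplified, hypothetical excerpt of a generated training config: tok2vec is
# sourced from the base model, so the section the textcat listener
# interpolates from is never defined in this file.
BROKEN_CONFIG = """
[components.tok2vec]
source = en_core_web_lg

[components.textcat_multilabel.model.tok2vec]
width = ${components.tok2vec.model.encode:width}
"""

# Same excerpt with the referenced section defined. The value 96 is an
# illustrative placeholder, not necessarily the base model's real width.
FIXED_CONFIG = BROKEN_CONFIG + """
[components.tok2vec.model.encode]
width = 96
"""


def listener_width(config_text: str) -> str:
    """Hypothetical helper: resolve the textcat listener's width."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(config_text)
    # Interpolation happens lazily here, when the value is read.
    return parser.get("components.textcat_multilabel.model.tok2vec", "width")


try:
    listener_width(BROKEN_CONFIG)
except configparser.InterpolationMissingOptionError as err:
    # Prints the same "Bad value substitution ..." wording as the Prodigy error.
    print(err)

print(listener_width(FIXED_CONFIG))
```

Once the referenced section exists, the same lookup resolves to the literal value instead of raising.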

I am working on a fix for this issue and am using this thread as the main place for updates:

https://support.prodi.gy/t/unable-to-train-textcat-model-using-en-core-web-md-as-a-base-model/5953/9?u=koaning

Just wanted to mention here as well that we now have a fix for this issue. Details can be found in our changelog!