Training with base model en_core_web_trf throws error

Training an NER model with the transformer pipeline as the base model throws an error (see below):

python -m prodigy train ./output_dir --ner ner_ticker --base-model en_core_web_trf

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python\Lib\site-packages\prodigy\__main__.py", line 50, in <module>
    main()
  File "C:\Program Files\Python\Lib\site-packages\prodigy\__main__.py", line 44, in main
    controller = run_recipe(run_args)
                 ^^^^^^^^^^^^^^^^^^^^
  File "cython_src\prodigy\cli.pyx", line 117, in prodigy.cli.run_recipe
  File "cython_src\prodigy\cli.pyx", line 118, in prodigy.cli.run_recipe
  File "C:\Program Files\Python\Lib\site-packages\prodigy\recipes\train.py", line 291, in train
    train_config = prodigy_config(
                   ^^^^^^^^^^^^^^^
  File "C:\Program Files\Python\Lib\site-packages\prodigy\recipes\train.py", line 118, in prodigy_config
    return _prodigy_config(
           ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python\Lib\site-packages\prodigy\recipes\train.py", line 145, in _prodigy_config
    config = generate_config(config, base_nlp, base_model, list(pipes), silent)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python\Lib\site-packages\prodigy\recipes\train.py", line 695, in generate_config
    tok2vec = base_nlp.get_pipe("tok2vec")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python\Lib\site-packages\spacy\language.py", line 650, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.component_names))
KeyError: "[E001] No component 'tok2vec' found in pipeline. Available names: ['transformer', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']"

My spaCy info:

============================== Info about spaCy ==============================

spaCy version 3.7.2
Location C:\Program Files\Python\Lib\site-packages\spacy
Platform Windows-10-10.0.22621-SP0
Python version 3.11.6
Pipelines en_core_web_lg (3.7.0), en_core_web_md (3.7.0), en_core_web_sm (3.7.0), en_core_web_trf (3.7.2)

Thanks for your help.

Also, forgot to print my Prodigy stats:

============================== ✨ Prodigy Stats ==============================

Version 1.14.6
Location C:\Program Files\Python\Lib\site-packages\prodigy
Prodigy Home C:\Users\Ronny\.prodigy
Platform Windows-10-10.0.22621-SP0
Python Version 3.11.6
Spacy Version 3.7.2
Database Name SQLite
Database Id sqlite
Total Datasets 1
Total Sessions 31

Thanks.

Hi @ronnysh!

This seems to be similar to this issue. Can you see if this solution works?

Hi @ryanwesslen,

As I am using Prodigy for training, I don't create a config file manually. Is there a way to bypass the tok2vec component when training with Prodigy using the transformer model?

Thanks.

We're working on a fix, but for now the only option is the workaround in that post, which uses a config file. I'll post back here when we have an update on our progress.
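
For anyone hitting the same error, here is a minimal sketch of why the lookup fails. The traceback shows the train recipe calling base_nlp.get_pipe("tok2vec"), while en_core_web_trf ships a "transformer" component in that role, so the name is not found. The helper below is hypothetical (it is not part of Prodigy or spaCy) and only illustrates the check on the component names taken from the E001 message; it is not the fix itself.

```python
# Component names copied from the E001 message in the traceback above.
TRF_PIPE_NAMES = [
    "transformer", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner",
]


def find_embedding_component(pipe_names):
    """Hypothetical helper: return the name of the shared embedding
    component in a pipeline, or None if neither known name is present."""
    for candidate in ("tok2vec", "transformer"):
        if candidate in pipe_names:
            return candidate
    return None


if __name__ == "__main__":
    # With a loaded pipeline you could pass nlp.pipe_names instead
    # (assumes spaCy and the pipeline package are installed).
    print(find_embedding_component(TRF_PIPE_NAMES))
```

Running the same check against your own base model before training shows whether it will hit this code path: a result other than "tok2vec" means the config-file workaround is needed until the fix lands.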