Hey everybody,
I'm training a model to categorize bank-turnovers with multiple labels.
I have a dataset already imported into Prodigy and can train the model successfully like this:
prodigy train models --textcat-multilabel bank_turnovers --lang "de"
The model works and I can see results when loading it and making predictions.
Next, I tried to fine-tune it using textcat.teach like this:
prodigy textcat.teach bank_turnovers_teach models/model-best ./assets/bank_turnovers_annotated.jsonl --label <all my labels>
This also worked and I went through roughly 400 examples.
Now I'm trying to integrate those examples into my existing model like this:
prodigy train --base-model models/model-best --textcat-multilabel bank_turnovers_teach
But this yields:
ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0
========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
ℹ Using config from base model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
                 ^^^^^^^^^^^^^^^^^^^^
  File "cython_src/prodigy/cli.pyx", line 123, in prodigy.cli.run_recipe
  File "cython_src/prodigy/cli.pyx", line 124, in prodigy.cli.run_recipe
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/prodigy/recipes/train.py", line 291, in train
    train_config = prodigy_config(
                   ^^^^^^^^^^^^^^^
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/prodigy/recipes/train.py", line 118, in prodigy_config
    return _prodigy_config(
           ^^^^^^^^^^^^^^^^
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/prodigy/recipes/train.py", line 145, in _prodigy_config
    config = generate_config(config, base_nlp, base_model, list(pipes), silent)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/prodigy/recipes/train.py", line 695, in generate_config
    tok2vec = base_nlp.get_pipe("tok2vec")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/.pyenv/versions/3.12.1/lib/python3.12/site-packages/spacy/language.py", line 650, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.component_names))
KeyError: "[E001] No component 'tok2vec' found in pipeline. Available names: ['textcat_multilabel']"
I've looked at the config of model-best: there is truly no tok2vec in there, but I don't know why it even tries to load it or what I am doing wrong.
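For anyone who wants to repeat that check, here is a minimal sketch of listing the pipeline components from a config without loading the model. It assumes the config is the usual INI-style config.cfg (for this model, presumably models/model-best/config.cfg) with a JSON-style list under [nlp] pipeline. The helper name pipeline_components and the SAMPLE_CONFIG excerpt are hypothetical and only illustrate the shape of the file; they are not copied from the actual model.

```python
import configparser
import json


def pipeline_components(config_text: str) -> list[str]:
    """Return the component names listed under [nlp] pipeline in a spaCy-style config."""
    # interpolation=None keeps ${...} references in the config from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # The pipeline value is a JSON-style list, e.g. ["textcat_multilabel"].
    return json.loads(parser["nlp"]["pipeline"])


# Hypothetical excerpt, reduced to the parts relevant here (not the real file).
SAMPLE_CONFIG = """
[nlp]
lang = "de"
pipeline = ["textcat_multilabel"]

[components]

[components.textcat_multilabel]
factory = "textcat_multilabel"
"""

components = pipeline_components(SAMPLE_CONFIG)
print(components)               # ['textcat_multilabel']
print("tok2vec" in components)  # False
```

To run it against the real file, the text could be read with something like pathlib.Path("models/model-best/config.cfg").read_text() and passed to the same function. The result should match the "Available names" list in the error message above.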
Can somebody help?