Training a NER model with en_trf_robertabase_lg

Hi all, beginner spaCy user here.

I'm trying out an NER task with transformers, specifically with the "en_trf_robertabase_lg" model. I have a few labeled data points, and when I start training with the command below, it throws the following error:

prodigy train ner demo_ner_news_headlines en_trf_robertabase_lg

Output:

✔ Loaded model 'en_trf_robertabase_lg'
Created and merged data for 46 total examples
Using 23 train / 23 eval (split 50%)
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 10
ℹ Baseline accuracy: 0.000

=========================== ✨  Training the model ===========================

#    Loss       Precision   Recall     F-Score 
--   --------   ---------   --------   --------
1:   0%|                                                                                                        | 0/23 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/prodigy/__main__.py", line 60, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 213, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/prodigy/recipes/train.py", line 159, in train
    nlp.update(docs, annots, drop=dropout, losses=losses)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/spacy_transformers/language.py", line 81, in update
    tok2vec = self.get_pipe(PIPES.tok2vec)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/spacy/language.py", line 281, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['ner']"

Note that training works fine with the "en_core_web_sm" model.

Any help is appreciated!

Hi! This isn't supported yet. See the thread "Issues with ner.batch-train with en_trf_bertbaseuncased_lg after creating a custom set of labels".

Oh I see... Just to be sure, because I'm getting the same error when I run

prodigy train textcat demo_news_topics en_trf_robertabase_lg

Error:

KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['textcat']"

Is training a simple textcat model with Transformers also not supported? Or did I install something incorrectly?

See my comment on this thread for more details on training a text classifier with transformer weights. TL;DR: You can do it, but you should probably use a separate training script, since the transformer models need pretty specific settings.
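
For anyone hitting the same KeyError: judging by the tracebacks in this thread, the update step looks up a 'trf_tok2vec' component, while the pipeline being trained contains only the component listed under "Available names". Here is a minimal sketch of a check you could run before training in your own script. Note that `missing_pipes` is a hypothetical helper written for illustration, not part of spaCy or Prodigy, and the required name is simply taken from the error message above.

```python
from typing import Iterable, List, Sequence


def missing_pipes(
    pipe_names: Iterable[str],
    required: Sequence[str] = ("trf_tok2vec",),
) -> List[str]:
    """Return the required component names that are absent from pipe_names.

    Hypothetical helper: the default required name comes from the KeyError
    in this thread, not from any documented list of components.
    """
    present = set(pipe_names)
    return [name for name in required if name not in present]


# With a loaded spaCy pipeline you would pass nlp.pipe_names here.
# Plain lists are used so the sketch runs without spaCy installed.
print(missing_pipes(["ner"]))                 # pipeline from the NER traceback
print(missing_pipes(["trf_tok2vec", "ner"]))  # pipeline that has the component
```

If the returned list is non-empty, the pipeline is missing a component the transformer update expects, which matches the error reported here. It only diagnoses the situation; it does not make `prodigy train` work with the transformer models.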
