Training a NER model with en_trf_robertabase_lg

Hi all, beginner spaCy user here.

I'm trying out an NER task with transformers, specifically with the "en_trf_robertabase_lg" model. I have a few labeled data points, and when I start training it throws the following error:

prodigy train ner demo_ner_news_headlines en_trf_robertabase_lg


✔ Loaded model 'en_trf_robertabase_lg'
Created and merged data for 46 total examples
Using 23 train / 23 eval (split 50%)
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 10
ℹ Baseline accuracy: 0.000

=========================== ✨  Training the model ===========================

#    Loss       Precision   Recall     F-Score 
--   --------   ---------   --------   --------
1:   0%|                                                                                                        | 0/23 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/", line 85, in _run_code
    exec(code, run_globals)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/prodigy/", line 60, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 213, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/prodigy/recipes/", line 159, in train
    nlp.update(docs, annots, drop=dropout, losses=losses)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/spacy_transformers/", line 81, in update
    tok2vec = self.get_pipe(PIPES.tok2vec)
  File "/anaconda3/envs/nlp-flywheel/lib/python3.7/site-packages/spacy/", line 281, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['ner']"

Note that training works fine with the "en_core_web_sm" model.

Any help is appreciated!

Hi! This isn't supported yet, see: Issues with ner.batch-train with en_trf_bertbaseuncased_lg after creating a custom set of labels
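For anyone else hitting this: the E001 error means the pipeline that the train recipe builds contains only the component being trained (here `ner`), while spacy-transformers expects a `trf_tok2vec` component to also be present. Here is a minimal pure-Python sketch of the lookup that fails — `MiniPipeline` is a made-up stand-in for spaCy's `Language` class, only to illustrate the `get_pipe` behaviour, not the real implementation:

```python
class MiniPipeline:
    """Hypothetical stand-in for spaCy's Language class, illustrating
    how get_pipe raises E001 when a component is missing."""

    def __init__(self, components):
        # components: mapping of pipe name -> pipe object (stubbed here)
        self._components = dict(components)

    @property
    def pipe_names(self):
        return list(self._components)

    def get_pipe(self, name):
        if name not in self._components:
            # Mirrors the shape of spaCy's Errors.E001 message
            raise KeyError(
                f"[E001] No component '{name}' found in pipeline. "
                f"Available names: {self.pipe_names}"
            )
        return self._components[name]


# The train recipe effectively builds a pipeline with only 'ner',
# so the transformer's tok2vec lookup fails:
nlp = MiniPipeline({"ner": object()})
try:
    nlp.get_pipe("trf_tok2vec")
except KeyError as err:
    print(err)
```

So the fix isn't on your side — the recipe would need to carry the transformer components over into the training pipeline.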


Oh I see... Just to be sure, since I'm getting the same error when I run

prodigy train textcat demo_news_topics en_trf_robertabase_lg


KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['textcat']"

Is training a simple textcat model with Transformers also not supported? Or did I install something incorrectly?

See my comment on this thread for more details on training a text classifier with transformer weights. TL;DR: You can do it, but you should probably use a separate training script, since the transformer models need pretty specific settings.
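For reference, the spacy-transformers repo has example training scripts that show the general shape of this. Here's a rough, untested sketch assuming the spacy-transformers v0.x API — the component name `trf_textcat`, the `exclusive_classes` config option, and the placeholder `train_data` and labels are all assumptions, so check the library's own examples for the exact settings and hyperparameters:

```python
import spacy
from spacy.util import minibatch

# Assumes spacy-transformers and the RoBERTa model are installed
nlp = spacy.load("en_trf_robertabase_lg")
textcat = nlp.create_pipe("trf_textcat", config={"exclusive_classes": True})
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
nlp.add_pipe(textcat, last=True)

# train_data: list of (text, {"cats": {...}}) pairs -- placeholder data
train_data = [("good stuff", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}})]

# resume_training keeps the pretrained transformer weights intact
optimizer = nlp.resume_training()
for epoch in range(4):
    losses = {}
    # transformer models generally want small batches and low dropout
    for batch in minibatch(train_data, size=8):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.1, losses=losses)
    print(epoch, losses)
```

The key difference from `prodigy train` is that the transformer components stay in the pipeline, so the `trf_tok2vec` lookup succeeds.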
