Error using XLNet for text classification: No component 'trf_tok2vec' found in pipeline

edward · January 12, 2020, 1:13pm

Hi.

I'm using train textcat to train my classifiers. This works fine using a standard spaCy base model. But when I try with en_trf_xlnetbasecased_lg I get an error:

  File "/mnt/c/repo/prodigy/recipes/dacs.py", line 407, in train
    result = train(**args)
  File "/usr/local/lib/python3.7/site-packages/prodigy/recipes/train.py", line 154, in train
    nlp.update(docs, annots, drop=dropout, losses=losses)
  File "/usr/local/lib/python3.7/site-packages/spacy_transformers/language.py", line 81, in update
    tok2vec = self.get_pipe(PIPES.tok2vec)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 281, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['textcat']"

I think this use case is supposed to work, right?

Prodigy 1.9.4

ines · January 12, 2020, 1:59pm

This is currently expected – the transformers classifier is a different text classifier implementation with its own component and component dependencies (token-vector-encoding, tokenization alignment etc.). The underlying problem here is that the train recipe disables all components except for the one you train (which makes sense, because that's the only one you want to update). But that doesn't work for this component, since it has other component dependencies.

You can probably work around it by editing the recipe and the call to nlp.disable_pipes. However, you're likely still better off using the standalone training script we provide in the spacy-transformers repo. To get good results with the transformer models, you typically want to tune the hyperparameters and run the training on GPU. Both are much easier with a standalone script.
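
To illustrate the disable_pipes workaround, here is a minimal sketch of the selection logic in plain Python. It is not the actual recipe code: the helper pipes_to_disable and the pipeline layout are hypothetical, and the only component names confirmed by the traceback above are 'trf_tok2vec' and 'textcat' (the 'sentencizer' and 'trf_wordpiecer' names are assumptions about what the base model ships with).

```python
def pipes_to_disable(pipe_names, train_pipe, required=()):
    """Return the pipes that can be disabled while training `train_pipe`.

    Hypothetical helper for illustration: `required` lists the components
    the trained pipe depends on and that therefore must stay enabled.
    """
    keep = {train_pipe, *required}
    return [name for name in pipe_names if name not in keep]


# Assumed pipeline layout. Only 'trf_tok2vec' and 'textcat' are taken
# from the traceback; the other names are assumptions.
pipe_names = ["sentencizer", "trf_wordpiecer", "trf_tok2vec", "textcat"]

# Disabling everything except the trained component also removes
# 'trf_tok2vec', which is what leads to the E001 KeyError.
naive = pipes_to_disable(pipe_names, "textcat")

# Keeping the transformer dependencies enabled instead.
patched = pipes_to_disable(
    pipe_names, "textcat", required=("trf_wordpiecer", "trf_tok2vec")
)

print(naive)
print(patched)
```

In an edited recipe, the resulting list would then be passed on as nlp.disable_pipes(*patched) instead of disabling everything but the trained component. Whether the rest of the recipe works with the transformer pipeline after that change is not guaranteed, which is one more reason to prefer the standalone script.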
