train textcat to train my classifiers. This works fine using a standard spaCy base model, but when I try it with en_trf_xlnetbasecased_lg I get an error:
  File "/mnt/c/repo/prodigy/recipes/dacs.py", line 407, in train
    result = train(**args)
  File "/usr/local/lib/python3.7/site-packages/prodigy/recipes/train.py", line 154, in train
    nlp.update(docs, annots, drop=dropout, losses=losses)
  File "/usr/local/lib/python3.7/site-packages/spacy_transformers/language.py", line 81, in update
    tok2vec = self.get_pipe(PIPES.tok2vec)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 281, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['textcat']"
I think this use case is supposed to work, right?
This is currently expected – the transformers text classifier is a different implementation with its own component and its own component dependencies (token-to-vector encoding, tokenization alignment etc.). The underlying problem here is that the
train recipe disables all components except the one you're training (which makes sense, because that's the only one you want to update). But that doesn't work for this component, since it depends on other components being present in the pipeline.
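To illustrate why this raises the error in the traceback, here's a minimal sketch (a toy stand-in, not spaCy's actual implementation) of what happens when the transformer classifier's update step asks for a component that the recipe has removed from the pipeline:

```python
# Toy pipeline that mimics spaCy's get_pipe behaviour: looking up a
# component that isn't in the pipeline raises a KeyError (E001).
class TinyPipeline:
    def __init__(self, pipe_names):
        self.pipe_names = list(pipe_names)

    def get_pipe(self, name):
        if name not in self.pipe_names:
            raise KeyError(
                f"[E001] No component '{name}' found in pipeline. "
                f"Available names: {self.pipe_names}"
            )
        return name

# The train recipe left only the component being trained ...
nlp = TinyPipeline(["textcat"])

# ... so when the transformer classifier's update step fetches its
# tok2vec dependency, the lookup fails, just like in the traceback.
try:
    nlp.get_pipe("trf_tok2vec")
except KeyError as err:
    print(err)
```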
You can probably work around it by editing the recipe and adjusting the call to
nlp.disable_pipes. However, you're probably still better off using the standalone training script we provide in the
spacy-transformers repo. To get good results with the transformer models, you typically want to tune the hyperparameters, and you probably also want to run training on a GPU. Both of these are much easier with a standalone script.
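For the workaround, the key change is to exclude the transformer classifier's dependencies from the set of pipes the recipe disables. A minimal sketch of that filtering logic (the pipe names here assume a spacy-transformers v0.x pipeline; check your own model's nlp.pipe_names):

```python
# Hypothetical example pipeline; real names come from nlp.pipe_names.
pipe_names = ["trf_wordpiecer", "trf_tok2vec", "textcat", "ner", "parser"]

component = "textcat"  # the component you're training
# Dependencies the transformer text classifier needs at update time:
required = {"trf_wordpiecer", "trf_tok2vec"}

# What the recipe does today: disable everything except the trained
# component – this is what removes trf_tok2vec and triggers E001.
disabled_naive = [p for p in pipe_names if p != component]

# The adjusted call: also keep the transformer dependencies enabled,
# e.g. nlp.disable_pipes(*disabled) instead of the naive list.
disabled = [p for p in pipe_names
            if p != component and p not in required]
print(disabled)
```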