I've had an issue since version v1.11.12. As stated in the docs, a bug around --base-model usage was fixed in that release. When I try to use a base model for NER on a simple dataset (I'm using fr_dep_news_trf), Prodigy returns the following error:
KeyError: "[E001] No component 'tok2vec' found in pipeline. Available names: ['transformer', 'morphologizer', 'parser', 'attribute_ruler', 'lemmatizer']"
Which is... normal, actually, since the fr_dep_news_trf model does not have a tok2vec component! It seems the prodigy train recipe assumes that any model used for training has a tok2vec component, even though recent spacy-transformers pipelines follow a new convention and use a transformer component directly instead of a tok2vec.
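For reference, the command I'm running looks roughly like this (the dataset and output names below are placeholders, not my real ones):

prodigy train ./output --ner my_ner_dataset --base-model fr_dep_news_trf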
We're aware of that error. It's a transformer-related issue that's unrelated to the bug fixed in v1.11.12, and it sits further upstream, in the spaCy codebase rather than in Prodigy. The team is aware of it and is currently working on a fix.
Another thread on this issue can be found here:
That thread also has a temporary workaround that involves writing a custom config.cfg file. It's not a proper solution, but it might serve as a remedy for now.
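The gist of it is that you pass your own config to the train recipe instead of having Prodigy generate one from a --base-model; assuming a complete training config, the call would look roughly like this (dataset and output names are placeholders):

prodigy train ./output --ner my_ner_dataset --config config.cfg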
Thank you for the quick response! I tried to find previous threads but didn't find this one. I'll look into the temporary workaround then, and look forward to the team fixing this. I guess I can delete this topic then?
It can't hurt to leave it open now, since it links to the other thread.
Now that I think of it... it might even be better to leave the topic open because, as you say, you weren't able to find the other thread yourself. If we keep this one open, Google/Discourse may have an easier time indexing more of the relevant keywords that eventually lead people to the right thread.
Dear koaning, thank you for providing the link.
I want to use en_core_web_trf and train a spancat component. I created a config file in which I deleted every line mentioning "tok2vec". However, I still get the error:
"[E001] No component 'tok2vec' found in pipeline. Available names: ['transformer', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']"
What I ended up doing after some investigation was to remove the transformer step from the pipeline entirely and opt for a tok2vec declaration inside my ner step instead! Here is my config file for reference:
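(A sketch of the relevant sections; only the tok2vec architecture really matters here, and the parser settings are the usual spaCy defaults for a transformer-backed NER model:)

[nlp]
lang = "fr"
pipeline = ["ner"]

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v3"
name = "camembert-base"
grad_factor = 1.0
mixed_precision = false
pooling = {"@layers":"reduce_mean.v1"}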
I even left grad_factor at 1.0, given that I had enough examples to fine-tune my model... But I guess this is mostly spaCy-fu and not especially related to Prodigy!
Dear Martin,
thank you so much for your time and your answer.
I tried to use your config file, but I still keep getting the same error:
KeyError: "[E001] No component 'tok2vec' found in pipeline. Available names: ['transformer', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']"
This might be because you are still using the --base-model parameter, while I don't! I should have mentioned this, but the only place a model is specified in my case is here:
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v3"
# the HuggingFace model to use
name = "camembert-base"
grad_factor = 1.0
mixed_precision = false
pooling = {"@layers":"reduce_mean.v1"}
inside the configuration file you specified! I am not an expert, but I think you can put any HuggingFace model you have at your disposal there! I no longer use a base model, and that is what solved my problem.
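If you want to double-check a config like this before training, spaCy itself can validate it (assuming the file is a complete training config; the path is a placeholder):

python -m spacy debug config config.cfg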
Thanks for raising the issue and I certainly understand your concern.
We have started to work on it, but for a number of reasons the fix got deprioritized. I'll make sure it's back on the agenda for the forthcoming sprint.
Again, apologies for the delay in addressing it.
I'd like to share an update: the bug that prevented the use of transformer-based spaCy pipelines as base models is fixed in the recently released Prodigy 1.15.1 (changelog).
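That means a command like the one from the original post (with placeholder names) should now run without the E001 error:

prodigy train ./output --ner my_ner_dataset --base-model fr_dep_news_trf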
Again, apologies for the delay in addressing this issue.