Issues with ner.batch-train with en_trf_bertbaseuncased_lg after creating a custom set of labels

Hey all.

I started by creating a dataset with prodigy dataset ... and then annotated a set of examples with prodigy ner.manual ... using my own custom labels.
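For context, the commands looked roughly like this (the dataset description, source file, tokenization model, and label names below are just placeholders for what I actually used):

prodigy dataset demo_v01 "Demo NER annotations"
prodigy ner.manual demo_v01 en_core_web_sm ./texts.jsonl --label LABEL_A,LABEL_B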

I was initially planning on using the BERT model en_trf_bertbaseuncased_lg, but when I tried to run batch-train with:

prodigy ner.batch-train demo_v01 en_trf_bertbaseuncased_lg --output /tmp/model --eval-split 0.2 --dropout 0.2

I got the following error:

KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['sentencizer', 'ner']"

Is there some missing import in Prodigy?

Hi! We currently do not have an NER model implementation using the transformer weights. See here for details:

So running ner.batch-train with a transformer model doesn't really make sense: you'd always end up training a regular spaCy NER model, so you might as well start from a blank en model (see the sketch below). Using the transformer models with Prodigy would likely also require slightly modified training recipes, since updating works differently for those models and comes with additional configuration options.
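If you want to go the blank-model route, one option (an untested sketch, assuming spaCy is installed and /tmp/blank_en is a scratch path) is to serialize a blank English pipeline to disk and point ner.batch-train at it:

python -c "import spacy; spacy.blank('en').to_disk('/tmp/blank_en')"
prodigy ner.batch-train demo_v01 /tmp/blank_en --output /tmp/model --eval-split 0.2 --dropout 0.2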