I have been using prodi.gy for almost two years and am very happy with this tool and its ongoing development! I had created a standard workflow for my purposes. After a year, I unfortunately cannot replicate this workflow anymore, so I am asking for help.
First of all, I created an annotated dataset. The first step was successful:
Then I'd like to train an NER model from scratch. I usually used this command:
(text) C:\Users\MyName>python -m prodigy train ner annotated_entities18 blank:de --output NER_18
Using CPU
✘ Invalid config override 'annotated_entities18': name should start with --
Unfortunately, this doesn't work anymore. I have also tried some modifications after reading the documentation again... but I want to replicate the old command 100%. The NER model should use the blank:de model!
Hi! I think the problem here is that the usage of the train command has changed slightly in v1.11 to support training multiple components at the same time (e.g. an NER model and a text classifier together) and to integrate with spaCy v3.
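In v1.11, datasets are passed per component via flags like --ner, and the output directory is the first positional argument. Assuming your dataset and output names stay the same as above, the equivalent of your old command should look something like this:

python -m prodigy train NER_18 --ner annotated_entities18 --lang de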
Thanks, Ines, for your quick reply! Is the trained NER model equivalent to the NER model from the previous command? I have really just used the spaCy blank:de model, no tokenizer or anything... I'm a bit confused about this, but if it's the same command, then I will continue to use it this way.
Yes, the basic setup will be the same – setting --lang de is equivalent to starting out with the blank:de language, which just includes the default German tokenizer and no components.
That said, the model you train with the latest Prodigy and spaCy v3 won't be compatible with spaCy v2.
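If you want to double-check what blank:de gives you, a quick sketch is to inspect the pipeline components directly:

python -c "import spacy; nlp = spacy.blank('de'); print(nlp.pipe_names)"

This should print an empty list, since the blank pipeline only ships the tokenizer, which isn't a pipeline component.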
Thanks a lot. One last question: I'd like to load my own NER model (which I trained in a previous iteration). I'm again confused about how the new commands work...
In that case, you can just use the --base-model argument, e.g. --base-model /path/to/your/model. In general, we'd recommend retraining a model on all annotations from scratch, rather than updating a previously trained artifact, since this will give you better and more reliable results. But the base model setting can still be useful if you want to update one of the trained pipelines provided by spaCy etc.
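Using the dataset and model names from earlier in this thread as placeholders, that could look roughly like this (the output directory name here is just an example):

python -m prodigy train ./NER_18_updated --ner annotated_entities18 --base-model ./NER_18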
The --lang parameter is meant to select a tokeniser when no base model or config are passed. I think in your case, you'd want to use the --base-model parameter to pass en_core_web_lg along.
The --init-tok2vec is something you can set from the command line, but it does need to be a parameter that's available from your configuration file. If you don't configure a custom file yourself, Prodigy will assume a config file like the one generated here. After filling in the missing params via spacy init fill-config I do see an init_tok2vec parameter. This is making me wonder if this might be one of those scenarios where the exact spelling of the parameter matters. Could you try --init_tok2vec instead of --init-tok2vec?
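For reference, in a default config that has been filled in with spacy init fill-config, the relevant entries typically look something like this (defaults shown, not your actual config):

[paths]
init_tok2vec = null

[initialize]
init_tok2vec = ${paths.init_tok2vec}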
Just to double-check: are you using a custom config here? If so, could you share the config.cfg that you're using?