Commands for training an NER model in Prodigy

Hi all,

I have been using Prodigy for almost two years and am very happy with the tool and its development! I had set up a standard workflow for my purposes. A year later, I unfortunately can't replicate this workflow, so I am asking for help.

First of all, I created an annotated dataset. The first step was successful:

(text) C:\Users\MyName>python -m prodigy ner.manual annotated_entities18 blank:de datapath/file.csv --label IDENTIFICATION
Using 1 label(s): IDENTIFICATION

✨ Starting the web server at http://0.0.0.0:8080 ...
Open the app in your browser and start annotating!

✔ Saved 315 annotations to database SQLite
Dataset: annotated_entities18
Session ID: 2022-01-18_13-39-53

Next, I'd like to train an NER model from scratch. This is the command I used to use:

(text) C:\Users\MyName>python -m prodigy train ner annotated_entities18 blank:de --output NER_18
:information_source: Using CPU

✘ Invalid config override 'annotated_entities18': name should start with --

Unfortunately, this no longer works. After rereading the documentation I tried a few modifications, but I want to replicate the old command exactly. The NER model should be based on the blank:de model!

I would appreciate any help :blush:

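Hi! I think the problem here is that the usage of the train command changed slightly in v1.11 to support training multiple components at the same time (e.g. an NER model and a text classifier together) and to integrate with spaCy v3. With the new usage, the first positional argument is the output directory and the remaining arguments are parsed as spaCy config overrides, which is why annotated_entities18 triggers the "name should start with --" error.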

You can see the new command usage and available arguments here: https://prodi.gy/docs/recipes#train

So your training command could now look like this:

python -m prodigy train ./NER_18 --ner annotated_entities18 --lang de
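To make the mapping between the old and new command styles explicit, here is a minimal sketch that builds the v1.11-style argument list from the pieces of the old command. The helper `build_train_command` is hypothetical (it is not part of Prodigy); the `--ner` and `--lang` flags are the ones from the command above, and joining several datasets with commas for `--ner` is an assumption based on the v1.11 recipe docs.

```python
import shlex
from typing import List, Optional, Sequence


def build_train_command(
    output_dir: str,
    ner_datasets: Sequence[str],
    lang: Optional[str] = None,
) -> List[str]:
    """Build a Prodigy v1.11-style `train` command as an argv list.

    Hypothetical helper for illustration only.
    Old style: prodigy train ner <dataset> blank:de --output <dir>
    New style: prodigy train <dir> --ner <dataset> --lang de
    """
    if not ner_datasets:
        raise ValueError("at least one NER dataset is required")
    # The output directory is now the first positional argument.
    cmd = ["python", "-m", "prodigy", "train", output_dir]
    # Datasets are passed per component via a flag; several datasets are
    # joined with commas (assumption: comma-separated, as in the v1.11 docs).
    cmd += ["--ner", ",".join(ner_datasets)]
    if lang is not None:
        # `--lang de` takes the place of the old `blank:de` positional argument.
        cmd += ["--lang", lang]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("./NER_18", ["annotated_entities18"], lang="de")
    print(shlex.join(cmd))
    # python -m prodigy train ./NER_18 --ner annotated_entities18 --lang de
```

If you want to script the training run, the resulting list can be passed to `subprocess.run(cmd, check=True)`.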

Thanks, Ines, for your quick reply! Is the trained NER model equivalent to the one produced by the previous command? I really just used spaCy's blank:de model, with no custom tokenizer or anything else. I'm a bit confused about this, but if it's equivalent, I'll keep using it this way :blush:

Best regards

Yes, the basic setup will be the same: setting --lang de is equivalent to starting out with the blank:de language, which just includes the default German tokenizer and no components.
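To check this yourself, here is a quick sketch using spaCy v3 (it assumes spaCy is installed; the example sentence is arbitrary). The `blank:de` shorthand corresponds to a pipeline created with `spacy.blank("de")`, which has a German tokenizer but an empty list of pipeline components.

```python
import spacy

# A blank German pipeline: language-specific tokenizer, no trained components.
nlp = spacy.blank("de")

print(nlp.lang)        # de
print(nlp.pipe_names)  # [] (no ner, tagger, parser, etc.)

# Tokenization still works, because the tokenizer is part of the Language object.
doc = nlp("Das ist ein kurzer Test.")
print([token.text for token in doc])
```

Training with `--ner` then adds an `ner` component on top of this blank pipeline.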

That said, the model you train with the latest Prodigy and spaCy v3 won't be compatible with spaCy v2.

Thanks a lot. One last question: I'd like to load my own NER model (which I trained in a previous iteration). Again, I'm confused about how the new commands work...

In that case, you can just use the --base-model argument, e.g. --base-model /path/to/your/model. In general, we'd recommend retraining a model on all annotations from scratch, rather than updating a previously trained artifact, since this will give you better and more reliable results. But the base model setting can still be useful if you want to update an existing trained pipeline, such as one of those provided by spaCy.
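Because a model trained with spaCy v2 won't work with the latest Prodigy and spaCy v3, it can help to sanity-check a directory before passing it to `--base-model`. Below is a rough heuristic sketch using only the standard library: a pipeline saved by spaCy v3 contains a `config.cfg` next to its `meta.json`, whereas spaCy v2 model directories have no `config.cfg`. The function names and the example path are hypothetical. The path also assumes that a v1.11 training run writes `model-best` and `model-last` subdirectories into the output folder, so the directory to pass is likely one of those; check your own output to confirm.

```python
import json
from pathlib import Path
from typing import Optional


def looks_like_spacy_v3_pipeline(model_dir: str) -> bool:
    """Rough heuristic: spaCy v3 pipelines saved to disk include config.cfg,
    while spaCy v2 model directories only have meta.json (plus component data).

    Hypothetical helper for illustration; not part of Prodigy or spaCy.
    """
    path = Path(model_dir)
    return (path / "meta.json").is_file() and (path / "config.cfg").is_file()


def declared_spacy_version(model_dir: str) -> Optional[str]:
    """Return the spaCy version range recorded in meta.json, if there is one."""
    meta_path = Path(model_dir) / "meta.json"
    if not meta_path.is_file():
        return None
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    return meta.get("spacy_version")


if __name__ == "__main__":
    # Hypothetical path to a previously trained pipeline.
    candidate = "./NER_18/model-best"
    if looks_like_spacy_v3_pipeline(candidate):
        print(f"OK to try: --base-model {candidate}")
    else:
        print(f"{candidate} doesn't look like a spaCy v3 pipeline; consider retraining from scratch")
```

This only inspects the files on disk; actually loading the pipeline with spaCy is the definitive test.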

Thank you very much, Ines. That was my plan :blush: