ner.batch-train not to use default labels but just the ones from a training sample

Hi,

I ran a custom ner.make-gold recipe with a number of different labels and obtained a training dataset. Now I am trying to run ner.batch-train to train the model on my dataset, as below:

prodigy ner.batch-train <my_set> en_core_web_lg --output <my_new_model> --eval-split 0.5 --label LABEL1,LABEL2

Once I have my model and run it in prediction mode, it still uses the built-in default labels. Is there a way to stop it from using the built-in labels and make it use only the custom ones from the training set?

Yes, in this case, you’re only updating the existing en_core_web_lg model. If you want the model to learn only about your new categories from scratch, you need to start off with a blank entity recognizer. This may also give you better accuracy results, since your new annotations won’t have to “compete” with the other categories that were likely trained on significantly more data.

The easiest way to create a base model without the old entity recognizer weights is to save out the model you want to use and disable the 'ner' component:

import spacy

nlp = spacy.load('en_core_web_lg')
nlp.to_disk('/path/to/base-model', disable=['ner'])

Or, if you want it even simpler, here’s a handy one-liner:

python -c "import spacy; spacy.load('en_core_web_lg').to_disk('/path/to/base-model', disable=['ner'])"

You can then load the new base model into ner.batch-train:

prodigy ner.batch-train <my_set> /path/to/base-model --output <my_new_model> --eval-split 0.5 --label LABEL1,LABEL2

Awesome! Thanks!

I just tried your code and I get an error:

FileNotFoundError: [Errno 2] No such file or directory: 'base_model/ner/moves'

I saved the base model as ‘base_model’ and then ran:

prodigy ner.batch-train <my_set> /path/to/base_model --output <my_new_model> --eval-split 0.5 --label LABEL1,LABEL2

I’m using prodigy 1.4.2

Hmm, that’s strange – I couldn’t reproduce the error. Could you post the full error including the stack trace? And could you run python -m spacy info to check which version of spaCy you’re using?

spaCy version 2.0.11

Traceback (most recent call last):
  File "/Users/KB/.pyenv/versions/3.5.2/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/KB/.pyenv/versions/3.5.2/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 167, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/prodigy/recipes/ner.py", line 377, in batch_train
    nlp = spacy.load(input_model)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/util.py", line 116, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/language.py", line 653, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/language.py", line 649, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "nn_parser.pyx", line 901, in spacy.syntax.nn_parser.Parser.from_disk
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "nn_parser.pyx", line 898, in spacy.syntax.nn_parser.Parser.from_disk.lambda11
  File "/Users/KB/.virtualenvs/foobar/lib/python3.5/site-packages/spacy/util.py", line 464, in read_json
    with location.open('r', encoding='utf8') as f:
  File "/Users/KB/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py", line 1151, in open
    opener=self._opener)
  File "/Users/KB/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py", line 1005, in _opener
    return self._accessor.open(self, flags, mode)
  File "/Users/KB/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py", line 371, in wrapped
    return strfunc(str(pathobj), *args)
FileNotFoundError: [Errno 2] No such file or directory: 'base-model/ner/cfg'

Thanks! This is strange. Could you check the meta.json in base-model and see whether "ner" is listed in the "pipeline"? If it is, remove the "ner" entry from the list and try again.
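
For anyone who would rather script the meta.json edit than do it by hand, here is a minimal sketch using only the standard library. It assumes the model directory contains a meta.json with a "pipeline" list of component names, as in the spaCy 2.0 model discussed in this thread; the helper name remove_from_pipeline is made up for illustration and is not part of spaCy or Prodigy.

```python
import json
from pathlib import Path


def remove_from_pipeline(model_dir, component="ner"):
    """Drop `component` from the "pipeline" list in <model_dir>/meta.json.

    Hypothetical helper, not a spaCy or Prodigy function.
    Returns True if meta.json was rewritten, False if the entry was not listed.
    """
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open("r", encoding="utf8") as f:
        meta = json.load(f)
    pipeline = meta.get("pipeline", [])
    if component not in pipeline:
        return False
    # Keep every other component in its original order.
    meta["pipeline"] = [name for name in pipeline if name != component]
    with meta_path.open("w", encoding="utf8") as f:
        json.dump(meta, f, indent=2)
    return True
```

Calling remove_from_pipeline('/path/to/base-model') once should be enough; a second call returns False because "ner" is no longer listed.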

“ner” was in the pipeline, but once I removed it, it worked just fine! Thanks!

Glad it worked! I think I might have also found the solution to the previous error (in a different thread where a user came across the same problem): deleting the model directory and saving it again fixed it!
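
After re-saving the model, a quick consistency check can catch this kind of mismatch before training starts. The sketch below lists every component named in meta.json that has no matching subdirectory in the model folder. It assumes each pipeline component is serialized to a subdirectory named after it (as the 'base-model/ner/cfg' path in the traceback suggests); that may not hold for every custom component. The function name missing_components is hypothetical.

```python
import json
from pathlib import Path


def missing_components(model_dir):
    """Return pipeline names from meta.json that have no matching subdirectory.

    Hypothetical helper; assumes one subdirectory per pipeline component.
    """
    model_path = Path(model_dir)
    with (model_path / "meta.json").open("r", encoding="utf8") as f:
        meta = json.load(f)
    return [
        name
        for name in meta.get("pipeline", [])
        if not (model_path / name).is_dir()
    ]
```

An empty list from missing_components('/path/to/base-model') means meta.json and the directory contents agree; a result such as ['ner'] points at the entry to remove from the "pipeline" list.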