Adding labels in ner.batch-train

I’m trying to create a new model with ner.manual and then train it further with ner.teach.
I annotated my new labels using the following command:
prodigy ner.manual new_set en_core_web_sm train.jsonl --label labels.txt

Now I want to improve that dataset with other data using ner.teach. How do I do this?
I tried to train a new model from the dataset to use in ner.teach:
prodigy ner.batch-train new_set en_core_web_sm --output /tmp/model --eval-split 0.5 --label labels.txt

However, this resulted in the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/prodigy/__main__.py", line 253, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 150, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/usr/local/lib/python3.5/dist-packages/plac_core.py", line 328, in __call__
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.5/dist-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/prodigy/recipes/ner.py", line 400, in batch_train
    drop=dropout, beam_width=beam_width)
  File "cython_src/prodigy/models/ner.pyx", line 309, in prodigy.models.ner.EntityRecognizer.batch_train
  File "cython_src/prodigy/models/ner.pyx", line 370, in prodigy.models.ner.EntityRecognizer._update
  File "cython_src/prodigy/models/ner.pyx", line 364, in prodigy.models.ner.EntityRecognizer._update
  File "cython_src/prodigy/models/ner.pyx", line 365, in prodigy.models.ner.EntityRecognizer._update
  File "/usr/local/lib/python3.5/dist-packages/spacy/language.py", line 415, in update
    proc.update(docs, golds, drop=drop, sgd=get_grads, losses=losses)
  File "nn_parser.pyx", line 558, in spacy.syntax.nn_parser.Parser.update
  File "nn_parser.pyx", line 676, in spacy.syntax.nn_parser.Parser._init_gold_batch
  File "ner.pyx", line 119, in spacy.syntax.ner.BiluoPushDown.preprocess_gold
  File "ner.pyx", line 178, in spacy.syntax.ner.BiluoPushDown.lookup_transition
KeyError: 'B-IDENTIFIER'

Never mind, I found the ner.gold-to-spacy recipe. 🙂

Thanks for the update, and yes, this works as well! Your workflow makes sense, and after pre-training the model, you can load it into ner.teach by passing the path to the model's data directory instead of a model name like en_core_web_sm:

prodigy ner.teach your_dataset /path/to/pretrained-model your_data.jsonl --label SOME_LABEL

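About your initial report: I think the problem here is that the input format of the --label argument currently isn't 100% consistent. For ner.manual, we introduced the option to load labels from a file (since you often want to load a larger label set), but all other recipes currently expect the labels as a string. As a result, the labels in your file (like IDENTIFIER) apparently never got added to the model's entity recognizer, which is why spaCy raises KeyError: 'B-IDENTIFIER' when it tries to look up the transition for a gold entity with that label. There's also a slight inconsistency around adding unknown labels to the model, which we've already fixed for the upcoming release.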
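To make that failure mode concrete, here is a toy sketch in plain Python. It is not Prodigy's or spaCy's actual code: it assumes a string-only --label is split on commas, and that the entity recognizer builds one BILUO transition per prefix and label. The helper names split_label_string and build_transitions are hypothetical. Passing labels.txt this way registers a single label literally named "labels.txt", so there is no B-IDENTIFIER transition to look up.

```python
# Toy sketch only: hypothetical helpers, not Prodigy's or spaCy's code.
BILUO_PREFIXES = ("B", "I", "L", "U")


def split_label_string(value):
    """Assumed behaviour of a string-only --label: split on commas."""
    return [label.strip() for label in value.split(",") if label.strip()]


def build_transitions(labels):
    """Map transition names such as 'B-PERSON' to integer ids."""
    names = ["O"]  # the 'outside' action does not depend on a label
    for label in labels:
        names.extend("{}-{}".format(prefix, label) for prefix in BILUO_PREFIXES)
    return {name: i for i, name in enumerate(names)}


labels = split_label_string("labels.txt")  # the file path is read as one label
table = build_transitions(labels)

try:
    table["B-IDENTIFIER"]  # a gold span labelled IDENTIFIER needs this transition
except KeyError as err:
    print("KeyError:", err)  # same symptom as at the bottom of the traceback
```

In this toy model, adding IDENTIFIER to the label list before building the table is what makes the lookup succeed, which mirrors the add_label fix for ner.batch-train.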

In the meantime, adding the following to ner.batch-train, after the model is loaded and before training starts, should work:

from prodigy.util import get_labels  # helper that also supports loading labels from a file

ner = nlp.get_pipe('ner')    # get the model's entity recognizer
labels = get_labels(label)   # label is the recipe's --label argument
for name in labels:
    ner.add_label(name)      # add each label to the model

Alternatively, you could iterate over the examples and their spans and add each span['label'] to the model. add_label ignores labels that are already present in the model, so you don't have to worry about filtering out the ones it already knows.
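A minimal sketch of that alternative, assuming the examples are Prodigy-style task dicts with an optional "spans" list whose entries carry a "label" key. The function name collect_span_labels is hypothetical, and the sample data is made up for illustration.

```python
def collect_span_labels(examples):
    """Collect the unique span labels from Prodigy-style task dicts.

    Assumes each example may carry a "spans" list whose entries have a
    "label" key, e.g. {"start": 3, "end": 8, "label": "IDENTIFIER"}.
    """
    labels = set()
    for eg in examples:
        for span in eg.get("spans", []):
            labels.add(span["label"])
    return sorted(labels)  # sorted only to make the order deterministic


examples = [
    {"text": "ID 12345 was issued",
     "spans": [{"start": 3, "end": 8, "label": "IDENTIFIER"}]},
    {"text": "No entities here", "spans": []},
    {"text": "Acme Corp, ID 999",
     "spans": [{"start": 0, "end": 9, "label": "ORG"},
               {"start": 14, "end": 17, "label": "IDENTIFIER"}]},
]
print(collect_span_labels(examples))  # ['IDENTIFIER', 'ORG']

# Inside ner.batch-train, the result would feed the entity recognizer:
#     for name in collect_span_labels(examples):
#         ner.add_label(name)
```

Deduplicating first is not strictly needed, since add_label skips known labels, but it keeps the loop short when a dataset has many spans.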

I’ll also experiment with better ways of handling the --label argument. Plac (which Prodigy uses for the recipes CLI) supports converter functions – so we could handle all loading in a function that checks whether the value is a path or a string of comma-separated labels, and returns them as a list (similar to util.get_labels).
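A converter along those lines might look like the sketch below. parse_label_arg is a hypothetical name, and reading the file as one label per line is an assumption, since the exact file format isn't spelled out here. Plac annotations take a type converter as their fourth element, so it could be wired up as label=("Label(s)", "option", "l", parse_label_arg).

```python
import os
import tempfile


def parse_label_arg(value):
    """Hypothetical plac converter for --label.

    If `value` names an existing file, read one label per line (assumed
    format, blank lines skipped); otherwise treat it as a comma-separated
    string of labels.
    """
    if os.path.isfile(value):
        with open(value, encoding="utf8") as f:
            return [line.strip() for line in f if line.strip()]
    return [label.strip() for label in value.split(",") if label.strip()]


print(parse_label_arg("PERSON, ORG"))  # ['PERSON', 'ORG']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.txt")
    with open(path, "w", encoding="utf8") as f:
        f.write("IDENTIFIER\nPRODUCT\n\n")
    print(parse_label_arg(path))  # ['IDENTIFIER', 'PRODUCT']
```

One trade-off of checking for a file first: a label string that happens to match a file name in the working directory would be read as a file. A stricter variant could only treat values ending in .txt as paths.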

Ah, that makes sense. For now, ner.gold-to-spacy worked fine.
It needed some tweaking, because you can't load the JSONL directly into the example for creating a new model, but it worked out and I'm now able to use ner.teach on it.
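The thread doesn't show what tweaking was needed. As a hypothetical sketch, assuming the example script expects spaCy v2-style TRAIN_DATA, i.e. a list of (text, {"entities": [(start, end, label), ...]}) tuples, Prodigy-style JSONL lines with "text", "spans" and "answer" keys could be reshaped like this. The function name is made up, and treating a missing "answer" as accepted is an assumption.

```python
import json


def jsonl_to_train_data(lines):
    """Reshape Prodigy-style JSONL lines into spaCy v2-style training tuples.

    Each accepted task becomes (text, {"entities": [(start, end, label), ...]}).
    Blank lines and tasks whose "answer" is not "accept" are skipped.
    """
    train_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        if eg.get("answer", "accept") != "accept":  # assumption: missing answer = accept
            continue
        entities = [(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
        train_data.append((eg["text"], {"entities": entities}))
    return train_data


sample = [
    '{"text": "ID 12345 was issued", "spans": [{"start": 3, "end": 8, "label": "IDENTIFIER"}], "answer": "accept"}',
    '{"text": "Nothing to see", "spans": [], "answer": "reject"}',
]
print(jsonl_to_train_data(sample))
# [('ID 12345 was issued', {'entities': [(3, 8, 'IDENTIFIER')]})]
```

Because the function accepts any iterable of lines, an open file handle works too, e.g. jsonl_to_train_data(f) inside a with open(...) block.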
