Hi @ines/@honnibal,
I trained a PROPN tag for my dataset because I would like the model to recognize certain network entities as proper nouns. Since I had to use my own tokenization scheme, I copied the pos.teach recipe from Prodigy into my own recipe and edited it to use my custom tokenizer.
When I use pos.batch-train to train the model, it fails with an error in the first iteration of the run. I looked on the forums, but the only other issue with a similar error was one that was sorted out in Prodigy 1.5.1.
[Abhishek:~] [NM-NLP] $ prodigy pos.batch-train net_pos_tags en_core_web_sm --output-model /tmp/models/net_labels_ner --eval-split 0.2
Loaded model en_core_web_sm
Using 20% of accept/reject examples (110) for evaluation
Using 100% of remaining examples (445) for training
Dropout: 0.2 Batch size: 4 Iterations: 10
BEFORE 0.169
Correct 13
Incorrect 64
Unknown 1298
# LOSS RIGHT WRONG ACCURACY
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 253, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/prodigy/recipes/pos.py", line 236, in batch_train
    drop=dropout)
  File "cython_src/prodigy/models/pos.pyx", line 90, in prodigy.models.pos.Tagger.batch_train
  File "cython_src/prodigy/models/pos.pyx", line 136, in prodigy.models.pos.Tagger.update
  File "cython_src/prodigy/models/pos.pyx", line 156, in prodigy.models.pos.Tagger.inc_gradient
  File "cython_src/prodigy/models/pos.pyx", line 164, in prodigy.models.pos.Tagger._multilabel_log_loss
IndexError: index -3 is out of bounds for axis 0 with size 2
Section of code that I changed in the recipe:
BEFORE:
if tag_map is not None:
    tag_map = get_tag_map(tag_map)
model = Tagger(spacy.load(spacy_model), label=label, tag_map=tag_map)
AFTER:
if tag_map is not None:
    tag_map = get_tag_map(tag_map)
nlp = spacy.load(spacy_model)
nlp.tokenizer = custom_tokenizer(nlp)
model = Tagger(nlp, label=label, tag_map=tag_map)
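For context, here is a minimal sketch of what a custom_tokenizer(nlp) factory of this shape could look like. It is illustrative only: the regular expression, the example sentence, and the use of spacy.blank("en") in place of en_core_web_sm (so that no model download is needed) are all assumptions, not the actual rules used in the recipe above.

```python
import re

import spacy
from spacy.tokenizer import Tokenizer


def custom_tokenizer(nlp):
    """Return a Tokenizer that keeps some network-style strings whole.

    Hypothetical rule: IPv4 addresses and interface names such as "eth0/1"
    are matched by token_match and therefore not split any further.
    """
    token_match = re.compile(
        r"^(?:\d{1,3}(?:\.\d{1,3}){3}|[A-Za-z]+\d+(?:/\d+)*)$"
    ).match
    return Tokenizer(
        nlp.vocab,
        # Reuse the language defaults for everything except token_match.
        rules=nlp.Defaults.tokenizer_exceptions,
        prefix_search=nlp.tokenizer.prefix_search,
        suffix_search=nlp.tokenizer.suffix_search,
        infix_finditer=nlp.tokenizer.infix_finditer,
        token_match=token_match,
    )


# A blank English pipeline stands in for the trained model here.
nlp = spacy.blank("en")
nlp.tokenizer = custom_tokenizer(nlp)

tokens = [t.text for t in nlp("ping 10.0.0.1 via eth0/1")]
print(tokens)
```

The factory is assigned to nlp.tokenizer before the nlp object is handed to the Tagger, the same order as in the AFTER snippet.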
Thanks in advance for your help.