textcat.teach with pattern match failed with trained model

curious · July 30, 2020, 3:18am

I'm using Prodigy 1.10.2. When I use a trained model, I get the following error:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/prodigy/__main__.py", line 60, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 318, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 138, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 155, in prodigy.components.feeds.SharedFeed.validate_stream
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/toolz/itertoolz.py", line 376, in first
    return next(iter(seq))
  File "cython_src/prodigy/components/sorters.pyx", line 98, in __iter__
  File "cython_src/prodigy/util.pyx", line 449, in predict
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/toolz/itertoolz.py", line 242, in interleave
    yield next(itr)
  File "cython_src/prodigy/models/matcher.pyx", line 167, in __call__
  File "matcher.pyx", line 234, in spacy.matcher.matcher.Matcher.__call__
ValueError: [E155] The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA. Try using nlp() instead of nlp.make_doc() or list(nlp.pipe()) instead of list(nlp.tokenizer.pipe()).

The command I used for textcat.teach:
PRODIGY_PORT=8000 prodigy textcat.teach temp_db5 /tmp/my_model/ ~/user/class/chat_sentence_plus.jsonl --label my_class -pt ~/fiona/class/class_pattern.jsonl

If I replace my trained model with en_core_web_lg, I don't get the error.

This is the command I used to train my model:
prodigy train textcat temp_db5 blank:en -o /tmp/my_model/

adriane · July 30, 2020, 7:33am

If your patterns include POS, TAG or LEMMA, you need a tagger component in your model. When you train your textcat model with the base model blank:en, there's no tagger: the only component is the new textcat component.
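To check whether a patterns file actually uses any of the three attributes named in E155, a minimal sketch like the one below can scan it line by line. It assumes the usual Prodigy patterns JSONL layout, where each line has a "pattern" value that is either a list of token dicts (e.g. {"pos": "NOUN"}) or a plain string. The helper name tagger_attrs_in_patterns is hypothetical and not part of Prodigy or spaCy.

```python
import json
from typing import Iterable, List, Tuple

# Token attributes that, per spaCy's E155 message, require a tagger in the pipeline.
TAGGER_ATTRS = {"POS", "TAG", "LEMMA"}


def tagger_attrs_in_patterns(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, ATTRIBUTE) pairs for token patterns that use
    POS, TAG or LEMMA. Hypothetical helper, assuming Prodigy's JSONL format."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        pattern = entry.get("pattern")
        if not isinstance(pattern, list):
            continue  # string patterns are not inspected by this sketch
        for token in pattern:
            for attr in token:
                # spaCy treats attribute names case-insensitively ("pos" == "POS")
                if attr.upper() in TAGGER_ATTRS:
                    hits.append((line_no, attr.upper()))
    return hits
```

For example, passing it open("class_pattern.jsonl", encoding="utf8") lists the lines and attributes to look at; an empty list means none of POS, TAG or LEMMA appear in the token patterns.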

Instead of blank:en, specify a base model that contains a tagger, such as en_core_web_lg:

prodigy train textcat temp_db5 en_core_web_lg -o /tmp/my_model/
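After retraining, one way to confirm that the output directory kept a tagger is to read the component list spaCy writes to the model's meta.json. This is a sketch assuming a spaCy-saved model directory whose meta.json contains a "pipeline" list of component names, with the tagger named "tagger" as in spaCy's default pipelines; load_meta and has_tagger are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_meta(model_dir: str) -> Dict[str, Any]:
    """Load meta.json from a saved spaCy model directory (e.g. /tmp/my_model/)."""
    with (Path(model_dir) / "meta.json").open(encoding="utf8") as f:
        return json.load(f)


def has_tagger(meta: Dict[str, Any]) -> bool:
    """Return True if the metadata's "pipeline" list names a "tagger" component.

    A missing "pipeline" key is treated as an empty pipeline.
    """
    return "tagger" in meta.get("pipeline", [])
```

With this, has_tagger(load_meta("/tmp/my_model/")) would be expected to return False for a model trained from blank:en, whose only component is textcat, and True when the base model was en_core_web_lg.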

curious · August 3, 2020, 1:04am

Thank you. Yes, your solution worked. I don't get the error anymore.

However, as I remember, we were encouraged to train a new model from a blank model instead of a pretrained one. Will I get a worse model if I start training from en_core_web_lg? Thanks.
