Pattern with LEMMA for Spanish model throws an error


I'm trying to launch Prodigy for Spanish text:

prodigy ner.manual MAStipif2 blank:es ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

When I add this pattern: {"label":"DURACION","pattern":[{"POS":"NUM"},{"LEMMA":"mes"}]}

I get this error message:

"The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA"

I guess this is because the Spanish model doesn't have a tagger component. Is there another way to get around this?
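Before launching Prodigy, it can help to know which patterns need more than a tokenizer. The sketch below scans pattern lines in the JSONL format shown above and reports which of the attributes named in the error message (POS, TAG, LEMMA) they use. The helper `attrs_needing_pipeline` is hypothetical, not part of Prodigy or spaCy, and it assumes string patterns are exact-text matches that only need the tokenizer.

```python
import json
from typing import Iterable, Set

# Attributes the error message says need a trained component (assumption:
# limited to the three it names).
ANNOTATION_ATTRS = {"POS", "TAG", "LEMMA"}


def attrs_needing_pipeline(lines: Iterable[str]) -> Set[str]:
    """Return the annotation attributes used by token patterns in JSONL lines.

    Hypothetical helper: string patterns (exact text) only need the tokenizer,
    so they are skipped.
    """
    found: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the file
        entry = json.loads(line)
        pattern = entry.get("pattern")
        if not isinstance(pattern, list):
            continue  # string pattern -> exact text match
        for token_spec in pattern:
            # Compare in upper case so lower-case keys are also caught.
            found.update(k.upper() for k in token_spec if k.upper() in ANNOTATION_ATTRS)
    return found


if __name__ == "__main__":
    sample = [
        '{"label":"DURACION","pattern":[{"POS":"NUM"},{"LEMMA":"mes"}]}',
        '{"label":"PLAN","pattern":"tarifa plana"}',  # hypothetical string pattern
    ]
    print(sorted(attrs_needing_pipeline(sample)))  # ['LEMMA', 'POS']
```

To check the real file, pass the open file handle: `with open("tipi_pattern.jsonl", encoding="utf8") as f: print(attrs_needing_pipeline(f))`. A non-empty result means blank:es alone won't be enough.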



If you're running this command:

prodigy ner.manual MAStipif2 blank:es ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

then the blank:es part refers to a "blank" Spanish model that doesn't have any pretrained components: it only has a tokenizer. Instead, you can use one of the pretrained Spanish models:

prodigy ner.manual MAStipif2 es_core_news_lg ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

To do so, make sure you have es_core_news_lg installed in your environment. If not, you can download it first like so:

python -m spacy download es_core_news_lg

The new Spanish v3 models indeed don't have a tagger. Instead, they have a morphologizer component, which sets the POS attribute your custom pattern needs, and a lemmatizer, which sets LEMMA.
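One rough way to see whether a pipeline provides what a pattern needs is to compare its component names (spaCy exposes them as `nlp.pipe_names`) with the pattern's attributes. The mapping below is a simplified assumption based on the v3 component roles described above; `missing_attrs` is a hypothetical helper, and the component list in the example is an assumed v3 Spanish pipeline, not output from a loaded model.

```python
from typing import Dict, Iterable, Set

# Assumed, simplified mapping of spaCy v3 component names to the token
# attributes they set. Not exhaustive: an attribute_ruler, for example, can
# also set POS, and in spaCy v2 the tagger set POS as well.
COMPONENT_ATTRS: Dict[str, Set[str]] = {
    "tagger": {"TAG"},
    "morphologizer": {"POS", "MORPH"},
    "lemmatizer": {"LEMMA"},
}


def missing_attrs(pipe_names: Iterable[str], needed: Set[str]) -> Set[str]:
    """Return the attributes in `needed` that none of the listed components sets."""
    provided: Set[str] = set()
    for name in pipe_names:
        provided |= COMPONENT_ATTRS.get(name, set())
    return needed - provided


# A blank pipeline such as spacy.blank("es") has no components at all.
print(sorted(missing_attrs([], {"POS", "LEMMA"})))  # ['LEMMA', 'POS']
# Assumed component names of a v3 Spanish pipeline.
v3_names = ["tok2vec", "morphologizer", "parser", "attribute_ruler", "lemmatizer", "ner"]
print(sorted(missing_attrs(v3_names, {"POS", "LEMMA"})))  # []
```

With spaCy and the model installed, `missing_attrs(spacy.load("es_core_news_lg").pipe_names, {"POS", "LEMMA"})` would run the same check against the real pipeline.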

Hope that works for you - let us know if it doesn't!

I got it working with the es_core_news_sm model, many thanks for the reply (more coming over the weekend).

With the es_core_news_sm model I can check the lemmas and POS tags in a Jupyter notebook:

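!python3 -m spacy download es_core_news_sm  # Jupyter shell escape: installs the model package
import spacy
import es_core_news_sm

nlp = es_core_news_sm.load()  # equivalent to spacy.load("es_core_news_sm")
doc = nlp("12 MESES DE CDP. un mes. UN MES. CUATRO DÍAS cuatro días dos meses")

# Print each token's text, lemma, coarse POS, fine-grained tag, dependency, shape and flags
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_digit)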
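Once tokens carry POS and lemma values, the pattern from the question compares each token's attributes in sequence. The sketch below mimics that for single-token specs without operators; it is a simplified stand-in for spaCy's Matcher, not its implementation, and the token annotations are hand-written for illustration rather than actual es_core_news_sm output.

```python
from typing import Dict, List, Tuple

Token = Dict[str, str]
Pattern = List[Dict[str, str]]


def match_pattern(tokens: List[Token], pattern: Pattern) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets where every spec matches in sequence.

    Simplified: exact string equality per attribute, no operators like "?" or "+".
    A token without the attribute (e.g. no POS from a blank pipeline) never
    matches here; spaCy's Matcher raises the error from the question instead.
    """
    matches = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(
            tok.get(attr) == value
            for tok, spec in zip(window, pattern)
            for attr, value in spec.items()
        ):
            matches.append((start, start + n))
    return matches


# Hand-written annotations for illustration only (not real model output).
annotated = [
    {"ORTH": "dos", "POS": "NUM", "LEMMA": "dos"},
    {"ORTH": "meses", "POS": "NOUN", "LEMMA": "mes"},
]
# The same words as a blank pipeline sees them: text only, no POS or LEMMA.
blank = [{"ORTH": "dos"}, {"ORTH": "meses"}]

pattern = [{"POS": "NUM"}, {"LEMMA": "mes"}]
print(match_pattern(annotated, pattern))  # [(0, 2)]
print(match_pattern(blank, pattern))      # []
```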

When I try the same with the larger model, es_core_news_lg, I can't get it to work, no matter which method I use to download it:
import spacy
import es_core_news_lg
nlp = spacy.load('es_core_news_lg')

If you're running this in a Jupyter notebook, have you tried restarting the kernel after downloading es_core_news_lg? Or better yet, make sure the environment is properly set up, with the model installed, before you start the notebook. We've seen some trouble in the past with notebooks and virtual environments, but unfortunately this isn't something we can control...
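A quick way to confirm that the notebook kernel uses the environment the model was installed into is to print the interpreter path and ask the import system whether the package can be found. `sys.executable`, `importlib.invalidate_caches` and `importlib.util.find_spec` are standard library; the helper name `package_importable` is hypothetical.

```python
import importlib
import importlib.util
import sys


def package_importable(name: str) -> bool:
    """True if the top-level package `name` can be found by this interpreter.

    Only locates the package; it does not import it.
    """
    # Clear finder caches so packages installed after startup can be found.
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


print("Kernel interpreter:", sys.executable)
for pkg in ("spacy", "es_core_news_lg"):
    print(pkg, "importable:", package_importable(pkg))
```

In a notebook cell, `!{sys.executable} -m spacy download es_core_news_lg` installs into that same interpreter. If spaCy itself was upgraded after it had been imported, a kernel restart is still needed, because the already-imported module stays loaded.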

Thanks. I had mixed up the versions on my side.
The final configuration that worked:

The Prodigy installation instructions say spaCy v2.2,

but es-core-news-lg requires 2.3.0.

On a side note, when I printed the spaCy version in Jupyter, I had version 2.3.5.

I overlooked the versions and ended up with this mismatch.
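A mismatch like this can be caught with a simple version-range check. The sketch compares plain `major.minor.patch` strings as integer tuples; the ranges in the example are assumptions for illustration (a model built for spaCy 2.3 accepting 2.3.x, and an install expecting 2.2.x), so check the actual requirements listed by Prodigy and by the model package. For version strings with pre-release tags, the `packaging` library is the more robust choice. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' into a tuple of ints (no pre-release tags)."""
    return tuple(int(part) for part in v.split("."))


def in_range(version: str, minimum: str, below: str) -> bool:
    """True if minimum <= version < below."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it isn't installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


# Assumed ranges for illustration only.
spacy_version = "2.3.5"  # the version printed in Jupyter above
print("model built for 2.3.x:", in_range(spacy_version, "2.3.0", "2.4.0"))  # True
print("expects 2.2.x:", in_range(spacy_version, "2.2.0", "2.3.0"))  # False
print("installed spacy:", installed_version("spacy"))
```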