PATTERN with LEMMA for Spanish model throws an error

Hi,

I'm trying to launch Prodigy for text in Spanish:

prodigy ner.manual MAStipif2 blank:es ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

When I try to add this pattern {"label":"DURACION","pattern":[{"POS":"NUM"},{"LEMMA":"mes"}]}

I get this error message

"The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA"

I guess this is because the Spanish model doesn't have a tagger component. Is there any way to work around this?

Thanks

Hi!

If you're running this command:

prodigy ner.manual MAStipif2 blank:es ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

then the blank:es part refers to a "blank" Spanish model that doesn't have any pretrained components. It would only have a tokenizer. Instead, what you can do is use one of the pretrained Spanish models:

prodigy ner.manual MAStipif2 es_core_news_lg ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

To do so, make sure you have es_core_news_lg installed in your environment. If not, you can download it first like so:

spacy download es_core_news_lg

The new Spanish v3 models indeed don't have a tagger, but instead they have a morphologizer component which sets the POS attribute you need for your custom pattern.
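
To spot in advance which patterns would fail with blank:es, something like the following sketch might help. It scans pattern lines for the attributes named in the error message (POS, TAG, LEMMA). The helper name attrs_needing_components and the constant are made up for illustration and are not part of Prodigy or spaCy; it also assumes that plain string patterns only need the tokenizer.

```python
import json

# Attributes that, per the error message, need a trained component.
# This set is an assumption taken from that message, not an official list.
NEEDS_TRAINED_COMPONENT = {"POS", "TAG", "LEMMA"}


def attrs_needing_components(jsonl_lines):
    """Return the token attributes in the patterns that a blank model cannot set."""
    found = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        pattern = json.loads(line)["pattern"]
        # Assumption: string patterns are exact phrases and need only the tokenizer.
        if isinstance(pattern, str):
            continue
        for token_spec in pattern:
            for key in token_spec:
                if key.upper() in NEEDS_TRAINED_COMPONENT:
                    found.add(key.upper())
    return found


lines = ['{"label":"DURACION","pattern":[{"POS":"NUM"},{"LEMMA":"mes"}]}']
print(sorted(attrs_needing_components(lines)))  # ['LEMMA', 'POS']
```

If the result is non-empty, the patterns file needs a pretrained pipeline such as es_core_news_lg rather than blank:es.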

Hope that works for you - let us know if it doesn't!

I got it working with the es_core_news_sm model, many thanks for the reply (more coming at the weekend).

With the es_core_news_sm model I can check the lemmas and POS tags in a Jupyter notebook:

import spacy
!python3 -m spacy download es_core_news_sm
import es_core_news_sm
nlp = es_core_news_sm.load()
doc = nlp("12 MESES DE CDP. un mes. UN MES. CUATRO DÍAS cuatro días dos meses")

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_digit)

When I try the same with the larger model, es_core_news_lg, I cannot make it work, no matter which method I use to download it:

import es_core_news_lg
nlp = spacy.load('es_core_news_lg')

If you're running this in a Jupyter notebook, have you tried restarting the kernel after downloading es_core_news_lg? Or better yet - ensure the environment is properly set up, with the model installed, before you start the notebook. We've seen some trouble in the past with notebooks & virtual environments, but this is unfortunately not something we can control...
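
One quick check from inside the notebook is to see which Python interpreter the kernel is actually using and whether it can find the model package. This is only a sketch using the standard library; package_available is a hypothetical helper name, not a spaCy or Jupyter function.

```python
import importlib.util
import sys


def package_available(name):
    """True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


# The interpreter behind the notebook kernel. If this is not the Python
# of the environment where the model was installed, the import will fail.
print(sys.executable)
print(package_available("es_core_news_lg"))
```

If the second line prints False, the model was most likely installed into a different environment than the one the kernel runs in.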

Thanks. I had the versions mixed up on my side.
The final configuration that worked:
spacy==2.3.0
prodigy==1.10.8
es-core-news-lg==2.3.1

The Prodigy installation docs (https://prodi.gy/docs/install) say spaCy v2.2, but es-core-news-lg requires 2.3.0.

On a side note, when I printed

print(spacy.__version__)

in Jupyter, it showed version 2.3.5.

I had overlooked the versions and ended up with this mismatch.
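
For anyone hitting the same thing, here is a rough sketch of the version check. It assumes the spaCy convention that a model package version a.b.c is built for spaCy a.b.x; the function names are made up for illustration, and "2.2.0" is just a hypothetical spaCy v2.2 version for comparison.

```python
def major_minor(version):
    """Return (major, minor) as integers from a version string like '2.3.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def model_matches_spacy(model_version, spacy_version):
    """Assumed rule: model a.b.c is compatible with spaCy a.b.x."""
    return major_minor(model_version) == major_minor(spacy_version)


# The configuration that worked: es-core-news-lg 2.3.1 with spaCy 2.3.0
print(model_matches_spacy("2.3.1", "2.3.0"))  # True
# Hypothetical spaCy v2.2 install with the same model
print(model_matches_spacy("2.3.1", "2.2.0"))  # False
```

Running python -m spacy validate in the same environment should also report whether the installed models are compatible with the installed spaCy version.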