PATTERN with LEMMA for Spanish model drops an error

sorriluis · May 1, 2021, 10:36am

Hi,

I'm trying to launch Prodigy for text in Spanish:

prodigy ner.manual MAStipif2 blank:es ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

When I try to add this pattern {"label":"DURACION","pattern":[{"POS":"NUM"},{"LEMMA":"mes"}]}

I get this error message

"The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA"

I guess this is because the Spanish model doesn't have a tagger component. Is there any way to work around this?

Thanks

SofieVL · May 1, 2021, 12:29pm

Hi!

If you're running this command:

prodigy ner.manual MAStipif2 blank:es ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

then the blank:es part refers to a "blank" Spanish model that doesn't have any pretrained components. It would only have a tokenizer. Instead, what you can do is use one of the pretrained Spanish models:

prodigy ner.manual MAStipif2 es_core_news_lg ./tipif.json --label FIJO,PLAN,PLAZO,DESCUENTO,PRECIO,PROCESO,TERMINAL --patterns tipi_pattern.jsonl

To do so, make sure you have es_core_news_lg installed in your environment. If not, you can download it first like so:

spacy download es_core_news_lg

The new Spanish v3 models indeed don't have a tagger, but instead they have a morphologizer component which sets the POS attribute you need for your custom pattern.

Hope that works for you - let us know if it doesn't!
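As a rough way to see up front which entries in a patterns file will hit this error on a blank pipeline, here is a minimal sketch using only the standard library. It assumes one JSON object per line with a "pattern" key, as in the example above, and it only looks for the three attributes named in the error message (POS, TAG, LEMMA). The function name `attrs_needing_tagger` and the second sample pattern are made up for illustration; they are not part of Prodigy or spaCy.

```python
import json

# Attributes named in the error message; they are only set by trained components.
TAGGER_ATTRS = {"POS", "TAG", "LEMMA"}


def attrs_needing_tagger(pattern_lines):
    """Map 1-based line numbers to the POS/TAG/LEMMA attributes used on that line."""
    flagged = {}
    for line_no, line in enumerate(pattern_lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip empty lines in the JSONL file
        entry = json.loads(line)
        pattern = entry.get("pattern")
        if not isinstance(pattern, list):
            continue  # string patterns are exact phrases and carry no token attributes
        used = {
            key.upper()
            for token in pattern
            for key in token
            if key.upper() in TAGGER_ATTRS
        }
        if used:
            flagged[line_no] = sorted(used)
    return flagged


# The first line is the pattern from the question; the second is a hypothetical
# pattern that only uses a lexical attribute.
lines = [
    '{"label":"DURACION","pattern":[{"POS":"NUM"},{"LEMMA":"mes"}]}',
    '{"label":"PLAZO","pattern":[{"LOWER":"permanencia"}]}',
]
print(attrs_needing_tagger(lines))
```

Any line this reports needs a pretrained pipeline such as es_core_news_lg rather than blank:es. Lines that are not reported only use attributes the tokenizer can set by itself.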

sorriluis · May 1, 2021, 3:45pm

I got it working with the es_core_news_sm model. Many thanks for the reply (all the more so on a weekend).

With the es_core_news_sm model I can check the lemmas and POS tags in a Jupyter Notebook:

import spacy
!python3 -m spacy download es_core_news_sm
import es_core_news_sm
nlp = es_core_news_sm.load()
doc = nlp("12 MESES DE CDP. un mes. UN MES. CUATRO DÍAS cuatro días dos meses")

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_digit)

When I try the same with the larger model, es_core_news_lg, I cannot make it work, no matter which download method I try:
import es_core_news_lg
nlp = spacy.load('es_core_news_lg ')

SofieVL · May 1, 2021, 11:31pm

If you're running this in a Jupyter notebook, have you tried restarting the kernel after downloading es_core_news_lg? Or better yet - ensure the environment is properly set up, with the model installed, before you start the notebook. We've seen some trouble in the past with notebooks & virtual environments, but this is unfortunately not something we can control...
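One quick way to check this from inside the notebook is to print which Python interpreter the kernel is running and whether the model packages can be imported from it. This is a minimal sketch using only the standard library; the helper name `is_importable` is made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if a top-level module can be found by the running interpreter."""
    return importlib.util.find_spec(module_name) is not None


# If this path is not inside the environment where the model was installed,
# the kernel is using a different Python than the one used for the download.
print("Kernel interpreter:", sys.executable)

for name in ("spacy", "es_core_news_sm", "es_core_news_lg"):
    status = "importable" if is_importable(name) else "NOT importable from this kernel"
    print(name, "->", status)
```

If es_core_news_lg shows as not importable right after downloading it, restarting the kernel or installing into the interpreter printed above is the first thing to try.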

sorriluis · May 2, 2021, 12:34pm

Thanks. I had the versions mixed up on my side.
The final configuration that worked:
spacy==2.3.0
prodigy==1.10.8
es-core-news-lg==2.3.1

The Prodigy installation docs (https://prodi.gy/docs/install) say spaCy v2.2, but es-core-news-lg requires spaCy 2.3.0.

On a side note, when I printed

print(spacy.__version__)

in Jupyter, it showed version 2.3.5.

I overlooked the versions and got this mismatch
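For anyone hitting the same mismatch, here is a minimal sketch for listing the installed versions in one place, using only the standard library (Python 3.8+). The helper names `installed_version` and `same_major_minor` are made up for illustration. The major.minor comparison rests on the assumption that a model package's first two version numbers should match those of spaCy, which is consistent with the working combination above (spacy 2.3.0 with es-core-news-lg 2.3.1) but is not an exhaustive compatibility check.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def same_major_minor(version_a, version_b):
    """Compare only the first two components, e.g. '2.3.0' vs '2.3.1'."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


# Distribution names as they appear in the configuration listed above.
for dist in ("spacy", "prodigy", "es-core-news-lg"):
    print(dist, "->", installed_version(dist) or "not installed")

spacy_version = installed_version("spacy")
model_version = installed_version("es-core-news-lg")
if spacy_version and model_version:
    print("spaCy and model share major.minor:",
          same_major_minor(spacy_version, model_version))
```

Running this in the notebook itself, rather than in a separate terminal, shows what the kernel actually sees.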
