Doubt about PatternMatcher

alvaro.marlo · July 14, 2021, 11:58am

Hello, I'm using PatternMatcher.from_disk in a custom recipe like this:

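import spacy
from prodigy.models.matcher import PatternMatcher

# spacy_model and patterns are recipe arguments (pipeline name, path to the patterns JSONL file)
model = PatternMatcher(spacy.load(spacy_model),
                       combine_matches=True,
                       all_examples=True).from_disk(patterns)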

In my patterns file I have these two entries:

{"label": "GPE", "pattern": "Murcia"}
{"label": "GPE", "pattern": "Región de Murcia"}

And in my stream I have a text like this one:

{"text": "[...]En el caso de la retirada y destrucción de bovinos muertos en la explotación, el ámbito de aplicación lo constituyen las explotaciones ubicadas en el territorio de las comunidades autónomas de Andalucía, Aragón, Principado de Asturias, Illes Balears, Canarias, Cantabria, Castilla-La Mancha, Castilla y León, Cataluña, Extremadura, Galicia, La Rioja, Madrid, Región de Murcia, Foral de Navarra y Valenciana.[...]"}

The span recognized is "Región de Murcia", not "Murcia". Is that because it's longer? If not, what criterion decides to select "Región de Murcia" instead of "Murcia"?

SofieVL · July 14, 2021, 9:09pm

Your hunch is correct: when spans overlap, the longest span is preferred over shorter spans.
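spaCy exposes the same "longest span wins" rule as spacy.util.filter_spans. The sketch below is an illustration, not a claim about how PatternMatcher is implemented internally. It assumes spaCy v3 (for as_spans=True) and uses a blank Spanish pipeline so no model download is needed. A PhraseMatcher finds both overlapping matches, and filter_spans keeps only the longer one:

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

# Blank Spanish pipeline: tokenizer only, no trained components required
nlp = spacy.blank("es")
matcher = PhraseMatcher(nlp.vocab)
# Same two patterns as in the patterns file above
matcher.add("GPE", [nlp.make_doc("Murcia"), nlp.make_doc("Región de Murcia")])

doc = nlp.make_doc("Madrid, Región de Murcia, Foral de Navarra y Valenciana.")
spans = matcher(doc, as_spans=True)  # both overlapping matches are returned
print([s.text for s in spans])

# filter_spans drops overlaps, preferring the (first) longest span
kept = filter_spans(spans)
print([s.text for s in kept])  # ['Región de Murcia']
```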

ines · July 15, 2021, 12:36am

If you want to implement your own custom logic for deciding which spans to actually annotate, you could use spaCy's PhraseMatcher directly and match your patterns on each incoming text in the stream. This gives you all possible matches, including overlaps, and you can then apply your own logic to decide which one to add to the "spans" when two matches overlap. Just make sure your logic is consistent; otherwise you'll end up with inconsistent suggestions and are more likely to get inconsistent data as well.
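Here is a minimal sketch of that approach, assuming spaCy v3 and Prodigy's usual span format (character offsets "start"/"end", an inclusive "token_end", and a "label"). The helper names resolve_overlaps and add_spans are hypothetical, and so is the rule itself: purely to contrast with the default, it prefers the shortest span, with ties going to the earliest one. The key point is that the sort key is fixed, so the same text always yields the same suggestion:

```python
import spacy
from spacy.matcher import PhraseMatcher


def resolve_overlaps(spans):
    """Hypothetical custom rule: prefer the SHORTEST span, ties go to the earliest start.
    A fixed sort key keeps the choice deterministic across the whole stream."""
    chosen, taken = [], set()
    for span in sorted(spans, key=lambda s: (len(s), s.start)):
        tokens = set(range(span.start, span.end))
        if not tokens & taken:  # skip any span that overlaps one already accepted
            chosen.append(span)
            taken |= tokens
    return sorted(chosen, key=lambda s: s.start)


def add_spans(stream, nlp, matcher):
    """Yield each task with a "spans" list built from the resolved matches."""
    for eg in stream:
        doc = nlp.make_doc(eg["text"])
        spans = resolve_overlaps(matcher(doc, as_spans=True))
        eg["spans"] = [
            {
                "start": s.start_char,
                "end": s.end_char,
                "token_start": s.start,
                "token_end": s.end - 1,  # token_end is inclusive in Prodigy's span format
                "label": s.label_,
            }
            for s in spans
        ]
        yield eg


nlp = spacy.blank("es")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("GPE", [nlp.make_doc("Murcia"), nlp.make_doc("Región de Murcia")])

stream = [{"text": "Madrid, Región de Murcia, Foral de Navarra y Valenciana."}]
tasks = list(add_spans(stream, nlp, matcher))
print(tasks[0]["spans"])
```

With this rule only "Murcia" is suggested. In a real recipe you would wrap the incoming stream the same way and, assuming the interface needs token information, also add "tokens" using the same nlp object (for example with prodigy.components.preprocess.add_tokens) so the offsets line up.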
