PhraseMatcher Only takes words less than 10 length

abhinandansrivastava · December 4, 2018, 9:30am

When I add patterns to the PhraseMatcher like this:

matcher.add('ORG', None, *[nlp(text) for text in Organisation])

and the list contains an entry such as "United States of America", it throws an error. I think it only takes phrases shorter than 10 words.

spacy==2.0.18

abhinandansrivastava · December 4, 2018, 9:54am

ValueError: [T002] Pattern length (10) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.

ines · December 5, 2018, 1:36am

ValueError: [T002] Pattern length (10) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.

Yes, that's correct: spaCy's current PhraseMatcher implementation has this limit. In the upcoming v2.1.0, the matcher engine has been rewritten and phrase patterns won't be limited to 10 tokens anymore.

In the meantime, you can always use the regular Matcher and create token-based patterns instead:

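from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)
for doc in nlp.pipe(Organisation):
    # case-insensitive pattern: match on the lowercase form of each token
    pattern = [{'lower': token.lower_} for token in doc]
    # or, for a case-sensitive pattern, match on the exact token text:
    # pattern = [{'orth': token.text} for token in doc]
    matcher.add('ORG', None, pattern)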
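
If most entries are short, one option is to keep them on the PhraseMatcher and build token patterns only for the entries that hit the limit. The sketch below is plain Python and rests on some assumptions: the helper names `token_pattern` and `split_by_length` are made up for illustration, and tokens are approximated by whitespace splitting, so the counts can differ from what spaCy's tokenizer produces (for example around punctuation).

```python
# Sketch only: whitespace splitting stands in for spaCy's tokenizer,
# which may split some phrases into more tokens than str.split() does.
MAX_LENGTH = 10  # the limit reported in the T002 error on spaCy 2.0.18


def token_pattern(phrase, case_sensitive=False):
    """Build a Matcher-style token pattern (a list of dicts) for one phrase."""
    tokens = phrase.split()
    if case_sensitive:
        return [{'orth': tok} for tok in tokens]
    return [{'lower': tok.lower()} for tok in tokens]


def split_by_length(phrases, max_length=MAX_LENGTH):
    """Separate phrases below the limit from those at or above it."""
    short, too_long = [], []
    for phrase in phrases:
        # The error is raised when pattern length >= max_length.
        if len(phrase.split()) >= max_length:
            too_long.append(phrase)
        else:
            short.append(phrase)
    return short, too_long


# Hypothetical input: the second entry has exactly 10 tokens.
organisations = [
    "United States of America",
    "a b c d e f g h i j",
]
short, too_long = split_by_length(organisations)
patterns = [token_pattern(p) for p in too_long]
```

The entries in `short` can then go to the PhraseMatcher as in the original post, and `patterns` can be passed to a regular Matcher with `matcher.add('ORG', None, *patterns)`.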

