When I add patterns to the PhraseMatcher like this:
matcher.add('ORG', None, *[nlp(text) for text in Organisation])
If a phrase like "United States of America" comes up, it throws an error. I think it only accepts phrases shorter than 10 words.
spaCy==2.0.18
ValueError: [T002] Pattern length (10) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.
Yes, that's correct: spaCy's current PhraseMatcher implementation has this limit. In the upcoming version v2.1.0, the matcher engine has been rewritten and phrase patterns won't be limited to 10 tokens anymore.
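As a rough way to find which entries run into the limit, here is a minimal sketch that separates phrases by token count. The helper name `split_by_length` and the whitespace-based `tokenize` default are assumptions for illustration only; spaCy's own tokenizer can produce more tokens (for example around punctuation), so counting `len(nlp.make_doc(text))` would be closer to what the PhraseMatcher sees.

```python
def split_by_length(phrases, max_length=10, tokenize=str.split):
    """Separate phrases that fit under the limit from those that do not.

    `tokenize` is a stand-in (whitespace split), not spaCy's tokenizer.
    """
    fits, too_long = [], []
    for phrase in phrases:
        # The error fires when length >= max_length, so only shorter phrases fit.
        if len(tokenize(phrase)) < max_length:
            fits.append(phrase)
        else:
            too_long.append(phrase)
    return fits, too_long


organisations = [
    "United States of America",
    "one two three four five six seven eight nine ten",
]
fits, too_long = split_by_length(organisations)
```

Note that "United States of America" is only four tokens, so the pattern reported as length 10 in the error was presumably a different, longer entry in the list.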
In the meantime, you can always use the regular Matcher and create token-based patterns instead:
from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)
for doc in nlp.pipe(Organisation):
    # case-insensitive pattern
    pattern = [{'lower': token.lower_} for token in doc]
    # case-sensitive alternative:
    # pattern = [{'orth': token.text} for token in doc]
    matcher.add('ORG', None, pattern)
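To make the shape of these token-based patterns concrete, here is a small sketch that builds one without loading spaCy. `token_patterns` is a hypothetical helper, and the whitespace split only approximates spaCy's tokenization; the dictionary keys are the ones used in the reply above.

```python
def token_patterns(tokens, case_sensitive=False):
    """Build one Matcher-style pattern: a list with one dict per token."""
    if case_sensitive:
        # Match the exact token text.
        return [{'orth': token} for token in tokens]
    # Match on the lowercased form of each token.
    return [{'lower': token.lower()} for token in tokens]


# Assumption: whitespace split stands in for spaCy's tokenizer.
tokens = "United States of America".split()
pattern = token_patterns(tokens)
# pattern is a list of four dicts, one per token, each keyed by 'lower'
```

Each phrase becomes one such list, and each list is passed to the Matcher as a separate pattern under the same 'ORG' key.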