PhraseMatcher only accepts patterns shorter than 10 tokens

When I add patterns to the PhraseMatcher like this:

matcher.add('ORG', None, *[nlp(text) for text in Organisation])

it throws an error when "United States of America" is in the list. I think it only accepts phrases with fewer than 10 words.

Spacy==2.0.18

ValueError: [T002] Pattern length (10) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.

Yes, that’s correct: spaCy’s current PhraseMatcher implementation limits each pattern to fewer than 10 tokens. In the upcoming version v2.1.0, the matcher engine has been rewritten and phrase patterns won’t be limited to 10 tokens anymore.
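To find out which entries in your list are likely to trigger the error, a rough pre-check along these lines may help. It is only a sketch: it counts whitespace-separated words, which is an assumption, since spaCy's tokenizer also splits off punctuation and can therefore produce more tokens than this count. The names `MAX_LENGTH` and `find_long_phrases` are made up for illustration and are not part of spaCy.

```python
# Hypothetical helper, not part of spaCy: flags phrases that are likely
# to hit the PhraseMatcher limit reported in the T002 error.
MAX_LENGTH = 10  # limit quoted in the error message (spaCy 2.0.x)


def find_long_phrases(phrases, max_length=MAX_LENGTH):
    """Return (phrase, word_count) pairs with word_count >= max_length.

    Assumption: the whitespace word count is a lower bound on spaCy's
    token count, so phrases containing punctuation can be missed.
    """
    flagged = []
    for phrase in phrases:
        n_words = len(phrase.split())
        if n_words >= max_length:
            flagged.append((phrase, n_words))
    return flagged


# Placeholder data for illustration only.
organisations = [
    "United States of America",
    "one two three four five six seven eight nine ten",
]
print(find_long_phrases(organisations))
```

Anything this reports can be left out of the PhraseMatcher and handled with token-based patterns instead. For an exact count, use `len(nlp(text))` rather than the whitespace split.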

In the meantime, you can always use the regular Matcher and create token-based patterns instead:

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)
for doc in nlp.pipe(Organisation):
    # case-insensitive pattern: one dict per token
    pattern = [{'LOWER': token.lower_} for token in doc]
    # case-sensitive alternative:
    # pattern = [{'ORTH': token.text} for token in doc]
    matcher.add('ORG', None, pattern)
