Pattern File - Error when using

matt.whitby · March 6, 2022, 1:16pm

I generated a pattern file to try.

I'm running it as:

python -m prodigy ner.manual he_model blank:en nhle_1000000_101000.txt --label PERSON,ORG,PERIOD,RIVER,RELIGIOUSBUILDING,TITLE,CASTLE,COUNTRY --patterns castle_pattern.json

A sample of the pattern file:

{"label": "CASTLE", "pattern": "Caerlaverock Castle"}
{"label": "CASTLE", "pattern": "Cardiff Castle"}
{"label": "CASTLE", "pattern": "Carnasserie Castle"}
{"label": "CASTLE", "pattern": "Cawdor Castle"}
{"label": "CASTLE", "pattern": "Chepstow Castle"}

But I get the error:

The component 'matcher' does not have any patterns defined.
stream = (eg for _, eg in pattern_matcher(stream))

matt.whitby · March 7, 2022, 10:46am

Any idea on this, @ines?

ines · March 7, 2022, 5:31pm

That's definitely strange! Are you sure the castle_pattern.json file you're providing on the CLI is the correct one with the patterns listed? Also, I don't think it should make a difference but you probably want to rename the file to .jsonl so the file extension matches the data format.

matt.whitby · March 7, 2022, 6:47pm

Yup, I had tried renaming it and no dice. It's definitely the right name.
It's in the same folder (i.e. the root of the project) as the text file (nhle_1000000_101000.txt).

Even if I change the pattern file to have a single line I still get the same error.

{"label": "CASTLE", "pattern": "Caerlaverock Castle"}

matt.whitby · March 8, 2022, 9:56am

Turns out it wanted a list for the text portion, even if there's only a single string.

{"label": "CASTLE", "pattern": [{"text": "Clifton Castle"}]}

python -m prodigy ner.manual he_model mymodel\model-best nhle_1000000_101000.txt --label PERSON,ORG,PERIOD,RIVER,RELIGIOUSBUILDING,TITLE,CASTLE,COUNTRY --patterns patternfile.json

It does now load, but doesn't do anything. Am I misunderstanding, or should it not highlight phrases from the pattern list automatically?

ines · March 9, 2022, 11:38am

Ah, it looks like the problem is just that spaCy shows a warning (W036) if a matcher doesn't have any patterns defined – for example, in your case, you'd only have phrase patterns, not token-based patterns. This should only be a warning, though, and not an error so it's safe to ignore it.

The problem here is likely that your pattern describes one token with the text Clifton Castle, which is never going to be true because the string will be split into two tokens. So if you change the patterns to represent two tokens, it should work as expected:

{"label": "CASTLE", "pattern": [{"text": "Clifton"}, {"text": "Castle"}]}
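A pattern file can be checked for this mistake before it is loaded. The following is a minimal sketch, not part of Prodigy or spaCy: the helper name `find_unmatchable_tokens` is hypothetical, it only looks at the `text` and `lower` attributes used in this thread, and it assumes the tokenizer splits ordinary words on whitespace.

```python
import json


def find_unmatchable_tokens(jsonl_lines):
    """Return (line_number, value) pairs for token specs that contain whitespace.

    Assumption: the tokenizer splits ordinary words on whitespace, so a single
    token whose text spans several words can never match.
    """
    problems = []
    for number, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        pattern = json.loads(line)["pattern"]
        if isinstance(pattern, str):
            continue  # plain string pattern, not a list of token specs
        for token in pattern:
            # Only the two attributes mentioned in this thread are checked.
            for attr in ("text", "lower"):
                value = token.get(attr)
                if isinstance(value, str) and len(value.split()) > 1:
                    problems.append((number, value))
    return problems


lines = [
    '{"label": "CASTLE", "pattern": [{"text": "Clifton Castle"}]}',
    '{"label": "CASTLE", "pattern": [{"text": "Clifton"}, {"text": "Castle"}]}',
]
problems = find_unmatchable_tokens(lines)
for number, value in problems:
    print(f"line {number}: {value!r} spans several words but is declared as one token")
```

In this sample only the first line is reported, since its single token holds two words.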

Another advantage of token-based patterns is that you can match on other token attributes, e.g. the lower attribute for case-insensitive matches, or even other attributes like POS tags etc.

{"label": "CASTLE", "pattern": [{"lower": "clifton"}, {"lower": "castle"}]}
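If the pattern file already exists as plain strings, the token-based version can be generated rather than typed by hand. This is a rough sketch under stated assumptions: the function names are hypothetical, and splitting on whitespace only approximates the real tokenizer (names with punctuation, such as apostrophes or hyphens, may be tokenized differently and would need checking).

```python
import json


def phrase_to_token_pattern(entry, attr="lower"):
    """Turn {"label": ..., "pattern": "Two Words"} into a token-based pattern."""
    pattern = entry["pattern"]
    if isinstance(pattern, list):
        return entry  # already token-based, leave untouched
    # Assumption: whitespace splitting approximates the tokenizer.
    words = pattern.split()
    if attr == "lower":
        words = [word.lower() for word in words]
    return {"label": entry["label"], "pattern": [{attr: word} for word in words]}


def convert_lines(jsonl_lines, attr="lower"):
    """Convert each non-empty JSONL line and return the new lines."""
    converted = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        entry = phrase_to_token_pattern(json.loads(line), attr)
        converted.append(json.dumps(entry))
    return converted


sample = [
    '{"label": "CASTLE", "pattern": "Caerlaverock Castle"}',
    '{"label": "CASTLE", "pattern": "Cardiff Castle"}',
]
converted = convert_lines(sample)
for line in converted:
    print(line)
```

Passing `attr="text"` keeps the original casing for exact matches instead of lowercasing.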

matt.whitby · March 9, 2022, 11:44am

Ah, okay - thank you. I did wonder but somewhere in the documentation there's an example where it isn't split so I figured it was okay.
