Pattern File - Error when using

I generated a pattern file to try.

I'm running it as:

python -m prodigy ner.manual he_model blank:en nhle_1000000_101000.txt --label PERSON,ORG,PERIOD,RIVER,RELIGIOUSBUILDING,TITLE,CASTLE,COUNTRY --patterns castle_pattern.json

A sample of the pattern file:

{"label": "CASTLE", "pattern": "Caerlaverock Castle"}
{"label": "CASTLE", "pattern": "Cardiff Castle"}
{"label": "CASTLE", "pattern": "Carnasserie Castle"}
{"label": "CASTLE", "pattern": "Cawdor Castle"}
{"label": "CASTLE", "pattern": "Chepstow Castle"}

But I get the error:

The component 'matcher' does not have any patterns defined.
stream = (eg for _, eg in pattern_matcher(stream))

Any idea on this, @ines?

That's definitely strange! Are you sure the castle_pattern.json file you're providing on the CLI is the correct one with the patterns listed? Also, I don't think it should make a difference but you probably want to rename the file to .jsonl so the file extension matches the data format.
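
A quick way to confirm the file really is newline-delimited JSON (one object per line) rather than a single JSON document is a check along these lines. This is only a sketch: `is_jsonl` is a made-up helper name, not part of Prodigy, and the only assumption is that every non-blank line should parse to a JSON object.

```python
import json


def is_jsonl(text):
    """Return True if every non-blank line parses as a standalone JSON object."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False  # an empty file contains no patterns at all
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            return False  # e.g. a pretty-printed JSON array spread over lines
        if not isinstance(obj, dict):
            return False
    return True


# Sample content copied from the pattern file shown in the question.
sample = (
    '{"label": "CASTLE", "pattern": "Caerlaverock Castle"}\n'
    '{"label": "CASTLE", "pattern": "Cardiff Castle"}\n'
)
print(is_jsonl(sample))
```

To run it against the real file, read the file's contents into a string and pass that to `is_jsonl`.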

Yup, I had tried renaming it and no dice. It's definitely the right name.
It's in the same folder (i.e. the root of the project) as the text file (nhle_1000000_101000.txt).

Even if I change the pattern file to have a single line I still get the same error.

{"label": "CASTLE", "pattern": "Caerlaverock Castle"}

Turns out it wanted a list for the text portion, even if there's only a single string.

{"label": "CASTLE", "pattern": [{"text": "Clifton Castle"}]}

python -m prodigy ner.manual he_model mymodel\model-best nhle_1000000_101000.txt


It does now load, but doesn't do anything. Am I misunderstanding, or should it not highlight phrases from the pattern list automatically?

Ah, it looks like the problem is just that spaCy shows a warning (W036) if a matcher doesn't have any patterns defined – for example, in your case, you'd only have phrase patterns, not token-based patterns. This should only be a warning, though, and not an error so it's safe to ignore it.
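
To see which kind of patterns a file contains, something like the following could help. It is a rough sketch, not Prodigy's own logic: `count_pattern_kinds` is a hypothetical helper, and it assumes, as described above, that a string value is a phrase pattern and a list value is a token-based pattern.

```python
import json
from collections import Counter


def count_pattern_kinds(lines):
    """Count phrase (string) vs. token-based (list) patterns in JSONL lines."""
    kinds = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        pattern = json.loads(line)["pattern"]
        if isinstance(pattern, str):
            kinds["phrase"] += 1
        elif isinstance(pattern, list):
            kinds["token"] += 1
        else:
            kinds["other"] += 1  # anything unexpected, worth inspecting by hand
    return kinds


# Two entries copied from the pattern file in the question.
sample = [
    '{"label": "CASTLE", "pattern": "Caerlaverock Castle"}',
    '{"label": "CASTLE", "pattern": "Cardiff Castle"}',
]
kinds = count_pattern_kinds(sample)
print("phrase patterns:", kinds["phrase"], "token patterns:", kinds["token"])
```

If the token count is zero, the token-based matcher has nothing to match on, which is the situation the warning describes.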

The problem here is likely that your pattern describes one token with the text Clifton Castle, which is never going to be true because the string will be split into two tokens. So if you change the patterns to represent two tokens, it should work as expected:

{"label": "CASTLE", "pattern": [{"text": "Clifton"}, {"text": "Castle"}]}

Another advantage of token-based patterns is that you can match on other token attributes, e.g. the lower attribute for case-insensitive matches, or even other attributes like POS tags etc.

{"label": "CASTLE", "pattern": [{"lower": "clifton"}, {"lower": "castle"}]}
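
If you have a long list of names, you could generate the token-based entries with a small script instead of writing them by hand. The sketch below is an illustration only: `token_pattern` is a hypothetical helper, and splitting on whitespace is an assumption that merely approximates spaCy's tokenizer. It holds for simple names like the ones here, but names containing apostrophes or hyphens may be tokenized differently.

```python
import json


def token_pattern(label, phrase, attr="text"):
    """Build one pattern entry with one token dict per whitespace-separated word."""
    words = phrase.split()
    if attr == "lower":
        # The "lower" attribute compares against lowercased token text.
        words = [word.lower() for word in words]
    return {"label": label, "pattern": [{attr: word} for word in words]}


names = ["Caerlaverock Castle", "Cardiff Castle", "Clifton Castle"]

# One JSON object per line, ready to be written to a .jsonl patterns file.
lines = [json.dumps(token_pattern("CASTLE", name, attr="lower")) for name in names]
print("\n".join(lines))
```

Passing `attr="text"` instead gives exact, case-sensitive patterns like the first example above.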

Ah, okay - thank you. I did wonder but somewhere in the documentation there's an example where it isn't split so I figured it was okay.