Prodigy ner-teach: ValueError: Invalid pattern

Hi Matthew, Innes,

First, thank you very much for all the work you have done on spaCy and Prodigy. spaCy is a great framework and Prodigy is a very useful tool!

I have a problem training new entity types with ner.teach in Spanish.

I have created a patterns file following the structure you recommended here. Everything works perfectly if the file does not contain Spanish characters such as "ñ" or accented letters ("tildes"). However, when I try to train with patterns in Spanish, I get this error:

File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/prodigy/__main__.py", line 380, in <module>
controller = recipe(args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/prodigy/recipes/ner.py", line 143, in teach
matcher = PatternMatcher(model.nlp).from_disk(patterns)
File "cython_src/prodigy/models/matcher.pyx", line 209, in prodigy.models.matcher.PatternMatcher.from_disk
File "cython_src/prodigy/models/matcher.pyx", line 136, in prodigy.models.matcher.PatternMatcher.add_patterns
File "cython_src/prodigy/models/matcher.pyx", line 60, in prodigy.models.matcher.create_matchers
File "cython_src/prodigy/models/matcher.pyx", line 29, in prodigy.models.matcher.parse_patterns

ValueError: Invalid pattern:

Can you please help me with this?

Thanks in advance

Hi and thanks!

Special Unicode characters in patterns shouldn't matter, since they're really common :thinking: When you see the "Invalid pattern" error, it's typically followed by whatever string turned out to be invalid (typically invalid JSON). In the example you posted, there's kinda nothing, which leads me to think that there's probably a stray newline \n or a leading/trailing space somewhere in your JSON file and that's what it currently complains about. Can you double-check and see if you can find anything?
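One way to run that double-check is a small script that scans the patterns file for blank lines, surrounding whitespace and lines that don't parse as JSON. This is only a rough sketch: the helper name find_suspicious_lines is made up for illustration and is not part of Prodigy, and it assumes the patterns file is UTF-8 encoded with one JSON object per line.

```python
import json
from pathlib import Path


def find_suspicious_lines(path):
    """Return (line_number, reason) pairs for lines that look problematic.

    Hypothetical helper for illustration, not a Prodigy function.
    """
    problems = []
    # Read explicitly as UTF-8 so characters like "ñ" or "á" decode correctly.
    text = Path(path).read_text(encoding="utf-8")
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((line_number, "blank line"))
        elif line != line.strip():
            problems.append((line_number, "leading or trailing whitespace"))
        else:
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_number, f"invalid JSON: {err.msg}"))
    return problems
```

Calling find_suspicious_lines("patterns.jsonl") (the file name is a placeholder) returns an empty list if nothing looks off. A single newline at the very end of the file is not reported.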

Btw, here's the function Prodigy uses internally to quickly "validate" the patterns. If that returns False, the above error is raised. So you could also load your patterns file and then call that function on all lines in the data and see which one it fails on.

def is_valid_pattern(entry):
    return (
        "pattern" in entry
        and "label" in entry
        and (
            isinstance(entry["pattern"], list)
            or isinstance(entry["pattern"], str)
            and isinstance(entry["label"], str)
        )
    )
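Calling that function on every line of the patterns data could look roughly like the sketch below. The is_valid_pattern function is copied from above so the snippet runs on its own. The find_invalid_patterns helper and the sample lines are made up for illustration and are not part of Prodigy.

```python
import json


def is_valid_pattern(entry):
    # Copied from the function quoted above so this sketch is self-contained.
    return (
        "pattern" in entry
        and "label" in entry
        and (
            isinstance(entry["pattern"], list)
            or isinstance(entry["pattern"], str)
            and isinstance(entry["label"], str)
        )
    )


def find_invalid_patterns(lines):
    """Return (line_number, raw_line) for every line that fails the check.

    Hypothetical helper for illustration, not a Prodigy function.
    """
    failures = []
    for line_number, raw_line in enumerate(lines, start=1):
        try:
            entry = json.loads(raw_line)
        except json.JSONDecodeError:
            # Empty or malformed lines cannot be parsed at all.
            failures.append((line_number, raw_line))
            continue
        if not isinstance(entry, dict) or not is_valid_pattern(entry):
            failures.append((line_number, raw_line))
    return failures


# Made-up sample data: one good entry, one misspelled key, one empty line.
sample_lines = [
    '{"label": "CIUDAD", "pattern": [{"lower": "málaga"}]}',
    '{"label": "CIUDAD", "patterns": [{"lower": "logroño"}]}',
    '',
]
for line_number, raw_line in find_invalid_patterns(sample_lines):
    print(f"line {line_number}: {raw_line!r}")
```

For a real file, the lines could come from something like open("patterns.jsonl", encoding="utf-8").read().splitlines(), where the file name is again just a placeholder.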

Mea culpa :cry: I used the key "patterns" instead of "pattern". Thanks for your prompt response!

Yay, glad it's working now! (And I can totally relate – I constantly do all the "hard parts" right and what it comes down to is like... a comma :sweat_smile:)