Pattern doesn't work in Prodigy but does work in spaCy's Matcher

I am training an NER model using Prodigy and have run into a situation where one of the patterns I built and tested using spaCy's Matcher class doesn't work as an input in a patterns file. Is this a bug?

prog_patterns.jsonl (618 Bytes)


import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

matcher = Matcher(nlp.vocab, validate=True)

pattern = [{"ORTH": {"IN": ['senior', 'Senior', 'SENIOR', 'sr', 'Sr', 'SR', 'sr.', 'Sr.', 'SR.', 'junior', 'Junior', 'JUNIOR', 
                            'jr', 'Jr', 'JR', 'jr.', 'Jr.', 'JR.', 'c', 'C', 'C', 'c++', 'C++', 'C++', 'c#', 'C#', 'C#', 'csharp', 
                            'Csharp', 'CSHARP', 'java', 'Java', 'JAVA', 'javascript', 'Javascript', 'JAVASCRIPT', 'julia', 'Julia', 
                            'JULIA', 'r', 'R', 'python', 'Python', 'PYTHON', 'php', 'Php', 'PHP', 'ruby', 'Ruby', 'RUBY', 'sql', 
                             'Sql', 'SQL', 'nosql', 'Nosql', 'NOSQL', 'hql', 'Hql', 'HQL', 'fortran', 'Fortran', 'FORTRAN', 'cobalt', 'Cobalt', 'COBALT']}, "OP": "+"}, {"LOWER": "programmer"}]

matcher.add("programming", None, pattern)

matcher(nlp("Senior Java Programmer"))


python -m prodigy ner.teach example_db en_core_web_sm job_titles_005.txt --loader txt --label example --patterns prog_patterns.jsonl
Using 1 labels: example
Traceback (most recent call last):
  File "C:\Anaconda\envs\ex\lib\site-packages\srsly\", line 131, in _yield_json_lines
    yield ujson.loads(line)
ValueError: Expected object or value

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda\envs\ex\lib\", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Anaconda\envs\ex\lib\", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Anaconda\envs\ex\lib\site-packages\prodigy\", line 380, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Anaconda\envs\ex\lib\site-packages\", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Anaconda\envs\ex\lib\site-packages\", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Anaconda\envs\ex\lib\site-packages\prodigy\recipes\", line 143, in teach
    matcher = PatternMatcher(model.nlp).from_disk(patterns)
  File "cython_src\prodigy\models\matcher.pyx", line 208, in prodigy.models.matcher.PatternMatcher.from_disk
  File "C:\Anaconda\envs\ex\lib\site-packages\srsly\", line 85, in read_jsonl
    for line in _yield_json_lines(f, skip=skip):
  File "C:\Anaconda\envs\ex\lib\site-packages\srsly\", line 135, in _yield_json_lines
    raise ValueError("Invalid JSON on line {}: {}".format(line_no, line))
ValueError: Invalid JSON on line 1: {"label":"example","pattern":[{"TEXT": {"IN": ['senior', 'Senior', 'SENIOR', 'sr', 'Sr', 'SR', 'sr.', 'Sr.', 'SR.', 'junior', 'Junior', 'JUNIOR', 'jr', 'Jr', 'JR', 'jr.', 'Jr.', 'JR.', 'c', 'C', 'C', 'c++', 'C++', 'C++', 'c#', 'C#', 'C#', 'csharp', 'Csharp', 'CSHARP', 'java', 'Java', 'JAVA', 'javascript', 'Javascript', 'JAVASCRIPT', 'julia', 'Julia', 'JULIA', 'r', 'R', 'python', 'Python', 'PYTHON', 'php', 'Php', 'PHP', 'ruby', 'Ruby', 'RUBY', 'sql', 'Sql', 'SQL', 'nosql', 'Nosql', 'NOSQL', 'hql', 'Hql', 'HQL', 'fortran', 'Fortran', 'FORTRAN', 'cobalt', 'Cobalt', 'COBALT']}, "OP": "+"}, {"LOWER": "programmer"}]}

Hi! If you see an "Invalid JSON" error, it typically means just that: something in your JSON is invalid and json.loads fails on that line (that's the error Prodigy is surfacing here). In the example included in your traceback, I think the problem is the single quotes in the list of terms, e.g. 'senior', 'Senior', 'SENIOR' and so on. JSON only allows double quotes around strings, so if you replace those with double quotes, it'll be valid JSON and should be read in correctly.
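Here's a quick way to check this yourself. It's only a minimal sketch using the standard library's json module (going by the traceback, Prodigy reads the file via srsly and ujson, but the quoting rule is the same), and the shortened term list and the is_valid_json helper are just made up for illustration:

```python
import json

# Same structure as the patterns line from the traceback, with a shortened term list.
single_quoted = """{"label": "example", "pattern": [{"TEXT": {"IN": ['senior', 'Senior']}}]}"""
double_quoted = """{"label": "example", "pattern": [{"TEXT": {"IN": ["senior", "Senior"]}}]}"""


def is_valid_json(line):
    """Hypothetical helper: return True if the line parses as JSON."""
    try:
        json.loads(line)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_valid_json(single_quoted))  # False
print(is_valid_json(double_quoted))  # True
```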

When porting patterns between Python and JSON, I'd recommend using Python to write out the JSON so those small details are taken care of automatically. You can also use our srsly.write_jsonl helper to make this easier.
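As a rough sketch of what that could look like: the snippet below builds the pattern as a normal Python object and lets the standard library's json module handle the quoting. The term list is shortened for illustration, and the label and file name are simply taken from your command. srsly.write_jsonl(path, lines) would do the writing step for you, but I've used plain json here so the example has no extra dependencies.

```python
import json

# Define the pattern in Python, where single or double quotes are both fine.
pattern = [
    {"TEXT": {"IN": ['senior', 'Senior', 'SENIOR', 'java', 'Java', 'JAVA']}, "OP": "+"},
    {"LOWER": "programmer"},
]
entry = {"label": "example", "pattern": pattern}

# One JSON object per line; json.dumps always emits double-quoted strings.
with open("prog_patterns.jsonl", "w", encoding="utf8") as f:
    f.write(json.dumps(entry) + "\n")

# Read the file back to confirm every line parses.
with open("prog_patterns.jsonl", encoding="utf8") as f:
    loaded = [json.loads(line) for line in f]

print(loaded[0] == entry)  # True
```

The resulting file can then be passed to --patterns as before.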


Well now I feel sheepish. Too much time in pure python. Thanks for responding!