ValueError: Can't read file: ing_patterns2021-10-11.jsonl\cfg

Since Prodigy exports the patterns files, I am assuming it is a fair question to ask here.

I asked the same question on the spaCy GitHub discussions page after I received the same error message for another patterns file. The two people who responded were a bit mystified. When this message appears for the smaller patterns files, re-exporting them via Prodigy is not a big deal, and that resolved the problem the first time.
In this case, there are about 6,000 ingredients in ing_patterns.jsonl.

Can you please explain what this error message means, and how I might fix it? It keeps popping up, though not consistently, for reasons unknown. It surprises me, but I found no reference to a resolution anywhere.

I tried changing the file names and the file locations, but it doesn't seem to make a difference. I don't know why the system expects to find a config file. I have a config and a base_config file in the main folder, set to defaults, and there is no apparent, direct relationship between those files and these files, or none that I could find.

Here is all the spaCy-related code from that page:

```python
import spacy
from spacy.language import Language

nlp = spacy.load("en_core_web_lg")

# Custom component: start a new sentence after every newline token
@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text == '\n':
            doc[token.i + 1].is_sent_start = True
    return doc

nlp.add_pipe("set_custom_boundaries", before="parser")

# One EntityRuler per pattern type, all placed before the statistical NER
rulerIngs = nlp.add_pipe("entity_ruler", name="rulerIngs", before="ner")
rulerUms = nlp.add_pipe("entity_ruler", name="rulerUms", before="ner")
rulerAmts = nlp.add_pipe("entity_ruler", name="rulerAmt", before="ner")
rulerMods = nlp.add_pipe("entity_ruler", name="rulerMods", before="ner")

# Load each ruler's patterns from a JSONL file exported by Prodigy
rulerIngs.from_disk("patterns/ing_patterns2021-10-11.jsonl")
rulerUms.from_disk("patterns/ums_patterns2021-10-11.jsonl")
rulerAmts.from_disk("patterns/amts_patterns2021-10-16.jsonl")
rulerMods.from_disk("patterns/mods_patterns2021-10-11.jsonl")
```

Here is the entire error message:

```
(venv) C:\Users\rober\food_ner>python test.py
Traceback (most recent call last):
  File "C:\Users\rober\food_ner\test.py", line 45, in <module>
    rulerIngs.from_disk("ing_patterns2021-10-11.jsonl")
  File "C:\Users\rober\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\pipeline\entityruler.py", line 429, in from_disk
    from_disk(path, deserializers_cfg, {})
  File "C:\Users\rober\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 1225, in from_disk
    reader(path / key)
  File "C:\Users\rober\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\pipeline\entityruler.py", line 428, in <lambda>
    deserializers_cfg = {"cfg": lambda p: cfg.update(srsly.read_json(p))}
  File "C:\Users\rober\AppData\Local\Programs\Python\Python39\lib\site-packages\srsly\_json_api.py", line 51, in read_json
    file_path = force_path(path)
  File "C:\Users\rober\AppData\Local\Programs\Python\Python39\lib\site-packages\srsly\util.py", line 24, in force_path
    raise ValueError(f"Can't read file: {location}")
ValueError: Can't read file: ing_patterns2021-10-11.jsonl\cfg
```

Thank you in advance for your feedback.

Robert

Here is also a link to the discussion on the spaCy GitHub page, from when I received the same message for a different patterns file.

https://github.com/explosion/spaCy/discussions/9485

Hi! This is definitely strange because the EntityRuler should support loading both JSONL files and a serialized directory in its from_disk method :thinking: Could you double-check that the file you're trying to load here definitely exists and is a valid file?

One possible explanation could be that the file doesn't actually exist, in which case, the loading would fall back to treating it as a directory, which then fails with that slightly confusing error message. See here for the relevant part in the implementation:

I am embarrassed to admit it, but in this case I had somehow deleted the entire folder of patterns files, though I have no memory of doing so and no reason to have done it. I do have a neurological injury, and I take medication that can affect my short-term memory, so that's my only excuse. But I have never before deleted a file or folder and then forgotten about it. In the other case, I still don't know what happened; that time, exporting the file a second time solved the problem. Thank you for the response.

Glad you found the problem and don't feel bad about the mistake – you'd be surprised by the small and seemingly silly mistakes even very experienced programmers make :sweat_smile: It comes up on our internal Slack all the time. This is actually kinda classic in programming: I usually do all the "hard stuff" correctly and then the problem is a small typo.

It's still good that this came up because we can definitely improve the error message in spaCy and tell the user that the path doesn't exist. This will make it easier to track down the problem. I've put this on our list of spaCy enhancements!
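Until spaCy reports a missing path directly, one way to sidestep the confusing `...\cfg` message is to check that every patterns file exists before calling `from_disk`. Below is a minimal sketch using only the standard library; the helper names `missing_pattern_files` and `check_pattern_files` are hypothetical and not part of spaCy or Prodigy.

```python
from pathlib import Path
from typing import Iterable, List, Union


def missing_pattern_files(paths: Iterable[Union[str, Path]]) -> List[Path]:
    """Hypothetical helper: return every path that is not an existing file.

    A directory at one of these paths also counts as missing, because the
    patterns exported from Prodigy are expected to be single JSONL files.
    """
    return [Path(p) for p in paths if not Path(p).is_file()]


def check_pattern_files(paths: Iterable[Union[str, Path]]) -> None:
    """Hypothetical helper: raise one error that lists all missing files."""
    missing = missing_pattern_files(paths)
    if missing:
        listed = "\n".join(f"  {p}" for p in missing)
        raise FileNotFoundError(f"Missing patterns file(s):\n{listed}")
```

In the script from the question, calling `check_pattern_files` with the four paths under `patterns/` right before the `from_disk` calls would report a deleted folder as a plain list of missing files, all at once rather than one failure at a time.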

Thank you. Yes! If the error message had said "can't find file," I would have looked in a better place for answers. As it stands, I kept wondering, why would a text-based file have a config file?

Thanks again.

The reason this error message ended up being confusing is that there are basically two ways to load an EntityRuler from disk:

  1. From a JSONL file containing only the patterns.
  2. From an already serialized entity ruler, saved out with EntityRuler.to_disk, which includes the patterns and a cfg file with the entity ruler settings.

Since the method supports both, it currently checks whether the file is a loadable JSONL and if not, it tries to load it from a directory. But it currently doesn't raise a custom error message for the case where the input is supposed to be a JSONL file that doesn't exist.
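The two-way loading logic described above can be sketched with the standard library, adding an explicit error for a `.jsonl` path that doesn't exist. This is a simplified illustration under stated assumptions, not spaCy's actual implementation: `load_ruler_data` is a hypothetical name, it returns plain Python data instead of configuring a ruler, and it assumes a serialized ruler directory holds a `cfg` file (as the traceback shows) plus a `patterns.jsonl` file.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple, Union

Pattern = Dict[str, Any]


def _read_jsonl(path: Path) -> List[Pattern]:
    # One JSON object per line; blank lines are skipped.
    with path.open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_ruler_data(path: Union[str, Path]) -> Tuple[List[Pattern], Dict[str, Any]]:
    """Hypothetical helper: return (patterns, cfg) from either on-disk format."""
    path = Path(path)
    if path.suffix == ".jsonl":
        # Format 1: a plain JSONL patterns file, such as a Prodigy export.
        if not path.is_file():
            # The case from this thread: say plainly that the file is missing
            # instead of falling back to directory loading.
            raise FileNotFoundError(f"Patterns file not found: {path}")
        return _read_jsonl(path), {}
    if path.is_dir():
        # Format 2: a serialized ruler directory with a cfg file and patterns.
        # (File names assumed here: "cfg" and "patterns.jsonl".)
        cfg = json.loads((path / "cfg").read_text(encoding="utf8"))
        return _read_jsonl(path / "patterns.jsonl"), cfg
    raise FileNotFoundError(
        f"No JSONL file or serialized EntityRuler directory at: {path}"
    )
```

Checking the `.jsonl` suffix before anything else means a missing patterns file is reported as missing, rather than being treated as a directory and failing later on `...\cfg`.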