patterns.jsonl format

Hi,

I created a patterns.jsonl file by hand, following the instructions here.

This is one of the given examples:
[{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]

However, True will cause the json parser to throw the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None

It should, in fact, be a lowercase true.

Ah, I first thought this was in the Prodigy documentation – but in spaCy’s docs, the patterns are written in Python, since this is how you’ll feed them in when using spaCy directly:

pattern = [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
matcher.add('TEST', None, pattern)

Since Prodigy lets you load in patterns in JSON format, this notation will have to be converted to JSON notation as well. This usually means that all quotes should be double quotes, true and false are lowercase, and None becomes null (although this usually shouldn’t come up much). Prodigy’s pattern format also uses lowercase keys for the token properties:

{"label": "TEST", "pattern": [{"lower": "hello"}, {"is_punct": true}, {"lower": "world"}]}
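If you already have the patterns as Python objects, one way to avoid converting them by hand is to let Python's json module write the line for you. This is only a minimal sketch: the helper name to_prodigy_pattern_line is made up for illustration and is not part of Prodigy or spaCy, and the key lowercasing simply follows the example above.

```python
import json

def to_prodigy_pattern_line(label, pattern):
    """Serialize a spaCy-style token pattern as one line of a patterns.jsonl file.

    Hypothetical helper for illustration; not part of Prodigy or spaCy.
    """
    # Lowercase the token attribute keys, as in the example above.
    lowered = [{key.lower(): value for key, value in token.items()} for token in pattern]
    # json.dumps writes double quotes, lowercase true/false, and null for None.
    return json.dumps({"label": label, "pattern": lowered})

pattern = [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
print(to_prodigy_pattern_line('TEST', pattern))
```

Writing one such line per pattern to a file gives you a JSONL file without any hand-edited quotes or booleans.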

I can add a note about this in Prodigy’s pattern file documentation to make this less confusing – so thanks for bringing this up :+1:

Hi,

I am also trying to create some patterns for Prodigy manually, like the following:

{“label”:“plant_capacity”,“pattern”:[{“lemma”:“plant”},{“lemma”:“process”},{“lemma”:“capacity”}]}
{“label”:“plant_capacity”,“pattern”:[{“lemma”:“plant”},{“lemma”:“capacity”}]}
{“label”:“plant_capacity”,“pattern”:[{“lemma”:“capacity”},{“is_ascii”: true, “op”: “*”},{“lower”:“mtpa”}]}
{“label”:“plant_capacity”,“pattern”:[{“lower”:“mtpa”}]}
{“label”:“plant_capacity”,“pattern”:[{“lower”:“mmscfd”}]}

I am receiving the following error when using textcat.teach, after exporting my files and manually modifying them in Notepad++ (before the manual modifications, the file was readable):

[image]

thank you in advance
kind regards,

claudio nespoli

What exactly did you change? The error indicates that somehow, the JSON in your file isn't readable, so maybe editing it in notepad++ corrupted the file.

In the example you copied, all quotation marks show up as “ instead of ". Was it like that before, or did that only get changed by our forum? I just tested it with your examples and " for all quotation marks, and it worked fine for me.

And could you try copying each line of your file separately to this JSON validator and check if it complains?
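For longer files, the same check can be done locally with a few lines of Python. This is just a rough sketch using the standard json module: the function name find_invalid_lines and the sample lines are made up for illustration, and the file name in the comment is only an assumption about what your file is called.

```python
import json

def find_invalid_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"{err.msg} (column {err.colno})"))
    return problems

# To check a real file (file name assumed), pass the open file object:
# with open("patterns.jsonl", encoding="utf8") as f:
#     print(find_invalid_lines(f))

# Made-up sample: the second line has a value that is not in double quotes.
sample = [
    '{"label": "plant_capacity", "pattern": [{"lower": "mtpa"}]}',
    '{"label": "plant_capacity", "pattern": [{"lower": mmscfd}]}',
]
for number, message in find_invalid_lines(sample):
    print(f"line {number}: {message}")
```

This reports the line number of each entry the JSON parser rejects, so you don't have to paste every line into a validator one by one.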

thank you very much, also for the quick answer,

the quotation type was ok, it was just changed during copy-paste in the forum

you are right, my file had over 70 lines and I made a mistake: one value was not enclosed in double quotes

the JSON validator was very useful

claudio
