REGEX operator in the patterns file


(Mario Measic) #1

Hi there! First of all, thank you for the support!

In order to use the new REGEX operator in the patterns file, I would like to provide a pattern in the patterns.jsonl file.

So, let’s say I have a lot of examples where a token or a sequence of tokens should receive a specific label, but only when it comes after a particular token (that token designates where the bank transaction occurred).

The pattern is therefore a simple one: it uses a positive lookbehind and captures everything after that token.
{"label": "MERCHANT", "pattern": [{"REGEX": "(?<=IL\\s).*"}]}

P.S.: I have added an escaping backslash because of the JSON decoder.
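
For illustration, here is a minimal sketch of what that line decodes to and what the expression matches when it is run with Python's standard `re` module over a whole string. It does not go through spaCy's Matcher or the ner.match recipe, and the sample transaction text is a made-up assumption, not data from this thread.

```python
import json
import re

# One line as it would appear in patterns.jsonl. The raw string keeps the
# doubled backslash, which the JSON decoder turns into a single one.
line = r'{"label": "MERCHANT", "pattern": [{"REGEX": "(?<=IL\\s).*"}]}'
entry = json.loads(line)
regex = entry["pattern"][0]["REGEX"]

# Hypothetical transaction description, used only for illustration.
text = "POS PURCHASE CHICAGO IL ACME HARDWARE"

# The positive lookbehind requires "IL" plus one whitespace character
# immediately before the match; ".*" then consumes the rest of the line.
match = re.search(regex, text)
merchant = match.group(0) if match else None
```

With this sample text, `merchant` holds everything that follows "IL " in the string.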

However, when I run the ner.match recipe, every token is labelled as MERCHANT, with the pattern ID being 0 (the one I have provided).

What am I doing wrong?

(Ines Montani) #2

Sorry if this was confusing – I assume you’re referring to the REGEX attribute proposal in this GitHub thread? This thread is still only the spec and proposal, i.e. the planned implementation. The changes will hopefully ship with spaCy v2.1.0 (since some of the changes to the Matcher internals are not fully backwards compatible). But they’re not yet available in the stable release and not implemented in the current nightly build.

(Mario Measic) #3

Thanks! I implemented a custom recipe and adjusted it to accept various regular expressions in order to speed up the gathering of annotations.
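
The recipe itself is not shown in this thread, so the following is only a rough sketch of the general idea, not the actual implementation. It runs regular expressions over the raw text and turns each match into a pre-highlighted span with character offsets. The names `PATTERNS`, `regex_spans` and `make_task` are hypothetical, as is the sample text; the "text" and "spans" keys follow the usual layout of an annotation task with "start", "end" and "label" on each span.

```python
import re

# Hypothetical label-to-regex mapping; only the MERCHANT expression
# comes from the thread above.
PATTERNS = {"MERCHANT": re.compile(r"(?<=IL\s).*")}


def regex_spans(text, patterns):
    """Return one span dict with character offsets per non-empty match."""
    spans = []
    for label, regex in patterns.items():
        for match in regex.finditer(text):
            start, end = match.span()
            if start == end:
                # ".*" can also match the empty string; skip those.
                continue
            spans.append({"start": start, "end": end, "label": label})
    return spans


def make_task(text, patterns=PATTERNS):
    """Build a task dict with the regex matches as suggested spans."""
    return {"text": text, "spans": regex_spans(text, patterns)}


task = make_task("POS PURCHASE CHICAGO IL ACME HARDWARE")
```

Because the offsets are character-based, a real recipe would still need to check that each span lines up with token boundaries before showing it to the annotator.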

Have a nice day!