Hi,
I am trying to use the patterns argument in ner.manual. I want to match any number that occurs after a known term. I'm aware that spaCy's matcher operates on a per-token basis; this Stack Overflow question is an example of that. I tried a pattern like this, but it doesn't work because a single token pattern can't match across tokens:
[ {'LOWER': {'REGEX' : 'KNOWN_TERM (\d?.\d+)'}}]
For the moment I've settled on matching both the term and the number, and will try to strip the term in a post-processing step:
[{'LOWER': "KNOWN_TERM"}, {'LOWER': {'REGEX' : '(\d?.\d+)'}}]
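A minimal sketch of what that post-processing step could look like, assuming spaCy v3 and a blank English pipeline. The term "dosage" is only a hypothetical stand-in for KNOWN_TERM, the stricter regex and the helper name numbers_after_term are illustrative assumptions, and the pattern is run through spaCy's Matcher directly rather than through ner.manual:

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "dosage" is a hypothetical placeholder for KNOWN_TERM.
# LOWER compares against the lowercased token text, so the value must be lowercase.
pattern = [
    {"LOWER": "dosage"},
    # Assumed number format: optional integer part, optional dot, digits.
    {"TEXT": {"REGEX": r"^\d*\.?\d+$"}},
]
matcher.add("TERM_NUMBER", [pattern])


def numbers_after_term(text):
    """Return (start_char, end_char, text) for each number that follows the term."""
    doc = nlp(text)
    results = []
    for _match_id, start, end in matcher(doc):
        # The match spans both tokens; keep only the final (number) token.
        number = doc[end - 1 : end]
        results.append((number.start_char, number.end_char, number.text))
    return results


example = "The dosage 2.5 was given, not 7.0"
for start_char, end_char, value in numbers_after_term(example):
    print(start_char, end_char, value)
```

The idea is that the term token only acts as context for the match, and the span that is kept covers the number alone. Numbers that do not directly follow the term are not matched at all.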
I looked at the operators, but I don't think they will help me. Are there "BEFORE"/"AFTER"-style operators that could be used to solve my problem, e.g.:
[{'LOWER': "KNOWN_TERM", "OP": "BEFORE"}, {'LOWER': {'REGEX' : '(\d?.\d+)'}}]
Thanks for your help,
Dan