It actually took me a long time to work this out, so, Question 1: can you point me to the shape documentation? I want to investigate whether there are other things it could help me with.
Whilst shape works, previously I was trying to find decimals with regex, which I can't get to work.
Question 2: Is it actually possible to use regex to match patterns in Prodigy, and if so, can you help me find decimals, e.g. 34.56, with regex? (Shape works, but this will help me 'see' other potential solutions.) My attempt was:
Question 3
If regex does work for creating patterns, can you help me with the correct syntax to match number ranges, e.g.
numbers between 30.01 and 30.99, and between 50.00 and 50.99? My attempt:
Re Question 1
Prodigy's matcher uses spaCy's Matcher under the hood, so the spaCy docs on rule-based matching are the place to look for information. SHAPE uses the shape attribute of tokens, which is documented here. You can always print the orthographic shape of every token (.shape_) to see which pattern you should use in the Matcher rules.
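For example, here's a minimal sketch that prints each token next to its shape. It assumes spaCy v3 and a blank English pipeline (so no trained model has to be downloaded); token_shapes is just a hypothetical helper name:

```python
import spacy

# A blank English pipeline only has the tokenizer and lexical attributes,
# which is all that token.shape_ needs.
nlp = spacy.blank("en")


def token_shapes(text):
    """Pair each token's text with its orthographic shape (token.shape_)."""
    return [(token.text, token.shape_) for token in nlp(text)]


for text, shape in token_shapes("The price is 34.56 dollars"):
    print(text, shape)
```

A shape printed this way can be used directly as a token pattern, e.g. [{"SHAPE": "dd.dd"}], which matches tokens like 34.56 but not 134.56 (whose shape is ddd.dd).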
Re Question 2
The reason your first DECIMAL pattern doesn't work is that spaCy regex patterns are applied to a single token. Your first pattern defines a sequence of three tokens, which does not appear in the input text, because 34.56 is a single token, not a sequence of three tokens (if you use spaCy's default tokenizer).
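To see this in action, here's a small sketch assuming spaCy v3 with a blank English pipeline. THREE_TOKENS is a hypothetical stand-in for a three-token pattern (not the original one from the question), and count_matches is an illustrative helper:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
doc = nlp("The price is 34.56 dollars")


def count_matches(pattern):
    """Return how many times a single Matcher pattern matches the doc."""
    matcher = Matcher(nlp.vocab)
    matcher.add("DECIMAL", [pattern])  # spaCy v3 signature: add(key, list_of_patterns)
    return len(matcher(doc))


# Hypothetical three-token pattern: digits, then a "." token, then digits.
THREE_TOKENS = [{"IS_DIGIT": True}, {"ORTH": "."}, {"IS_DIGIT": True}]
# One token whose whole text looks like a decimal.
ONE_TOKEN = [{"TEXT": {"REGEX": r"^\d+\.\d+$"}}]

print([token.text for token in doc])
print(count_matches(THREE_TOKENS), count_matches(ONE_TOKEN))
```

The three-token pattern finds nothing because the tokenizer keeps 34.56 together and never produces a separate "." token, while the single-token REGEX pattern matches it.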
The second pattern is a better attempt, but it contains token-boundary markers inside it, which also makes it impossible to match the text. Here's the corrected version of the pattern that matches dd.dd decimals:
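A minimal sketch of a single-token pattern along those lines (not necessarily the exact pattern referred to above), assuming spaCy v3 and a blank English pipeline; DECIMAL_PATTERN and find_decimals are illustrative names. It also prints the same pattern as a line of a Prodigy patterns file, assuming the usual {"label": ..., "pattern": ...} JSONL format:

```python
import json

import spacy
from spacy.matcher import Matcher

# REGEX is tested against the text of one token, so ^ and $ anchor to the
# start and end of that token: exactly two digits, a dot, two digits.
DECIMAL_PATTERN = [{"TEXT": {"REGEX": r"^\d{2}\.\d{2}$"}}]

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("DECIMAL", [DECIMAL_PATTERN])


def find_decimals(text):
    """Return the text of every token matching the dd.dd pattern."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


# One line of a patterns file; json.dumps doubles the backslashes as JSON requires.
print(json.dumps({"label": "DECIMAL", "pattern": DECIMAL_PATTERN}))
```

Because of the anchors, 134.56 and 3.456 are not matched; dropping ^ and $ would let the regex match part of a longer token.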
Another approach would be to leverage spaCy's built-in pattern matching to find numbers with the LIKE_NUM token attribute, and then apply the decimal-detection regex only to those tokens.
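A sketch of that two-step idea, assuming spaCy v3 with a blank English pipeline (which includes the English LIKE_NUM rules); DECIMAL_RE and find_decimals are illustrative names:

```python
import re

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# Step 1: match any number-like token ("34.56", "7", and also words like "ten").
matcher.add("NUMBER", [[{"LIKE_NUM": True}]])

# Step 2: a decimal regex, applied only to the candidates from step 1.
DECIMAL_RE = re.compile(r"\d+\.\d+")


def find_decimals(text):
    """Return number-like tokens whose full text is a decimal."""
    doc = nlp(text)
    candidates = [doc[start:end] for _, start, end in matcher(doc)]
    return [span.text for span in candidates if DECIMAL_RE.fullmatch(span.text)]
```

LIKE_NUM is fairly broad, which is why the regex step is still needed. The two conditions could also be combined in one token dict, e.g. {"LIKE_NUM": True, "TEXT": {"REGEX": r"^\d+\.\d+$"}}, since all attributes in a token dict must match.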
Re Question 3
Your patterns look correct; I just made them a bit more precise, because the first range starts at .01 and the second at .00.
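A sketch of regexes for the two ranges, assuming the numbers always have exactly two decimal places and appear as single tokens (spaCy v3, blank English pipeline; RANGE_30, RANGE_50 and find_in_ranges are illustrative names):

```python
import spacy
from spacy.matcher import Matcher

# 30.01-30.99: after "30." either 0 followed by 1-9, or 1-9 followed by any digit,
# so 30.00 is excluded.
RANGE_30 = r"^30\.(?:0[1-9]|[1-9]\d)$"
# 50.00-50.99: after "50." any two digits.
RANGE_50 = r"^50\.\d{2}$"

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("NUM_RANGE", [
    [{"TEXT": {"REGEX": RANGE_30}}],
    [{"TEXT": {"REGEX": RANGE_50}}],
])


def find_in_ranges(text):
    """Return tokens falling in 30.01-30.99 or 50.00-50.99, in document order."""
    doc = nlp(text)
    matches = sorted(matcher(doc), key=lambda match: match[1])  # sort by start index
    return [doc[start:end].text for _, start, end in matches]
```

If the ranges get more complicated, it may be simpler to match any decimal token and compare float(token.text) in Python instead of encoding the range in a regex.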