NER Tagging with patterns

Hi there!

I am wondering if I can combine the patterns file with ner.manual, so that I can refine the tags that are suggested by the model?

I was previously trying to use ner.teach with a patterns file to tag a ‘duration’ entity. For instance:

-> has been experiencing for the past 2 days (I want my model to recognize ‘past 2 days’, but as my patterns file only contains ‘days’, the model was only able to pick up the word ‘days’, and it doesn't seem feasible to include every possible variation of the number of days in the patterns file).

Thanks, hopefully I've been clear in describing the issue. :slight_smile:

Hi! You can try using the other token attributes, that's what's so cool about the token-based patterns :slightly_smiling_face: For example, the like_num attribute will match all tokens whose value resembles a number. That could be "2", but also "two" or "2.5". You can find more details on this in the spaCy docs.

So one pattern could look like this:

{"label": "DURATION", "pattern": [{"like_num": true}, {"lower": "days"}]}

Instead of "lower": "days", you could also try "lemma": "day" – this would match all tokens whose base form is "day", so both "day" and "days".
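
If you want to cover more units than just days, you could also generate the patterns file with a small script instead of writing each line by hand. Here's a minimal sketch: the list of units and the helper names (build_duration_patterns, to_jsonl) are just made up for illustration, and the "lemma" version assumes the model you're using actually assigns lemmas.

```python
import json

# Hypothetical list of time units (an assumption, adjust to your data).
UNITS = ["day", "week", "month", "year"]


def build_duration_patterns(units, label="DURATION"):
    """Build one token pattern per unit: a number-like token followed by the unit."""
    patterns = []
    for unit in units:
        patterns.append({
            "label": label,
            # like_num matches "2", "two", "2.5"; lemma matches "day" and "days"
            "pattern": [{"like_num": True}, {"lemma": unit}],
        })
    return patterns


def to_jsonl(patterns):
    """Serialize the patterns as newline-delimited JSON, one pattern per line."""
    return "\n".join(json.dumps(p) for p in patterns)


patterns = build_duration_patterns(UNITS)
jsonl = to_jsonl(patterns)
print(jsonl)
```

You can then save the printed lines to a .jsonl file and pass that in as your patterns file.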

If you're using ner.teach with an existing model, keep in mind that it can be difficult for the model to learn new definitions of entities that "clash" with existing labels. For instance, the pre-trained English model might already classify some of the durations as DATE or numbers as CARDINAL. Trying to teach it a completely new definition can be tricky and would require a lot more data. So it might make sense for you to start off with a blank model instead.

As for combining patterns with ner.manual: there's no out-of-the-box way to do this, but you could build your own custom recipe that pre-highlights the pattern matches. The only difficult part here is that you'll likely want to show all matches in the example at once, so you'll have to handle overlapping matches etc.
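
To give you an idea of the overlap handling, here's a rough sketch in plain Python. It assumes the matches are (start, end, label) tuples of token indices with an exclusive end, and it simply prefers the longest match; filter_overlaps is a hypothetical helper, not part of Prodigy. If you're working with spaCy Span objects, spacy.util.filter_spans does something similar.

```python
def filter_overlaps(matches):
    """Keep the longest matches and drop any match overlapping one already kept.

    matches: list of (start, end, label) tuples, token indices, end exclusive.
    """
    # Longest matches first; ties are broken by the earlier start position.
    ordered = sorted(matches, key=lambda m: (-(m[1] - m[0]), m[0]))
    kept = []
    taken = set()  # token indices already covered by a kept match
    for start, end, label in ordered:
        if any(i in taken for i in range(start, end)):
            continue  # overlaps a longer (or earlier) match, so skip it
        kept.append((start, end, label))
        taken.update(range(start, end))
    # Return the surviving matches in document order.
    return sorted(kept)


# Tokens: has(0) been(1) experiencing(2) for(3) the(4) past(5) 2(6) days(7)
matches = [
    (7, 8, "DURATION"),  # from a pattern that only matches "days"
    (6, 8, "DURATION"),  # from the like_num + "days" pattern
]
filtered = filter_overlaps(matches)
print(filtered)
```

In the example, the shorter "days" match is dropped because "2 days" already covers that token.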