Running textcat.teach with patterns, I am getting the same task presented multiple times (once for every pattern match). AFAIK textcat.teach labels the document and not the tokens, so is this a bug?
This is currently an expected limitation: the matcher just yields every match and doesn't do any filtering or make assumptions about what the matches "mean". We do want to add an option to change the default for text classification, but it requires some rewrites of how the pattern matcher works internally.
In the meantime, see this thread for details and how to add your own filter function:
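
Separately from the linked thread, here is a rough sketch of what such a filter function could look like. It assumes the stream is an iterable of task dicts that each carry a "text" key, and it keeps only the first task seen for each text. The name filter_duplicate_texts and the example tasks are made up for illustration and are not part of Prodigy's API.

```python
def filter_duplicate_texts(stream):
    """Yield only the first task for each distinct text (hypothetical helper)."""
    seen = set()
    for task in stream:
        key = task["text"]  # assumption: every task dict has a "text" key
        if key in seen:
            # Same document already yielded for an earlier pattern match.
            continue
        seen.add(key)
        yield task


# Made-up example: two pattern matches on the same document, one on another.
tasks = [
    {"text": "I love pizza and pasta", "meta": {"pattern": 0}},
    {"text": "I love pizza and pasta", "meta": {"pattern": 1}},
    {"text": "The weather is nice", "meta": {"pattern": 2}},
]
unique = list(filter_duplicate_texts(tasks))
```

In a custom recipe you would wrap the stream with this generator before returning it. Note that it keeps whichever match comes first, so any metadata attached to the later matches for the same text is dropped.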