Textcat highlights pattern


I am currently training a classifier via textcat.teach. I have imported a pattern file with a few patterns that might correspond to the label that I want.

Since I have several patterns, more than one may match in the same text. However, my intention is simply to accept the classification as long as the text satisfies one guideline (regardless of whether other patterns give a negative indication). On this point, I refer to @ines's previous reply on another thread:

My concern is whether the textcat.teach recipe takes the highlighted tokens/spans into account when determining the likelihood that the text is classified under the label. E.g., does the recipe consider the words neighbouring the highlighted tokens/spans? Or are there other related issues?

Or can I simply accept/reject the label without paying attention to the highlighted text?


The recipe maintains two sets of weights: one for the statistical model, and then a simple likelihood parameter associated with each pattern. So if some pattern has more rejects than accepts, the recipe will learn to stop asking about it.
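To make the per-pattern likelihood idea concrete, here is a minimal sketch of how accept/reject counts could be turned into a score for each pattern. This is not Prodigy's actual implementation: the class name `PatternScores`, the smoothing priors, and the scoring formula are all assumptions chosen for illustration.

```python
from collections import defaultdict


class PatternScores:
    """Hypothetical per-pattern accept/reject tracker (illustration only)."""

    def __init__(self, prior_accept=1.0, prior_reject=1.0):
        # Assumed smoothing priors so an unseen pattern starts at 0.5.
        self.prior_accept = prior_accept
        self.prior_reject = prior_reject
        self.accepts = defaultdict(int)
        self.rejects = defaultdict(int)

    def update(self, pattern_id, answer):
        # Count only explicit accept/reject answers; anything else is skipped.
        if answer == "accept":
            self.accepts[pattern_id] += 1
        elif answer == "reject":
            self.rejects[pattern_id] += 1

    def score(self, pattern_id):
        # Smoothed fraction of accepts among all answers for this pattern.
        a = self.accepts[pattern_id] + self.prior_accept
        r = self.rejects[pattern_id] + self.prior_reject
        return a / (a + r)


scores = PatternScores()
for answer in ["reject", "reject", "reject", "accept"]:
    scores.update(0, answer)
for answer in ["accept", "accept"]:
    scores.update(1, answer)
```

In this sketch, pattern 0 (mostly rejected) ends up with a lower score than pattern 1 (always accepted), so a recipe that prefers higher-scoring patterns would ask about pattern 0 less often.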

When you actually train the text classifier on the annotations, the highlighted spans won’t influence the model. The model will be learning a representation informed by all words in the text — the highlighted span doesn’t receive any special treatment.
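The point about the highlighted span can be sketched as a conversion step: when an annotation is turned into a text classification example, only the text, the label and the answer are needed. The helper name `to_textcat_example` and the label `INSURANCE` are hypothetical, and the task fields (`text`, `label`, `answer`, `spans`) are assumed to follow the usual annotation task layout.

```python
def to_textcat_example(task):
    """Build a (text, annotations) pair from an annotated task.

    The "spans" field is deliberately never read: the classifier is
    trained on the whole text, not on the highlighted match.
    """
    if task["answer"] == "accept":
        value = 1.0
    elif task["answer"] == "reject":
        value = 0.0
    else:
        # Skipped ("ignore") answers yield no training example.
        return None
    return (task["text"], {"cats": {task["label"]: value}})


task = {
    "text": "The policy covers flood damage.",
    "label": "INSURANCE",  # hypothetical label
    "answer": "accept",
    "spans": [{"start": 4, "end": 10, "label": "INSURANCE"}],
}
example = to_textcat_example(task)
```

So accepting or rejecting based on the text as a whole is consistent with what the model sees during training; the highlight only shows why the example was suggested.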