Textcat highlights pattern

jsnleong · July 18, 2019, 9:58am

Hi,

I am currently training a classifier via textcat.teach. I have imported a pattern file with a few patterns that might correspond to the label that I want.

Since I have a few patterns, several of them may occur in the same text. However, my intention is simply to accept the classification as long as the text fulfills one guideline (regardless of whether other patterns give a negative indication). On this point, I refer to @ines's previous reply on another thread:
https://support.prodi.gy/t/textcat-teach-presents-same-annotation-task-if-text-snippet-contains-multiple-patterns/1210/2

My concern is whether the textcat.teach recipe takes the highlighted tokens/spans into account when determining how likely the text is to be classified under the label. E.g., does the recipe consider the words neighbouring the highlighted tokens/spans? Or are there other related effects?

Or can I simply accept/reject the label without worrying about the highlighted text?

Thanks!

honnibal · July 18, 2019, 1:37pm

The recipe maintains two sets of weights: one for the statistical model, and then a simple likelihood parameter associated with each pattern. So if some pattern has more rejects than accepts, the recipe will learn to stop asking about it.
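
To make the per-pattern part concrete, here is a rough sketch of what such a likelihood parameter could look like. This is not Prodigy's actual code: the `PatternScores` class, the smoothing priors, and the 0.5 threshold are all assumptions chosen for illustration.

```python
class PatternScores:
    """Hypothetical per-pattern accept/reject tracker (not Prodigy's implementation)."""

    def __init__(self, prior_accepts=1.0, prior_rejects=1.0):
        # Assumed smoothing priors, so an unseen pattern starts at a score of 0.5.
        self.prior_accepts = prior_accepts
        self.prior_rejects = prior_rejects
        self.accepts = {}
        self.rejects = {}

    def update(self, pattern_id, answer):
        # Count only explicit accept/reject decisions; anything else is skipped.
        if answer == "accept":
            self.accepts[pattern_id] = self.accepts.get(pattern_id, 0) + 1
        elif answer == "reject":
            self.rejects[pattern_id] = self.rejects.get(pattern_id, 0) + 1

    def score(self, pattern_id):
        # Smoothed fraction of accepts among all decisions for this pattern.
        a = self.accepts.get(pattern_id, 0) + self.prior_accepts
        r = self.rejects.get(pattern_id, 0) + self.prior_rejects
        return a / (a + r)

    def should_ask(self, pattern_id, threshold=0.5):
        # Assumed rule: stop asking once rejects outweigh accepts.
        return self.score(pattern_id) >= threshold


scores = PatternScores()
for answer in ["reject", "reject", "accept", "reject"]:
    scores.update("pattern-0", answer)

print(scores.score("pattern-0"))       # smoothed score after 1 accept, 3 rejects
print(scores.should_ask("pattern-0"))  # whether to keep showing matches
```

The point is only that each pattern gets its own simple score, kept separately from the model's weights, and that score drops as rejects pile up.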

When you actually train the text classifier on the annotations, the highlighted spans won’t influence the model. The model will be learning a representation informed by all words in the text — the highlighted span doesn’t receive any special treatment.
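
As a sketch of why the highlight drops out: when an annotation is turned into a training example, only the text, the label, and your answer are needed. The task fields below ("text", "label", "spans", "answer") follow Prodigy's task format, and the output mimics the text-plus-"cats" shape used for text classification in spaCy; the `to_training_example` helper itself is hypothetical.

```python
def to_training_example(task):
    """Hypothetical helper: build (text, {"cats": ...}) and ignore any spans."""
    answer = task.get("answer")
    if answer not in ("accept", "reject"):
        # Ignored or unanswered tasks yield no training example.
        return None
    cats = {task["label"]: 1.0 if answer == "accept" else 0.0}
    # Note that task["spans"] is never read here.
    return task["text"], {"cats": cats}


text = "The delivery was late and the box was damaged."
task_a = {"text": text, "label": "COMPLAINT", "answer": "accept",
          "spans": [{"start": 17, "end": 21, "label": "COMPLAINT"}]}
task_b = {"text": text, "label": "COMPLAINT", "answer": "accept",
          "spans": [{"start": 38, "end": 45, "label": "COMPLAINT"}]}

# Different highlights, identical training examples.
print(to_training_example(task_a) == to_training_example(task_b))
```

So accepting or rejecting based on the whole text, whichever pattern happened to be highlighted, gives the model the same signal.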
