textcat.teach repeatedly annotating the same text, not annotating the entire text at once

While following along with the insults classifier video, at 9:56, when you've just launched textcat.teach and flipped over to the Prodigy annotation interface, it appears that you're annotating the entire example:

However, when I launch textcat.teach in the same way (albeit edited to follow the new API, as described in the YouTube comments):

prodigy textcat.teach my-new-dataset en_core_web_lg ./data/social_text_data_2.jsonl --label MY_LABEL --patterns ./data/my_seed_terms.jsonl
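
For reference, my seed terms file is a plain JSONL patterns file along these lines (the label and terms below are placeholders, not my real seeds):

```
{"label": "MY_LABEL", "pattern": [{"lower": "seedterm"}]}
{"label": "MY_LABEL", "pattern": "some seed phrase"}
```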

The first example is like so:

And the second example is like so:

My goal is to classify the entire text, not just specific tokens or key phrases, which doesn't seem to be what Prodigy is doing here (the highlighted words suggest that perhaps I'm labeling specific words? Or something?).

Additionally, for texts that contain many of my seed terms, this requires that I annotate each example multiple times.

If I exclude the --patterns argument, my interface looks like yours in the video, but it seems like it would be a shame to skip bootstrapping with "seeds" entirely. As an opinionated aside: I do like "seeds" much more than "patterns" for textcat, as "patterns" seems more related to categorizing specific tokens, spans, or entities, while "seeds" more clearly references the vectors used to classify entire docs.

What am I doing wrong here?

Hi! The highlighted text is the matched pattern that was used to select that example. (When I recorded the video, Prodigy didn't yet highlight the pattern that was actually matched, which people found a bit confusing. The recipe now does that to make it more transparent that the example was selected based on a specific match in the text.) You're still annotating the text plus label, and when you train your model, you'll be training on the text plus label, too. The highlight is just there so you know what the suggestion is based on.
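
To make that more concrete, a task produced by textcat.teach with patterns looks roughly like this (text, offsets and meta values made up for illustration):

```
{
  "text": "you are such a total moron",
  "label": "MY_LABEL",
  "spans": [{"start": 21, "end": 26, "text": "moron"}],
  "meta": {"score": 0.48, "pattern": 2}
}
```

The "spans" and "meta" entries only describe why the example was suggested; what you accept or reject, and what the model is updated with, is still the "label" applied to the full "text".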

The pattern matcher currently just yields out every match, so if multiple patterns match in the same text, you'll see that example once per match. We do want to change this in the next update that allows us to break backwards compatibility. In the meantime, you can find more details and code for a filter function in this thread: textcat.teach presents same annotation task if text snippet contains multiple patterns - #2 by ines
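
If you don't want to wait, a minimal version of such a filter could look like this (just a sketch; the thread above has the full discussion):

```python
def filter_duplicate_matches(stream):
    """Only yield the first task for each text, so an example that
    matches several patterns isn't queued more than once."""
    seen_texts = set()
    for task in stream:
        if task["text"] not in seen_texts:
            seen_texts.add(task["text"])
            yield task
```

In a custom recipe, you'd wrap the stream returned by textcat.teach in this function before returning it.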

That's interesting, because I always feel like writing abstract patterns is actually much more useful for text classification than it is for NER. For entities, you often have a pretty specific idea of what the spans should be, so the main token attributes you'd probably want to use are the token text and maybe the lowercase form (to make the patterns case-insensitive). But if you're assigning labels to the whole text, the "trigger words" or phrases are often much vaguer and can be things like "a word with the lemma sell" or "this noun with optional adjective X, Y, or Z". That's where token-based patterns make a lot more sense than more or less exact string matches. But I guess it really depends on the use case.
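
For example, patterns along those lines could look like this (the labels and terms are made up, and the attribute names follow spaCy's Matcher syntax):

```
{"label": "SELLING", "pattern": [{"lemma": "sell"}]}
{"label": "SELLING", "pattern": [{"lower": "cheap", "op": "?"}, {"lemma": "price"}]}
```

The first line matches "sell", "sells", "sold", "selling" etc., and the second matches "price" or "prices" with an optional "cheap" in front of it – you'd add one line per adjective you care about.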