Text Categorization at Document level

oneextrafact · January 26, 2019, 10:58pm

Hi - I’ve got a bunch of support tickets that I want to categorize; specifically, I want the model to learn a label that would apply to the whole ticket / document. The tickets are of a variety of lengths, some of them extending to multiple paragraphs of text. For annotation, should I still be trying to annotate at the level of individual sentences, or should I move up to paragraphs? To the whole document?

honnibal · January 27, 2019, 4:22am

I think sentences or paragraphs are probably a good granularity to annotate at. If you’re going to be reading the text, you may as well click the button to apply an annotation at the sentence level — it doesn’t take any extra time, really, and it gives you more detailed annotations to learn from.
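
For anyone wanting to try the paragraph-level route, here is a minimal sketch of splitting tickets into one annotation task per paragraph. It assumes paragraphs are separated by blank lines and that the tasks are written out as JSONL with a "text" field plus a "meta" field for bookkeeping. The function name `ticket_to_tasks` and the meta keys `ticket_id` and `paragraph` are made up for illustration and are not part of any library.

```python
import json


def ticket_to_tasks(ticket_id, text):
    """Split one ticket into paragraph-level annotation tasks.

    Assumption: blank lines mark paragraph boundaries in the ticket text.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        # "meta" keeps a pointer back to the source ticket so that
        # paragraph-level labels can be grouped per document later.
        {"text": para, "meta": {"ticket_id": ticket_id, "paragraph": i}}
        for i, para in enumerate(paragraphs)
    ]


if __name__ == "__main__":
    example = "My invoice is wrong.\n\nAlso, thanks for the quick reply last week."
    # One JSON object per line, i.e. JSONL.
    for task in ticket_to_tasks("T-1", example):
        print(json.dumps(task))
```

Keeping the ticket ID in the metadata means the paragraph-level annotations can still be traced back to the ticket they came from.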

timothyjlaurent · January 29, 2019, 12:51am

Honnibal, when you say "click ... to apply an annotation at the sentence level", do you mean using textcat.teach with the -L/--long-text classification mode?

oneextrafact · February 6, 2019, 10:22pm

@timothyjlaurent, thanks for the additional comment! I hadn’t even noticed the long text classification mode. @honnibal - related (newbie) question. I have a set of 67 documents that have the classification I want to learn, but of course not every paragraph / sentence drives that classification. For training, can I use the same set of documents for both positive and negative examples, or should I plan to include another set of documents that I know don’t have the classification?

Thanks!
