Text Categorization at Document level

Hi - I’ve got a bunch of support tickets that I want to categorize; specifically, I want the model to learn a label that applies to the whole ticket / document. The tickets vary in length, and some of them run to multiple paragraphs of text. For annotation, should I still be trying to annotate at the level of individual sentences, or should I move up to paragraphs? To the whole document?

I think sentences or paragraphs are probably a good granularity to annotate at. If you’re going to be reading the text, you may as well click the button to apply an annotation at the sentence level — it doesn’t take any extra time, really, and it gives you more detailed annotations to learn from.
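In case it helps, here is a rough sketch of how you might break each ticket into paragraph-level examples before annotating. It assumes paragraphs are separated by blank lines and that you want one JSON object per line with a "text" field plus some "meta" so you can trace each piece back to its ticket. The function names, the "doc_id" / "paragraph" keys, and the sample ticket are all made up for illustration, so adjust them to your data.

```python
import json
import re


def split_paragraphs(text):
    """Split on blank lines and drop empty pieces (assumes blank-line-separated paragraphs)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def make_tasks(doc_id, text):
    """Build one annotation example per paragraph, keeping a pointer back to the source ticket."""
    return [
        # "doc_id" and "paragraph" are hypothetical meta keys chosen for this sketch
        {"text": para, "meta": {"doc_id": doc_id, "paragraph": i}}
        for i, para in enumerate(split_paragraphs(text))
    ]


# Made-up ticket text, just to show the shape of the output
ticket = "My login fails every morning.\n\nAlso, the invoice page is slow.\n"
tasks = make_tasks("ticket-001", ticket)

for task in tasks:
    print(json.dumps(task))  # one JSON object per line (JSONL)
```

Keeping the ticket ID in the meta means you can later group the paragraph-level annotations back by document if you need a per-ticket label.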

Honnibal, when you say “click … to apply an annotation at the sentence level”, do you mean using textcat.teach with the -L/--long-text classification mode?

@timothyjlaurent, thanks for the additional comment! I hadn’t even noticed the long text classification mode. @honnibal - related (newbie) question. I have a set of 67 documents that have the classification I want to learn, but of course not every paragraph / sentence drives that classification. For training, can I use the same set of documents for both positive and negative examples, or should I plan to include another set of documents that I know don’t have the classification?

Thanks!