Do I need to use two models?

Hi! I’m trying to build a text category classifier for JIRA tickets. I found some good advice in the “Document classification on large articles” thread, and split the task into two separate operations:

  1. Train a binary classifier to separate out text typed by humans from log files, error messages, etc.
  2. Train a binary classifier that determines whether the ‘human’ information output by the first model might be about a product I’m interested in.

Right now I have this as two separate models, with the output from the first being passed through the second as part of the prediction workflow. Is this the right approach, or is there some better way I can do this?
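For reference, the prediction flow currently looks roughly like the sketch below. The classifier names are just placeholders for my two fitted models, which expose a scikit-learn-style `predict()` and were trained with 1 meaning “human-typed” / “product-related”:

```python
# Rough sketch of the two-stage prediction workflow.
# `human_text_clf` and `product_clf` are placeholder names for two
# already-trained binary classifiers with a scikit-learn-style
# predict() interface (label 1 = human-typed / product-related).

def classify_ticket_lines(lines, human_text_clf, product_clf):
    results = []
    for line in lines:
        # Stage 1: is this text typed by a human (vs. logs, tracebacks, etc.)?
        if human_text_clf.predict([line])[0] != 1:
            results.append((line, "machine"))
            continue
        # Stage 2: only human-written text is checked for product relevance.
        is_product = product_clf.predict([line])[0] == 1
        results.append((line, "product" if is_product else "human_other"))
    return results
```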

I think that sounds like a reasonable way to structure it, especially for annotation efficiency. Ideally you could also add some rules in stage 1 to filter out the most obviously machine-generated log lines before the model ever sees them.
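For instance, a handful of cheap regex rules can catch the most obvious machine-generated lines ahead of the stage-1 classifier. The patterns below are purely illustrative and would need tuning to whatever actually shows up in your tickets:

```python
import re

# Illustrative patterns for obviously machine-generated lines; adjust
# these to the log formats that actually appear in your JIRA tickets.
MACHINE_PATTERNS = [
    re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"),  # timestamped log lines
    re.compile(r"^\s+at\s+[\w$.]+\(.*\)$"),                  # Java stack frames
    re.compile(r"\b(DEBUG|INFO|WARN|ERROR|FATAL)\b.*:"),     # log-level prefixes
    re.compile(r"^Traceback \(most recent call last\):"),    # Python tracebacks
]

def looks_machine_generated(line: str) -> bool:
    return any(pattern.search(line) for pattern in MACHINE_PATTERNS)

def prefilter(lines):
    # Only lines that survive the rules go on to the stage-1 classifier.
    return [line for line in lines if not looks_machine_generated(line)]
```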

Ultimately it’s an empirical question, though: you could at some point try training a single model with two to four classes. The combined model might be just as accurate, while being easier to deploy and debug. On the other hand, if efficiency is a concern, you could keep model 1 fast and only apply a more expensive model 2 to the subset of lines it selects. For annotation, though, I do think your process sounds good.
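If you do try the single-model route, a flat three-class setup is easy to sketch. TF-IDF plus logistic regression is just one reasonable baseline here, and the label names and example texts are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical label scheme that replaces the two binary models with one
# multi-class model: MACHINE, HUMAN_OTHER, HUMAN_PRODUCT.
texts = [
    "2023-05-01 12:00:03 ERROR Connection refused",
    "The export button does nothing when I click it",
    "Can we change the meeting to Thursday?",
]
labels = ["MACHINE", "HUMAN_PRODUCT", "HUMAN_OTHER"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)  # in practice, fit on your annotated ticket lines
print(model.predict(["NullPointerException at com.example.Foo.bar"]))
```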