Can't improve textcat model performance

I have a text classification model that I've trained on sentences. The model is binary: the label is either present or not. I've got about 1750 annotated sentences across two datasets, although many of those are 'ignores' and only 722 are used in training. I can't seem to get the accuracy of the model past 0.75, and the train-test curve shows accuracy dropping in the last stage. Loss is very close to 0 after 10 iterations of training. Is there anything else I should be looking at to improve the performance of the model?

Hi! It's always difficult to give a definitive answer because there could be many explanations. But here are a few questions and ideas to help you debug:

  • Which versions of Prodigy and spaCy are you running?
  • Are you using a dedicated evaluation set, or are you using the default evaluation split that just holds back a certain percentage? If you don't have a separate evaluation set, your evaluation may be unstable, because each run takes a different random 20% of the already small set of 722 training examples, and that random split may happen to give you an unrepresentative selection of the data.
  • If your loss hits 0 after 10 iterations while evaluation accuracy stays at 0.75, your model is likely overfitting the training data.
  • How were your annotations created, and are they internally consistent? Is the distinction you're trying to train something the model can learn?
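On the evaluation-split point: one way to get a stable, dedicated evaluation set is to shuffle your examples once with a fixed seed, split them, and then reuse the same held-out set for every training run. A minimal sketch in plain Python (the `examples` list here is hypothetical stand-in data; in practice you'd load the annotated sentences from your Prodigy dataset):

```python
import random

# Hypothetical stand-in for the 722 annotated (text, label) examples;
# in practice these would come out of your Prodigy dataset.
examples = [(f"sentence {i}", i % 2 == 0) for i in range(722)]

# Shuffle once with a fixed seed, then split 80/20. Saving and reusing
# the evaluation portion makes accuracy numbers comparable across
# experiments, unlike a fresh random split on every run.
rng = random.Random(42)
shuffled = examples[:]
rng.shuffle(shuffled)

split = int(0.8 * len(shuffled))
train, evaluation = shuffled[:split], shuffled[split:]

print(len(train), len(evaluation))  # 577 145
```

With such a small dataset it's also worth checking that the label distribution in the held-out portion roughly matches the training portion, since a skewed split can move accuracy by several points on its own.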

Thanks for the reply!

  1. Using spaCy 2.2.4 and Prodigy 1.9.9
  2. I have a dedicated evaluation set
  3. I created the annotations myself, based on a spec I wrote, so this is the part that worries me most. If I set the threshold high, the model identifies positives pretty reliably, but recall at that point is poor: it would let lots of positives slip through.
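That precision/recall trade-off can be made explicit by sweeping the decision threshold over the model's scores on the evaluation set. A small sketch with hypothetical `(score, is_positive)` pairs (real scores would come from `doc.cats` after running the trained pipeline):

```python
# Hypothetical (score, is_positive) pairs standing in for textcat
# predictions on a held-out evaluation set.
preds = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
         (0.60, False), (0.55, True), (0.40, False), (0.30, True),
         (0.20, False), (0.10, False)]

def precision_recall(preds, threshold):
    """Precision and recall when scores >= threshold count as positive."""
    tp = sum(1 for s, y in preds if s >= threshold and y)
    fp = sum(1 for s, y in preds if s >= threshold and not y)
    fn = sum(1 for s, y in preds if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A high cutoff gives high precision but misses positives (low recall),
# which matches the behaviour described above.
for t in (0.9, 0.7, 0.5, 0.3):
    p, r = precision_recall(preds, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Plotting precision and recall against the threshold like this often makes it easier to pick an operating point, and to see whether the problem is the threshold or the underlying model.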