I have a text classification model that I've trained on sentences. The task is binary: the label is either present or not. I've got about 1750 annotated sentences across two datasets, although a lot of those are 'ignores' and only 722 are used in training. I can't seem to get the accuracy of the model past 0.75, and the train-test curve shows accuracy dropping in the final stage. The loss is pretty close to 0 by the end of the 10 training iterations. Is there anything else I should be looking at to improve the performance of the model?
Hi! It's always difficult to give a definitive answer because there could be many explanations. But here are a few questions and ideas to help you debug:
- Which versions of Prodigy and spaCy are you running?
- Are you using a dedicated evaluation set, or are you using the default evaluation split that just holds back a certain percentage? If you don't have a separate evaluation set, your evaluation numbers can be unstable, because you're always taking a random 20% of the already small set of 722 training examples, and that random split may just happen to give you an unrepresentative slice of the data. (See the sketch after this list for one way to evaluate against the same fixed held-out set.)
- If your loss is close to 0 after 10 iterations, that's a sign your model might be overfitting on the training data.
- How were your annotations created, and are they internally consistent? Is the distinction you're trying to train something the model can learn?
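If it helps, here's a minimal sketch of what a more stable setup could look like with spaCy 2.2 directly: train on your 722 examples and evaluate on the same fixed held-out set after every iteration, so you can see exactly when the training loss keeps dropping while held-out accuracy starts to fall. The label name and the data loading are placeholders, assuming your annotations can be exported as `(text, cats)` pairs:

```python
import random
import spacy
from spacy.util import minibatch, compounding

# Placeholders: load your exported annotations as (text, {"cats": {...}}) pairs
train_data = [("some sentence", {"cats": {"MY_LABEL": 1.0}})]   # your 722 training examples
dev_data = [("another sentence", {"cats": {"MY_LABEL": 0.0}})]  # fixed evaluation set

nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("MY_LABEL")
nlp.add_pipe(textcat)

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(train_data)
    losses = {}
    for batch in minibatch(train_data, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
    # Evaluate on the same held-out set every epoch: if the loss keeps falling
    # while this accuracy drops, the model is overfitting
    correct = sum(
        (nlp(text).cats["MY_LABEL"] >= 0.5) == (annots["cats"]["MY_LABEL"] >= 0.5)
        for text, annots in dev_data
    )
    print(f"epoch={epoch}  loss={losses['textcat']:.3f}  dev_acc={correct / len(dev_data):.3f}")
```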
Thanks for the reply!
- Using spaCy 2.2.4 and Prodigy 1.9.9
- I have a dedicated evaluation set
- I created the annotations myself, based on a self-written spec, so this is the part that worries me most. If I set the threshold high, the model identifies things pretty reliably, but recall at that point is pretty poor: it would be letting a lot of things slip by. (See the sketch below for how I'm checking that trade-off.)
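To figure out where to set the threshold, I've been sweeping it over the dedicated evaluation set, roughly like this (a minimal sketch; the model path, label name and data loading are stand-ins for my actual setup):

```python
import numpy as np
import spacy
from sklearn.metrics import precision_recall_curve

# Placeholders: my trained model directory and eval data as (text, 0/1 label) pairs
nlp = spacy.load("./my_textcat_model")
eval_examples = [("some sentence", 1), ("another sentence", 0)]

gold = np.array([label for _, label in eval_examples])
scores = np.array([nlp(text).cats["MY_LABEL"] for text, _ in eval_examples])

# Precision/recall at every candidate threshold, to pick an operating point
precision, recall, thresholds = precision_recall_curve(gold, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```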