Bad precision, good recall with imbalanced data

I trained a binary textcat model. It has reasonable performance, around 0.8 F1 score. However, when I used this model to predict on an unseen, imbalanced test data set (5 pos, 250 neg), I get 1.0 recall and 0.1 precision.

The data set I used is the IMDB data, paragraphs. I thought the low precision in test was caused by not having enough training data. I trained with 500 pos and 500 neg first. Then I trained with ~25000 pos and ~25000 neg. The model's performance got better. However, the precision on this imbalanced test data set is always around 0.1.

Is there any way I can improve the precision on the test set? I'd like to end up with a high-precision model.

Thank you very much.

I think you'll probably be better off exporting the annotations from Prodigy and training with a different toolkit; for instance, scikit-learn might work better for your situation.

The built-in training commands in Prodigy are mostly a convenience to make the annotate/train/debug loop faster, especially during prototyping. You can also extend the commands with your own settings by using a custom recipe. Fundamentally, though, Prodigy isn't designed to be a general-purpose machine learning toolkit, and there will always be situations where it's better to switch over to a more general solution.
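For example, after exporting your annotations with `prodigy db-out`, a simple bag-of-words classifier in scikit-learn gives you full control over class weighting and decision thresholds. Here's a minimal sketch, assuming the exported JSONL records have a `"text"` field and an `"answer"` field (`"accept"`/`"reject"`), as binary textcat annotations typically do; adjust the field names to whatever your export actually contains:

```python
# Minimal sketch: train a scikit-learn classifier on annotations
# exported with `prodigy db-out my_dataset > annotations.jsonl`.
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts, labels = [], []
with open("annotations.jsonl", encoding="utf8") as f:
    for line in f:
        record = json.loads(line)
        texts.append(record["text"])
        # Binary annotations: "accept" = positive, "reject" = negative.
        labels.append(1 if record["answer"] == "accept" else 0)

# class_weight="balanced" reweights the loss so rare positives
# aren't drowned out by the majority class.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(texts, labels)
```

With scikit-learn you can then use `predict_proba` together with `sklearn.metrics.precision_recall_curve` to pick the operating point that gives you the precision you need.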

One simple way to get what you want is to change the threshold at which you declare an example to be classified as a given category. By default we set this at 0.5, but you can set the threshold higher in your own code. The scores are in doc.cats, so you can simply require a score of, say, 0.8 before you count a prediction as positive.
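As a minimal sketch, assuming a trained spaCy pipeline with a textcat component and a positive label named `"POSITIVE"` (both the model path and the label name here are placeholders for your own):

```python
import spacy

# Placeholder path: load your own trained textcat pipeline here.
nlp = spacy.load("./my_textcat_model")
THRESHOLD = 0.8  # require a higher score before predicting positive

doc = nlp("This movie was a complete waste of time.")
score = doc.cats.get("POSITIVE", 0.0)  # doc.cats maps label -> score
is_positive = score >= THRESHOLD
print(f"score={score:.3f} -> positive={is_positive}")
```

Raising the threshold trades recall for precision; since your recall is currently 1.0, you likely have some room to push it up before you start missing true positives.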