Sentence classification problem

Hi all!
I developed a service using the ‘prodigy train textcat’ recipe to predict which of 9 different classes a sentence belongs to.
My original dataset consists of 5,000 sentences; using data augmentation techniques (with the nlpaug library), I have expanded it to 15,000 sentences.
The results seem good, or at least satisfactory for a first version: the TPR ranges from 0.9 to 0.95, with a similar FPR, for all 9 classes.
The problems arise when I try to distinguish between sentences that I know belong to one of these 9 classes and generic sentences that are unrelated to the problem I’m studying.
I tried adding a 10th class, which I named ‘Alien class’, but its TPR and FPR are very low and not comparable with those of the other classes. Its AUC is between 60% and 70%, so, in short, not far from a random guess.
Then I tried another approach: I created a binary classifier to use as a pre-filter, so that only the sentences known to be related to the problem are passed on to the 9-class classifier.
Well, unfortunately the results are similar to those of the ‘Alien class’.
Any suggestions? Is there another approach you would recommend?
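The pre-filter approach described above amounts to a two-stage cascade: a binary "related / not related" model runs first, and only sentences that pass it reach the 9-class model. The sketch below is a minimal illustration under stated assumptions: the two models are represented by hypothetical scoring functions that return a dict of label to score (the same shape as spaCy's `doc.cats`), and the `RELATED` label name and the 0.5 threshold are placeholders, not values from the original setup.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a trained model: text -> {label: score}.
# With spaCy this could be something like `lambda t: nlp(t).cats`.
Scorer = Callable[[str], Dict[str, float]]


def classify_with_prefilter(
    text: str,
    prefilter: Scorer,
    classifier: Scorer,
    related_label: str = "RELATED",  # assumed label name of the binary model
    threshold: float = 0.5,          # assumed cut-off; tune on held-out data
) -> Optional[str]:
    """Return None for sentences the pre-filter rejects ("alien"),
    otherwise the best-scoring label of the in-scope classifier."""
    related_score = prefilter(text).get(related_label, 0.0)
    if related_score < threshold:
        return None  # filtered out: treated as unrelated to the problem
    scores = classifier(text)
    return max(scores, key=scores.get)


# Toy scorers for illustration only; they are not real models.
def toy_prefilter(text: str) -> Dict[str, float]:
    return {"RELATED": 0.9 if "invoice" in text else 0.1}


def toy_classifier(text: str) -> Dict[str, float]:
    return {"BILLING": 0.8, "LOGIN": 0.2}


print(classify_with_prefilter("My invoice is wrong", toy_prefilter, toy_classifier))
print(classify_with_prefilter("Nice weather today", toy_prefilter, toy_classifier))
```

Keeping the threshold as an explicit parameter makes it easy to trade recall of in-scope sentences against how many unrelated ones slip through, independently of the 9-class model.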

The binary pre-filter definitely sounds like the better approach, so I think it makes sense to focus on that. It also gives you more flexibility: you can work on this step (which clearly seems to be the tricky part) in isolation, use different training data for it if needed, and so on. How are you sourcing your training examples for the alien class, and what's the distribution? For example, when you trained the binary classifier, how many alien examples did you have compared to non-alien examples?
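A quick way to answer the distribution question is to count the labels in the training data. This is a minimal sketch assuming the data can be loaded as hypothetical `(text, label)` pairs with one gold label each, where `ALIEN` marks unrelated sentences; the class names and sentences below are placeholders, not the actual dataset.

```python
from collections import Counter

# Hypothetical training data: (text, gold_label) pairs.
examples = [
    ("placeholder sentence for class 1", "CLASS_1"),
    ("placeholder sentence for class 2", "CLASS_2"),
    ("placeholder sentence for class 3", "CLASS_3"),
    ("placeholder unrelated sentence", "ALIEN"),
]


def alien_balance(examples, alien_label="ALIEN"):
    """Return (alien_count, non_alien_count, alien_share)."""
    counts = Counter(label == alien_label for _, label in examples)
    alien, non_alien = counts[True], counts[False]
    total = alien + non_alien
    share = alien / total if total else 0.0
    return alien, non_alien, share


alien, non_alien, share = alien_balance(examples)
print(f"alien: {alien}, non-alien: {non_alien}, alien share: {share:.0%}")
# Per-class breakdown of the in-scope side, to spot under-represented classes.
print(Counter(label for _, label in examples if label != "ALIEN"))
```

Since the dataset grew from 5,000 to 15,000 sentences through augmentation, it may be worth running this on both the original and the augmented set, in case the augmentation shifted the ratio.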