Something is off. How should I go about debugging it? I also looked at the similarity of Repurchased vs Repurchase and it is 0.9999999467980901, which makes sense.
Hi! Which version of spaCy are you running? It can sometimes happen that the number is greater than 1 (floating point imprecision), but if I remember correctly, the similarity methods should have a condition that just makes it return 1.0 in those cases so the result is less confusing.
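To illustrate the floating-point point, here is a minimal sketch in plain Python. It is not spaCy's actual implementation: `cosine_similarity` and `clamped_similarity` are hypothetical helper names, and clamping to the range [-1.0, 1.0] is an assumption about how such a guard could look.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Avoid dividing by zero when either vector is all zeros.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def clamped_similarity(a, b):
    """Cosine similarity forced into [-1.0, 1.0] to hide rounding error."""
    # Rounding in the dot product and the norms can push the ratio
    # slightly past 1.0 (or below -1.0), so clamp the result.
    return max(-1.0, min(1.0, cosine_similarity(a, b)))


if __name__ == "__main__":
    vec = [0.1, 0.2, 0.3]
    print(cosine_similarity(vec, vec))   # may differ from 1.0 in the last digits
    print(clamped_similarity(vec, vec))  # never greater than 1.0
```

A value just below 1.0, like the one you posted, is the same kind of rounding effect and is nothing to worry about on its own.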
I am running spaCy version 2.2.3. I checked the word vectors for Repurchased and Repurchase and they are the same. I don't understand why textcat produces such different probabilities for what is basically the same sentence.
I tried to debug it a little. I noticed that thinc.neural._classes.feature_extracter.FeatureExtracter produces a slightly different array for the two sentences. Only the second element of the array differs after the FeatureExtracter runs in the predict function for textcat. Not sure if that helps with debugging the issue.