textcat.batch-train versus spaCy classification example

I labelled ~2600 documents for binary classification, then trained a model using textcat.batch-train with default parameters and achieved ~98% accuracy. I also tried training with the textcat example script from spaCy, which yields around 90%. I'm just wondering why I see this big difference.
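For context, getting the Prodigy annotations into the spaCy example script involves a conversion step roughly along these lines (a sketch only; the label name and file path are placeholders, and it assumes the annotations are binary accept/reject decisions from textcat.teach):

```python
import json

LABEL = "MY_LABEL"  # placeholder for the actual binary label


def load_prodigy_jsonl(path):
    """Read annotations exported with `prodigy db-out` (one JSON object per line)."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def to_spacy_cats(examples):
    """Turn accept/reject decisions into the (text, {"cats": {...}}) tuples
    the spaCy textcat example script trains on."""
    data = []
    for eg in examples:
        if eg.get("answer") not in ("accept", "reject"):
            continue  # skip ignored examples
        data.append((eg["text"], {"cats": {LABEL: eg["answer"] == "accept"}}))
    return data


train_data = to_spacy_cats(load_prodigy_jsonl("annotations.jsonl"))  # placeholder path
```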

I am using Prodigy 1.6.1 and spaCy 2.0.18.

Hmm! Hard to say. 98% is very suspicious. Have you checked that the evaluation data is definitely different from the training data? Since the two experiments involve some data export and conversion, there's a chance they aren't actually running on the same data.
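If you have the two portions as JSONL files exported from Prodigy, a quick way to rule out leakage is to compare the raw texts. A minimal sketch, with placeholder file names:

```python
import json


def texts(path):
    """Collect the raw texts from a JSONL file exported with `prodigy db-out`."""
    with open(path, encoding="utf8") as f:
        return {json.loads(line)["text"] for line in f if line.strip()}


train_texts = texts("train.jsonl")  # placeholder paths
eval_texts = texts("eval.jsonl")

overlap = train_texts & eval_texts
print(f"{len(overlap)} of {len(eval_texts)} evaluation texts also appear in training")
```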

There are some hyper-parameter values that could be getting set differently between the two. But I'd say 98% is likely to be an incorrect result, so I don't think it'll be a matter of trying to get the spaCy script to produce the same result as the Prodigy one. I think it's more likely to be a case of finding out what's going wrong with the Prodigy experiment.

I think I disagree, since the classification is actually expected to be quite an easy task. This was also evident from the scores shown during textcat.teach.

I have checked that I have only unique documents in my annotated data, and I let Prodigy split it into training and evaluation. I'd expect the difference to lie in the hyperparameters, but they look similar to me. Have you published the script used in Prodigy?
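In case it's useful, the uniqueness check was along these lines, using Prodigy's database API (the dataset name below is a placeholder):

```python
from collections import Counter

from prodigy.components.db import connect

db = connect()  # uses the database settings from prodigy.json
examples = db.get_dataset("textcat_dataset")  # placeholder dataset name

texts = [eg["text"] for eg in examples if eg.get("answer") in ("accept", "reject")]
dupes = [text for text, count in Counter(texts).items() if count > 1]
answers = Counter(eg["answer"] for eg in examples)

print(f"{len(texts)} annotated examples, {len(dupes)} duplicated texts")
print("answer counts:", dict(answers))
```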

Just in case this got lost in a pile of work, I allow myself to re-ask :wink: The thing is, I have an information extraction engine running spaCy 2.1 and a classification engine running spaCy 2.0 (due to the better training results), but I'd like the engines to run on the same machine/session.

Prodigy does include the source for its textcat.batch-train recipe. Have a look in your installation, in prodigy/recipes/textcat.py. You can also look at the recipes repo, but to be certain you probably want to look at the script you're actually running.
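If you're not sure where that file lives on your system, the module's `__file__` attribute will point you to the installed copy:

```python
import prodigy.recipes.textcat

# Path of the textcat recipes installed in the current environment
print(prodigy.recipes.textcat.__file__)
```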

The main things to look at would be the batch size, the dropout rate, and, if you're using pretrained vectors, making sure you're using them in both models. You can also look at the cfg file that gets written out into each model's folder to check whether any hyper-parameters look different.
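For that last check, a quick diff like this can help, assuming both trained models are on disk and that the textcat component's cfg file is JSON, as it is in spaCy 2.x model directories (the paths are placeholders):

```python
import json
from pathlib import Path


def read_cfg(model_dir):
    """Read the textcat component's cfg file from a spaCy 2.x model directory."""
    return json.loads((Path(model_dir) / "textcat" / "cfg").read_text(encoding="utf8"))


prodigy_cfg = read_cfg("prodigy_model")  # placeholder paths to the two trained models
spacy_cfg = read_cfg("spacy_model")

for key in sorted(set(prodigy_cfg) | set(spacy_cfg)):
    a, b = prodigy_cfg.get(key), spacy_cfg.get(key)
    if a != b:
        print(f"{key}: {a!r} (Prodigy) vs {b!r} (spaCy script)")
```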