Will using transformers improve textcat_multilabel score/accuracy?

We are currently training on a CPU and we are getting 70% on the textcat score. Is there a guarantee that using transformers will improve our results a lot? We don't want to spend a lot on a GPU if the improvement is not that significant.


hi @joebuckle!

There's no guarantee that moving to transformers on a GPU will always lead to better performance. On average, yes, I would expect them to do better, but at what cost?

The only likely guarantee is that productionalizing your model will be harder (e.g., longer time to score, more memory). Vincent mentioned this previously:

The spaCy team has done some benchmarks using transformers in spaCy based on speed and it's clear transformers slow end-to-end processing speed.
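If you want to put a number on that slowdown for your own data before paying for a GPU, here's a minimal sketch of a throughput check. It isn't taken from the spaCy benchmarks: `words_per_second` and `dummy_pipeline` are hypothetical names, and the dummy pipeline is only a stand-in so the snippet runs without any model installed.

```python
import time
from typing import Callable, Iterable, List


def words_per_second(process: Callable[[List[str]], Iterable], texts: List[str]) -> float:
    """Time one pass of `process` over `texts` and return whitespace-token throughput."""
    n_words = sum(len(text.split()) for text in texts)
    start = time.perf_counter()
    # Consume the result so lazy pipelines (generators) actually do the work.
    for _ in process(texts):
        pass
    elapsed = time.perf_counter() - start
    return n_words / elapsed if elapsed > 0 else float("inf")


# Hypothetical stand-in for a real pipeline. With spaCy you would likely pass
# something like `lambda batch: nlp.pipe(batch)` instead (an assumption here).
def dummy_pipeline(batch: List[str]) -> Iterable[List[str]]:
    return (text.lower().split() for text in batch)


sample_texts = ["This is a short example document."] * 1000
wps = words_per_second(dummy_pipeline, sample_texts)
print(f"{wps:,.0f} words/sec")
```

Running the same function over your current CPU pipeline and a transformer pipeline, on the same texts, gives you a like-for-like speed comparison to weigh against any accuracy gain.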

There are also extra setup steps when using GPUs and transformers. Here's a great FAQ on using GPUs in spaCy, setting up CUDA drivers, etc. You can also find many spaCy issues mentioning GPUs that describe problems like out-of-memory errors.

Another problem with using transformers is properly accounting for sub-word tokenization. If you have experience, it may not be a huge problem, and there are Prodigy docs on how to handle this. However, you likely already have limited time, and it's one more factor that could add complexity to your process.

That being said, if you have a lot of experience, there's definitely potential to increase accuracy with transformers. For example, two of our teammates have been testing ways to optimize speed for transformers and wrote a great blog post on recent testing with Apple GPUs:

But if you're just starting out, I think it could mean more cost without guaranteed performance gains.

In summary, I go back to Matt's ML Hierarchy of Needs:

That is, sometimes how to frame your problem (e.g., your annotation schemes for entities) is much more important than optimizing your architecture and/or hyperparameters.

Personally, I prefer to start with a good way to frame the problem and get the model to production, as that's never a trivial task. Once you have a good production workflow and have evaluated the model over some time, you could then create a champion vs. challenger setup, where transformers run as challenger models against your current model. At that point, it may be a better time to evaluate transformers.
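To make the champion vs. challenger idea a bit more concrete, here's a rough sketch of how the comparison could look for multilabel predictions. Everything in it is an assumption for illustration: the function names, the toy labels, the `min_gain` threshold, and the choice of micro-averaged F1, which is not necessarily the same metric as the textcat score you're seeing in training.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (example, label) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_winner(
    gold: List[Set[str]],
    champion_pred: List[Set[str]],
    challenger_pred: List[Set[str]],
    min_gain: float = 0.02,  # hypothetical threshold: the gain must justify the extra cost
) -> str:
    """Promote the challenger only if it beats the champion by at least `min_gain`."""
    gain = micro_f1(gold, challenger_pred) - micro_f1(gold, champion_pred)
    return "challenger" if gain >= min_gain else "champion"


# Toy held-out set; in practice both models score the same evaluation examples.
gold = [{"billing"}, {"billing", "refund"}, {"shipping"}, set()]
champion_pred = [{"billing"}, {"billing"}, {"refund"}, set()]
challenger_pred = [{"billing"}, {"billing", "refund"}, {"shipping"}, {"refund"}]

print(pick_winner(gold, champion_pred, challenger_pred))
```

The threshold is the important design choice: setting it above zero means the transformer model only replaces your current one when the improvement is large enough to be worth the GPU and the slower scoring.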

Thank you! :slight_smile: