Are tools like GPT-3 and Google's Switch Transformer relevant to spaCy/Prodigy?

I'm new to NLP but just read this article about Google's Switch Transformer.
I wonder whether these tools are relevant to spaCy/Prodigy, or whether they're mutually exclusive.

It's always an interesting task to map the press releases back to more practical things 🙂

It's definitely true that transformer models are an exciting development for NLP, and they have many practical uses --- including uses for spaCy and Prodigy. However, I consider developments like the Google Switch Transformer to be mostly of interest to researchers at this point. It's incremental work along well-established dimensions that doesn't yet produce better accuracy than other techniques. If the same work were done by grad students at Cornell, we wouldn't be reading about it. I don't try to keep up-to-the-minute with papers like this, because I want to wait a little bit to let things stabilise and see what's worth adopting.

We support transformers in spaCy v3, which lets you get better accuracy on most problems. We've currently got a lot of people beta testing the new Prodigy nightly, which supports spaCy v3. This use of transformers is basically a neural network detail that shouldn't really change your mental model of how to solve problems. The main thing you need to know is that transformers can be jump-started from raw text, and these pretrained checkpoints are easy to download and plug in. You then need fewer labelled examples and can get higher accuracy, but you'll need a GPU for training.
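For a concrete sense of what "plug in" means in practice, here's a minimal sketch of loading a transformer-backed pipeline in spaCy v3. It assumes you've installed spacy-transformers and downloaded the en_core_web_trf package; the example text is just an illustration.

```python
# Minimal sketch: using a pretrained transformer pipeline in spaCy v3.
# Assumes spaCy v3 with spacy-transformers installed and the model downloaded:
#   python -m spacy download en_core_web_trf
import spacy

# spacy.prefer_gpu()  # uncomment to run the transformer on GPU if available

nlp = spacy.load("en_core_web_trf")  # English pipeline backed by a transformer
doc = nlp("Google announced the Switch Transformer in January 2021.")

# Downstream components (tagger, parser, NER) share the transformer's features.
for ent in doc.ents:
    print(ent.text, ent.label_)
```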

I'm also very interested in exploring text generation from models like GPT-2 and GPT-3 as part of Prodigy, to assist with data augmentation and labelling. We expect to write more about that this year, but we need to do some experimentation first.
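As a rough illustration of the data augmentation idea (to be clear, this isn't a Prodigy feature), here's a hedged sketch using the Hugging Face transformers library to generate candidate texts with GPT-2 that you could then review and label yourself. The prompt and generation parameters are just placeholders.

```python
# Hedged sketch (not Prodigy functionality): generating candidate texts
# with GPT-2 for later human review and labelling.
# Assumes the Hugging Face "transformers" package is installed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

seed = "The restaurant's service was"  # hypothetical prompt for augmentation
candidates = generator(
    seed,
    max_length=40,
    num_return_sequences=3,
    do_sample=True,  # sampling is needed to get multiple distinct sequences
)

for out in candidates:
    print(out["generated_text"])
```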


Ah I see, that makes sense. Thanks for the overview!