I'm new to NLP but just read this article about Google's Switch Transformer.
I wonder if these tools have any relevance to spaCy/Prodigy or if they're mutually exclusive.
It's always an interesting task to map the press releases back to more practical things.
It's definitely true that transformer models are an exciting development for NLP, and they have many practical uses, including for spaCy and Prodigy. However, I consider developments like Google's Switch Transformer to be mostly of interest to researchers at this point. It's incremental work along well-established dimensions that doesn't yet produce better accuracy than other techniques. If the same work were done by grad students at Cornell, we wouldn't be reading about it. I don't try to keep up to the minute with papers like this, because I'd rather wait a little to let things stabilise and see what's worth adopting.
We support transformers in spaCy v3, which lets you get better accuracy on most problems. We've currently got a lot of people beta testing the new Prodigy nightly, which supports spaCy v3. The use of transformers is basically a neural network detail that shouldn't really change your mental model of how to solve problems. The main thing you need to know is that transformers can be jump-started (pretrained) on raw text, and these pretrained checkpoints are easy to download and plug in. You then need fewer labelled examples and can get higher accuracy, but you'll need a GPU for training.
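To give a sense of what "plug in" means in practice, here's a minimal sketch, assuming spaCy v3 with the transformer extras and the pretrained `en_core_web_trf` pipeline are installed (`pip install spacy[transformers]`, then `python -m spacy download en_core_web_trf`):

```python
# Minimal sketch: running a transformer-based pipeline in spaCy v3.
# Assumes en_core_web_trf has been downloaded as described above.
import spacy

# Load a pipeline whose components share a pretrained transformer backbone.
nlp = spacy.load("en_core_web_trf")

doc = nlp("Google announced the Switch Transformer in January 2021.")

# The downstream components (tagger, parser, NER) are used exactly as before;
# the transformer is just the feature extractor underneath them.
for ent in doc.ents:
    print(ent.text, ent.label_)
```

The point is that the rest of your code and your annotation workflow don't change: the transformer sits underneath the pipeline components you already know.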
I'm also very interested in exploring text generation from models like GPT-2 and GPT-3 as part of Prodigy, to assist with data augmentation and labelling. We expect to write more about that this year, but we still have some experimentation to do first.
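Just to illustrate the general idea (this isn't a Prodigy feature, only a rough sketch using the Hugging Face transformers library), you could generate candidate texts with GPT-2 and then have a human review and label them:

```python
# Illustrative sketch only: generate candidate sentences with GPT-2
# to use as raw material for annotation. Requires `pip install transformers`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The new phone's battery life is"
candidates = generator(
    prompt,
    max_length=30,
    num_return_sequences=3,
    do_sample=True,  # sampling gives varied candidates for augmentation
)

for cand in candidates:
    # Each generated text would still need human review before it could
    # be trusted as a labelled training example.
    print(cand["generated_text"])
```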
ah I see, makes sense. Thanks for the overview!