Model explanation

Model explanation is a hot topic right now and business users are asking for it. If Prodigy could highlight which words/phrases carry more weight, humans should be able to annotate even faster and more accurately.

I've seen some discussion about having the model return a weight for each token. I wonder whether Prodigy is planning to introduce model explanation out of the box. Or is there a way I can implement this functionality myself? Thanks.

I think this would be a great addition to Prodigy. I have implemented this outside of Prodigy and it really helps to understand what the model has actually learned.

My implementation is for a RoBERTa text classification model and is built on top of PyTorch's Captum library, using GradientShap.
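For reference, the post-processing step that turns Captum's raw attributions into one weight per subtoken usually looks something like this. This is a sketch with toy numbers and a made-up helper name (`summarize_attributions`), not the exact implementation from the post; it mirrors the common Captum pattern of summing attributions over the embedding dimension and normalizing:

```python
def summarize_attributions(attributions):
    """Collapse per-embedding-dimension attributions (one list of floats
    per subtoken) into a single signed weight per subtoken, normalized
    so the largest absolute weight is 1.0."""
    # Sum over the embedding dimension to get one score per subtoken
    scores = [sum(dims) for dims in attributions]
    # Normalize by the largest absolute score so weights land in [-1, 1]
    max_abs = max(abs(s) for s in scores) or 1.0
    return [s / max_abs for s in scores]

# Toy attributions for three subtokens, two embedding dims each
weights = summarize_attributions(
    [[0.25, 0.25], [-0.125, -0.125], [0.0625, 0.0625]]
)
# weights is now [1.0, -0.5, 0.25]
```

The signed, normalized weights are what you'd then feed into whatever visualization you use during annotation.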


Yes, that's a cool idea! Prodigy should already have all the building blocks for that – you just need to implement the process that computes weights for the tokens/subtokens, or whatever else you want to interpret.

This thread is slightly older, but it has some custom recipes and ideas for visualizing attention during text classification annotation:

We'd love to have this more easily accessible for spaCy models! But otherwise, it really depends on your model, the framework you're using (both for ML and for interpretability) and what you're trying to do. There's no out-of-the-box answer for that. But Prodigy should provide the building blocks you need to incorporate model interpretability into your annotation workflow.

This looks great! :100: And this seems to already return formatted HTML, right? So I guess you could stream that in using the html interface, or add it as a separate block?
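To make the "stream it in via the html interface" idea concrete, here's a rough sketch of how per-token weights could be rendered as shaded HTML and attached to a task under the `"html"` key (which is what the html view renders). The helper name `weights_to_html` and the colour choices are my own:

```python
import html

def weights_to_html(tokens, weights):
    """Render tokens as <span> elements whose background opacity
    reflects the attribution weight (green = supports the label,
    red = against it)."""
    spans = []
    for token, weight in zip(tokens, weights):
        color = "0, 160, 0" if weight >= 0 else "200, 0, 0"
        spans.append(
            '<span style="background: rgba({}, {:.2f})">{}</span>'.format(
                color, abs(weight), html.escape(token)
            )
        )
    return " ".join(spans)

# A task dict for the html interface, with toy attribution weights
task = {
    "text": "very good movie",
    "html": weights_to_html(["very", "good", "movie"], [0.3, 0.9, -0.1]),
    "label": "POSITIVE",
}
```

In a custom recipe you'd yield tasks like this from your stream and set the view ID to `html` (or use it as one block in a `blocks` layout).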

(Also, a small detail that I want to add to the regular ner/spans interface: If individual spans can take a "color" value, you could easily implement the same visualization just with character offsets and different colour shades depending on the score, without having to assign distinct labels and label colours.)

Great. I'll try both solutions and update the result here. Thanks!