How to combine the results of simpler annotation runs using spancat

I'm trying to figure out how to approach annotating customer comments and would appreciate you pointing me in the right direction. My goal is to assign each comment multiple classes when they are reflected in the text, and I believe spancat is the right recipe.
Roughly there are six high-level classes each comment could contain: printer setup, using the mobile app, subscription experience, ink cartridges, print quality, and customer support. Within each of those there will be sub-classes, but for now I'd be happy just to be able to classify whether the comment is positive or negative within each of those categories.

I started with one simple task, as you recommend, and annotated around 500 comments using spans.manual and spans.correct with GOOD SUBSCRIPTION and BAD SUBSCRIPTION as my labels.
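
For context, the commands I ran looked roughly like this (the file and dataset names are placeholders I've made up for the example):

    # Manual span annotation starting from a blank English pipeline
    prodigy spans.manual subscription_spans blank:en ./comments.jsonl --label "GOOD SUBSCRIPTION,BAD SUBSCRIPTION"

    # Reviewing/correcting predictions once a first spancat model has been trained
    prodigy spans.correct subscription_spans ./spancat-model ./comments.jsonl --label "GOOD SUBSCRIPTION,BAD SUBSCRIPTION"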

A couple of areas where I would like advice: 1) As I was going through the comments, I skipped any comment that didn't include feedback on the subscription. I'm thinking I'll have a chance to come back and annotate those when I add the other labels, in the spirit of not trying to do it all at once. Is this a good approach? 2) If so, how do I go back through the comments and add some of the other labels? Do I simply run spancat again on the same dataset with new labels, and do I have to include the previous labels on each run through? Hopefully this question makes sense.

Appreciate the feedback and any corrections to thinking or approach :slight_smile:

Some ideas that popped into my mind as I read this.

Roughly there are six high-level classes each comment could contain: printer setup, using the mobile app, subscription experience, ink cartridges, print quality, and customer support.

Is it possible that a document is about two of these classes at the same time? I can certainly imagine that a subscription experience could happen on a mobile app, just like an issue with an ink cartridge might affect the print quality.

Within each of those there will be sub-classes, but for now I'd be happy just to be able to classify whether the comment is positive or negative within each of those categories.

What about neutral? There's a lot of gray area between positive and negative. You could choose to model the "sentiment" as a spectrum where 0 means negative, 1 means positive and 0.5 represents the halfway point. But you could also model "positive" as a separate class and "negative" as another one. I'm mentioning this because in some applications you're less interested in "gray" cases and only care about the obvious positive/negative ones.

I started with one simple task, as you recommend, and annotated around 500 comments using spans.manual and spans.correct with GOOD SUBSCRIPTION and BAD SUBSCRIPTION as my labels.

Spancat could work, but I'm curious ... is there a reason why you didn't go for text classification? I can imagine that support tickets come in all sorts of shapes and sizes ... and you may care less about "where" in the text a positive sentence appears if you can already demonstrate that the overall sentiment of the text is positive. This depends a bit on the application though, so feel free to take this thought with a grain of salt.
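
Just to sketch what I mean, a non-exclusive text classification setup on the same data could look something like this (the file and dataset names are placeholders again):

    # Multilabel text classification; without --exclusive the labels are non-exclusive
    prodigy textcat.manual subscription_textcat ./comments.jsonl --label "GOOD SUBSCRIPTION,BAD SUBSCRIPTION"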

How do I go back through the comments and add some of the other labels? Do I simply run spancat again on the same dataset with new labels, and do I have to include the previous labels on each run through?

There are a few ways to go about something like this. Here are some ideas that might help:

  • You can totally opt to have separate datasets for separate labels if you choose to go for non-exclusive classes. The Prodigy train recipe can then combine different datasets as it trains models, too (see the sketch after this list). The nice thing about this setup is that it's fairly easy to add a new label of interest, and it also keeps things separate. My gut feeling is that this approach might work very well for you.
  • Alternatively, it might also make sense to have a single interface with all the labels if you're pretty sure that the labels won't change over time. Eventually a custom recipe might make sense here. There's a nice example of this on YouTube where I combine spancat with choice interfaces.
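
To make the first option concrete, here's a rough sketch of how the train recipe can read from more than one annotation dataset at once. The dataset names below are made up, so adapt them to whatever you've saved yours under:

    # Train a single spancat model from multiple Prodigy datasets
    # (the dataset names are hypothetical examples)
    prodigy train ./output-model --spancat subscription_spans,app_spans

Prodigy merges the annotations from the listed datasets into one corpus before training, which is what makes it cheap to add a new dataset for a new label later.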

Let me know if my feedback prompts any extra follow-up questions!


Thank you for the helpful feedback Vincent! Here are some follow-up clarifications and new questions :slight_smile:
My high-level goal is to track customer comments at a regular time interval to show trends in the class counts. So maybe we see in Q1 that 40% of comments refer to 'Subscriptions' and that 90% of those have negative sentiment (BAD SUBSCRIPTION counts). In a dashboard we might then model the impact of underlying changes to the subscription experience and see the counts reverse course from BAD SUBSCRIPTION to GOOD SUBSCRIPTION in, say, Q2. I'm assuming that the class categories will be fairly constant, at least over the coming year or two.

Your question:
Is it possible that a document is about two of these classes at the same time? I can certainly imagine that a subscription experience could happen on a mobile app, just like an issue with an ink cartridge might affect the print quality.

This is true. If we get 5,000 comments, it could be that more than half refer to subscriptions with bad sentiment, but within the same comment a customer might also mention that printer setup was easy and support was good. That's why I thought spancat would be the right approach: to tease out spans referring to more than one class within the same comment.

Your comment:
What about neutral? There's a lot of gray area between positive and negative. You could choose to model the "sentiment" as a spectrum where 0 means negative, 1 means positive and 0.5 represents the halfway point. But you could also model "positive" as a separate class and "negative" as another one. I'm mentioning this because in some applications you're less interested in "gray" cases and only care about the obvious positive/negative ones.

I want to be able to tag the sentiment to the specific span, not the whole document to allow for comments that have something nice to say about one thing and are angry about another.

What's interesting is that, in addition to the comment, the customer has given their NPS score, which (if you're familiar with it) ends up being a useful sentiment class label: by their own submission, customers fall into PROMOTER, PASSIVE, or DETRACTOR. Promoters' comments are generally all positive and detractors' are mostly negative, but passives have more conflicting sentiments, and the transformer-based models I've applied predict those comments with less than 0.5 confidence. I'm thinking spancat will help tease that out.
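
One idea I had is to carry the NPS bucket along as task metadata so it's visible while I annotate, since Prodigy displays the "meta" field on the annotation card. A made-up input line might look like this:

    # One line of the input JSONL, with a hypothetical NPS bucket attached as metadata
    {"text": "Setup was easy but the subscription billing is a mess.", "meta": {"NPS": "PASSIVE"}}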

Your comment:
You can totally opt to have separate datasets for separate labels if you choose to go for non-exclusive classes. The Prodigy train recipe can then combine different datasets as it trains models, too. The nice thing about this setup is that it's fairly easy to add a new label of interest, and it also keeps things separate. My gut feeling is that this approach might work very well for you.

This is really helpful, but I'm trying to form a mental model of how training works with a series of datasets to make a classification prediction! As I mentioned, on the first run I'm skipping comments that don't mention the subscription. If I start a new run with labels for another class, then I'll have comments annotated for that class and skipped comments. After six runs and six separate datasets, do I end up with a model that integrates results for a comment across classes?
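
In other words, if I follow the separate-datasets suggestion, I imagine the final step would be something roughly like this, with all six (made-up) dataset names handed to train at once:

    # Combine the six per-class span datasets into one spancat model
    # (the dataset names are hypothetical)
    prodigy train ./final-model --spancat setup_spans,app_spans,subscription_spans,ink_spans,quality_spans,support_spans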

Your comment:
Alternatively, it might also make sense to have a single interface with all the labels if you're pretty sure that the labels won't change over time. Eventually a custom recipe might make sense here. There's a nice example of this on YouTube where I combine spancat with choice interfaces.

Thanks for this video - I'm thinking that having all the labels present from the start might be best. Based on your thoughts on the separate annotation runs and the final model output, I'll look into this more.

Thank you Vincent - I appreciate your input, and I'm learning so much watching your videos :slight_smile: