Thank you for the helpful feedback, Vincent! Here are some follow-up clarifications and new questions.
My high-level goal is to track customer comments at a regular time interval to show trends in the class counts. For example, in Q1 we might see that 40% of comments refer to 'Subscriptions' and that 90% of those have negative sentiment (BAD SUBSCRIPTION counts). In a dashboard we could then model the impact of underlying changes to the subscription experience and watch the BAD SUBSCRIPTION counts reverse course to GOOD SUBSCRIPTION in, say, Q2. I'm assuming that the class categories will stay fairly constant, at least over the coming year or two.
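For concreteness, here's a rough sketch of the kind of aggregation I have in mind for the dashboard (the column names and label values are hypothetical, just to illustrate the shape of the data):

```python
import pandas as pd

# Hypothetical export: one row per predicted span, tagged with the quarter
# the comment was received plus the span's class and sentiment labels.
spans = pd.DataFrame(
    [
        {"quarter": "2023Q1", "label": "SUBSCRIPTION", "sentiment": "BAD"},
        {"quarter": "2023Q1", "label": "SUBSCRIPTION", "sentiment": "BAD"},
        {"quarter": "2023Q1", "label": "MOBILE_APP", "sentiment": "GOOD"},
        {"quarter": "2023Q2", "label": "SUBSCRIPTION", "sentiment": "GOOD"},
    ]
)

# Share of each class within each quarter (e.g. "40% of Q1 spans are SUBSCRIPTION").
class_share = (
    spans.groupby("quarter")["label"].value_counts(normalize=True).rename("share")
)

# Sentiment breakdown within each class, per quarter.
sentiment_counts = spans.groupby(["quarter", "label"])["sentiment"].value_counts()

print(class_share)
print(sentiment_counts)
```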
Your question:
Is it possible that a document is about two of these classes at the same time? I can certainly imagine that a subscription experience could happen on a mobile app, just like an issue with an ink cartridge might affect the print quality.
This is true. If we get 5,000 comments, more than half could refer to subscriptions with bad sentiment, but within the same comment a customer might also mention that printer setup was easy and support was good. That's why I thought spancat would be the right approach: to tease out spans referring to more than one class in the same comment.
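To make sure I understand the target output: after training, I'd expect a single comment to yield multiple labeled spans, something like this (the model path and label names are hypothetical):

```python
import spacy

# Hypothetical pipeline trained with a spancat component.
nlp = spacy.load("./spancat_model")

doc = nlp("Cancelling my subscription was a nightmare, but printer setup was easy.")

# spancat stores (possibly overlapping) spans in a spans group;
# "sc" is the default key.
for span in doc.spans["sc"]:
    print(span.text, span.label_)
# Hoping for output along the lines of:
#   Cancelling my subscription was a nightmare  SUBSCRIPTION
#   printer setup was easy                      PRINTER_SETUP
```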
Your comment:
What about neutral? There's a lot of gray area between positive and negative. You could choose to model the "sentiment" as a spectrum where 0 means negative, 1 means positive and 0.5 represents the halfway point. But you could also model "positive" as a separate class and "negative" as another one. I'm mentioning this because in some applications you're less interested in "gray" cases and only care about the obvious positive/negative ones.
I want to be able to tag the sentiment to the specific span, not the whole document, to allow for comments that have something nice to say about one thing and are angry about another.
What's interesting is that in addition to the comment, the customer gives an NPS score, which (if you're familiar with NPS) ends up being a useful sentiment class label: by their own submission, customers fall into PROMOTER, PASSIVE, or DETRACTOR. Promoters' comments are generally all positive and detractors' are mostly negative, but passives have more conflicting sentiments, and the transformer-based models I've applied have less than 0.5 confidence predicting those comments. I'm thinking spancat will help tease that out.
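In case it's useful context, this is the standard mapping from the 0-10 NPS score each customer submits to the three buckets:

```python
def nps_bucket(score: int) -> str:
    """Map a 0-10 NPS score to the standard NPS buckets."""
    if score >= 9:
        return "PROMOTER"
    if score >= 7:
        return "PASSIVE"
    return "DETRACTOR"

assert nps_bucket(10) == "PROMOTER"
assert nps_bucket(8) == "PASSIVE"
assert nps_bucket(3) == "DETRACTOR"
```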
Your comment:
You can totally opt to have separate datasets for separate labels if you choose to go for non-exclusive classes. The Prodigy train recipe can then look at different datasets as it's training models too. The nice thing about this setup is that it's fairly easy to add a new label of interest and it also keeps things separate. My gut feeling is that this approach might work very well for you.
This is really helpful, but I'm trying to form a mental model for how the model works with a series of datasets to make a classification prediction! As I mentioned, on the first dataset run I'm skipping comments that don't mention subscriptions. If I start a new run with labels for another class, I'll have comments annotated for that class plus its own set of skipped comments. After 6 runs and 6 separate datasets, do I end up with a model that integrates results for a comment across all classes?
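For context on the "skipping" part: on the first run I'm effectively only annotating comments that mention subscriptions, along the lines of this stream filter (a rough sketch, not my actual recipe):

```python
def subscription_stream(examples):
    """Yield only comments that mention subscriptions; everything else
    is skipped on this first annotation run."""
    for eg in examples:
        if "subscription" in eg["text"].lower():
            yield eg
```

My understanding of the training half is that something like `prodigy train ./output --spancat run1,run2,run3` (dataset names made up) would read from all the datasets at once, but that's exactly the part I'd like confirmed.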
Your comment:
Alternatively, it might also make sense to have a single interface with all the labels if you're pretty sure that the labels won't change over time. Eventually a custom recipe might make sense here. There's a nice example of this on YouTube here where I combine spancat with choice interfaces.
Thanks for this video - I'm thinking that having all the labels present from the start might be best. Based on your thoughts on the separate annotation runs and the final model output, I'll look into this further.
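From the video, my rough understanding of the combined interface is something like the custom recipe sketch below (the recipe name, dataset, labels, and sentiment options are all hypothetical, and I may well have details wrong):

```python
import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "spancat-with-sentiment",
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("JSONL file with comments", "positional", None, str),
)
def spancat_with_sentiment(dataset, source):
    """Highlight spans and pick a sentiment on one screen via blocks."""

    def add_options(stream):
        # Attach the sentiment choices to every incoming example.
        for eg in stream:
            eg["options"] = [
                {"id": "GOOD", "text": "GOOD"},
                {"id": "BAD", "text": "BAD"},
                {"id": "NEUTRAL", "text": "NEUTRAL"},
            ]
            yield eg

    stream = add_options(JSONL(source))

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {
            "labels": ["SUBSCRIPTION", "INK", "MOBILE_APP"],
            "blocks": [
                {"view_id": "spans_manual"},
                # text: None avoids repeating the comment under the choices.
                {"view_id": "choice", "text": None},
            ],
        },
    }
```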
Thank you, Vincent - I appreciate your input and am learning so much watching your videos!