Annotation for target-based sentiment

I am new to Prodigy and am trying to figure out the best way to approach annotation for target-based sentiment classification. Using a set of annotations, I want to predict the spans that are the targets of specific types of sentiment. Importantly, a particular span may be the target of multiple sentiment types.

For example, imagine that the specified sentiment types are (positive sentiment, intent to purchase). Given a document like, "I love apples, I think I will buy some at the store", I would want to tag "apples" with both positive sentiment and intent to purchase.

At least for annotation, I was thinking that I could use Prodigy's NER annotation recipe; however, I am not sure whether overlapping labels are permitted.

Any insight or suggestions would be appreciated!

Text classification is more what you are looking for. An intent to purchase is not a named entity.

You can check the example on training an insult classifier. You would do the same for "intent to purchase", and again the same for "positive versus negative".

Apples, pears, or tomatoes could be extracted by NER, I think, either by training a fruit entity type or simply by further training the PRODUCT NER label.

I agree with @etlweather: you're probably best off tagging the sentence-level information, and then having a separate process to figure out what would be getting purchased, if that label is in the sentence.

You could probably do well with a rule-based system for that initially, possibly using the dependency parse. It probably wouldn't cover the example sentence you gave well, but you could use rules to figure out many common cases.
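As a rough illustration of what such a dependency-based rule might look like, here is a minimal sketch. It assumes the sentence has already been parsed (for instance by spaCy) into (text, dependency label, head index) triples. The `PURCHASE_VERBS` set, the `find_purchase_targets` helper, and the hand-written parse are all hypothetical, not part of Prodigy or spaCy.

```python
# Hypothetical sketch: tokens are (text, dependency label, head index) triples,
# written by hand here to stand in for a real parser's output.
PURCHASE_VERBS = {"buy", "purchase", "order"}  # assumed trigger verbs


def find_purchase_targets(tokens):
    """Return the direct objects of any purchase verb in the parse."""
    targets = []
    for text, dep, head in tokens:
        head_text = tokens[head][0].lower()
        if dep == "dobj" and head_text in PURCHASE_VERBS:
            targets.append(text)
    return targets


# "I will buy apples at the store"
parsed = [
    ("I", "nsubj", 2),
    ("will", "aux", 2),
    ("buy", "ROOT", 2),
    ("apples", "dobj", 2),
    ("at", "prep", 2),
    ("the", "det", 6),
    ("store", "pobj", 4),
]
print(find_purchase_targets(parsed))  # ['apples']
```

On the original example ("I will buy some at the store") a rule like this would most likely pick up "some" rather than "apples", which is the kind of case that would need extra rules or coreference handling.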

Thanks for the input!

FWIW, for this task I am also planning on implementing a sentence-level classifier, either as a previous stage in a sequential pipeline or perhaps in a multi-task model.

Assuming the former (i.e. that I have a previous model in the loop that classifies sentence-level sentiment), I'll need to detect the target(s) of the sentiment(s) associated with an input doc. I've considered rule-based approaches, but it seems like modern supervised approaches perform much better (and I think that will definitely be the case with my data).

So, I'm still wondering what the suggested approach would be for annotating spans with multiple labels.

If Prodigy can't handle this, I suppose I could run a separate annotation project for each type of sentiment and then merge the labels myself?
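For what it's worth, the merge step could be fairly small. Here is a minimal sketch, assuming each project exports examples as dicts with a "text" and a list of "spans" carrying "start", "end" and "label" (the general shape of span annotations in JSONL). The `merge_span_labels` helper and the label names are my own placeholders.

```python
from collections import defaultdict


def merge_span_labels(*annotation_sets):
    """Merge per-label span annotations into multi-label spans, keyed by text."""
    merged = defaultdict(lambda: defaultdict(set))
    for examples in annotation_sets:
        for eg in examples:
            for span in eg.get("spans", []):
                key = (span["start"], span["end"])
                merged[eg["text"]][key].add(span["label"])
    # Turn the nested sets into sorted, plain lists of span dicts.
    return {
        text: [
            {"start": start, "end": end, "labels": sorted(labels)}
            for (start, end), labels in sorted(spans.items())
        ]
        for text, spans in merged.items()
    }


text = "I love apples, I think I will buy some at the store"
# One exported dataset per sentiment type (hypothetical label names).
positive = [{"text": text, "spans": [{"start": 7, "end": 13, "label": "POSITIVE"}]}]
purchase = [{"text": text, "spans": [{"start": 7, "end": 13, "label": "PURCHASE_INTENT"}]}]

print(merge_span_labels(positive, purchase)[text])
# [{'start': 7, 'end': 13, 'labels': ['POSITIVE', 'PURCHASE_INTENT']}]
```

This only merges spans whose character offsets match exactly, so partially overlapping spans from different projects would stay separate.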


Can you have one task where you do the span-based annotation, and mark a generic category TARGET, and then have a multilabel text classification problem to decide the categories?
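To make that a bit more concrete, here is a rough sketch of how the two annotation layers might be joined afterwards. It assumes the span task stores a generic TARGET label under "spans" and the classification task stores the chosen categories under "accept". The `attach_categories` helper and the category names are hypothetical.

```python
def attach_categories(span_task, textcat_task):
    """Give every generic TARGET span the text-level categories."""
    if span_task["text"] != textcat_task["text"]:
        raise ValueError("Annotations refer to different texts")
    categories = sorted(textcat_task.get("accept", []))
    return [
        {
            "text": span_task["text"][span["start"]:span["end"]],
            "start": span["start"],
            "end": span["end"],
            "labels": categories,
        }
        for span in span_task.get("spans", [])
        if span["label"] == "TARGET"
    ]


text = "I love apples, I think I will buy some at the store"
span_task = {"text": text, "spans": [{"start": 7, "end": 13, "label": "TARGET"}]}
textcat_task = {"text": text, "accept": ["POSITIVE", "PURCHASE_INTENT"]}

print(attach_categories(span_task, textcat_task))
# [{'text': 'apples', 'start': 7, 'end': 13, 'labels': ['POSITIVE', 'PURCHASE_INTENT']}]
```

Note that this assigns every category to every target, so a text with several targets carrying different sentiments would need a finer-grained step.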

Otherwise, you'd have to do the annotation in two passes (one per label). But you couldn't train a single NER model on that data using the default model: you'd either have to implement your own model that can handle multiple labels per span, or train separate NER models and put them both in the same pipeline.