How to annotate chat messages for a classification task?

The task is to classify chat messages. My question is how to set up the annotation scheme. I can think of a couple of options:

  1. Just annotate each chat message on its own. The problem is that we may lose the context. E.g. the current chat message could be an answer to the previous message.

  2. Annotate a chat message together with the n messages before and the n messages after it. The problem with this approach is that we may include context that is irrelevant to the current chat message. The context switches more often in a chat than in an email or a document.

Is there a preferred way between option 1 and option 2? Are there any other options?

Thank you.

Hi! I'm not sure there's an easy answer for this, because it really depends on your data and the trade-offs, and of course how you're planning on training your model later on. If you're taking the context into account when making a prediction, it obviously makes sense to show the context during annotation as well. If you're only making predictions based on single messages, it could be a reasonable experiment to just annotate single messages at a time. This could also help surface potential problems – if it turns out that single messages don't contain enough relevant text to assign a label, your model will quite possibly struggle with this as well, and you can adjust your strategy accordingly.
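
To make the two setups easy to compare, here is a minimal sketch of how the annotation tasks could be built from an ordered list of messages. The function name `build_tasks` and the field names `text`, `context_before`, `context_after` and `index` are made up for illustration and are not the input format of any particular annotation tool. With `n=0` you get single-message tasks, and with `n>0` each task also carries up to `n` neighbouring messages on each side.

```python
from typing import Dict, List


def build_tasks(messages: List[str], n: int = 0) -> List[Dict]:
    """Return one annotation task per message.

    Each task holds the message to label plus up to `n` messages of
    context before and after it. All field names are hypothetical.
    """
    tasks = []
    for i, message in enumerate(messages):
        tasks.append({
            "text": message,                             # the message to label
            "context_before": messages[max(0, i - n):i], # up to n earlier messages
            "context_after": messages[i + 1:i + 1 + n],  # up to n later messages
            "index": i,                                  # position in the chat
        })
    return tasks


# Made-up example chat, in chronological order.
chat = [
    "Is the server back up?",
    "Yes.",
    "Great, thanks!",
]

single = build_tasks(chat, n=0)    # option 1: message only
windowed = build_tasks(chat, n=1)  # option 2: one message of context per side
```

In this example the second message ("Yes.") is hard to label on its own, while the windowed version also shows the question it answers. Keeping the context in separate fields rather than merging it into `text` means the same records can be used later whether or not the model sees the context.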

Thank you. Yes, that's what we are thinking of - trying different strategies to see which one works.

We will start with text with context first. If the model fails to learn, we will then try annotation at the sentence level. Users told us that they annotated at the sentence level before, and that it was hard to make a decision at that level. This time we will let them annotate text with more context.
