The task is to classify some chat messages. My question is how to set up the annotation scheme. I can think of a couple of options:
1. Just annotate each chat message on its own. The problem is that we may lose the context. E.g. the current chat message could be an answer to the previous message.
2. Annotate a chat message together with the n messages before and the n messages after it. The problem with this approach is that we may include context that is irrelevant to the current chat message. The context switches more often in a chat than in an email or a document.
Is there a preferred choice between option 1 and option 2? Are there any other options?
Hi! I'm not sure there's an easy answer for this, because it really depends on your data and the trade-offs, and of course how you're planning on training your model later on. If you're taking the context into account when making a prediction, it obviously makes sense to show the context during annotation as well. If you're only making predictions based on single messages, it could be a reasonable experiment to just annotate single messages at a time. This could also help surface potential problems – if it turns out that single messages don't contain enough relevant text to assign a label, your model will quite possibly struggle with this as well, and you can adjust your strategy accordingly.
Thank you. Yes, that's what we are thinking of - trying different strategies to see which one works.
We will start with text with context first. If the model fails to learn, then we will try annotation at the sentence level. Users told us that they had annotated at the sentence level before, and that it was hard to make a decision at that level. This time we will let them annotate text with more context.
Is it possible to have an interface where you can annotate multiple texts at once, i.e. you can choose a label for each utterance in a chat interaction?
While you could probably put something similar together with a custom interface and blocks, we usually recommend focusing on one annotation decision per example, instead of asking your annotator to make multiple decisions and process a complex UI for every example.
If you want to display the context, one option would be to show the whole message for each example and highlight the utterance you're asking about. You could even re-purpose the NER JSON format for that, which will give you highlighting out-of-the-box. You can then render it with the choice UI and include the classification labels. This lets you ask about one utterance at a time and move through them quickly, while still providing the context for reference. The underlying data you load in could look like this:
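As a rough sketch of how records like this might be generated from a conversation: the helper below emits one task per utterance, joins up to n messages of context on either side into the text, and marks the target utterance with a span so it can be highlighted. The function name `make_tasks`, the `TARGET` span label, the example labels and the sample chat are all made up for illustration. The `text`, `spans` (`start`, `end`, `label`) and `options` (`id`, `text`) keys follow the NER-style span format and the choice interface's option format as I understand them, so it's worth checking them against your version before relying on this.

```python
import json


def make_tasks(messages, labels, n_context=2):
    """Build one annotation task per utterance, with surrounding context.

    messages:  list of utterance strings, in conversation order
    labels:    classification labels to offer as choice options
    n_context: how many messages before and after to include
    """
    options = [{"id": label, "text": label} for label in labels]
    tasks = []
    for i, target in enumerate(messages):
        first = max(0, i - n_context)
        window = messages[first : i + n_context + 1]
        # Character offset of the target within the joined text:
        # length of every preceding message plus one newline each.
        start = sum(len(m) + 1 for m in window[: i - first])
        tasks.append(
            {
                "text": "\n".join(window),
                # Hypothetical "TARGET" label, used only for highlighting
                "spans": [
                    {"start": start, "end": start + len(target), "label": "TARGET"}
                ],
                "options": options,
            }
        )
    return tasks


# Made-up example conversation and labels
chat = ["Hi, is the store open today?", "Yes, until 6pm.", "Great, thanks!"]
tasks = make_tasks(chat, ["QUESTION", "ANSWER", "OTHER"], n_context=1)

# One JSON object per line, i.e. JSONL
for task in tasks:
    print(json.dumps(task))
```

Keeping `n_context` small is one way to limit the irrelevant context mentioned earlier in the thread, and since the annotator still only makes one decision per task, you can experiment with different window sizes without changing the interface.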