Annotating dialogues with Prodigy


We're a team annotating chat dialogues and we're looking for advice on a good way to display conversations in the Prodigy interface. Our main problem is seeing the turns of the conversation and which chat participant sent each message.

Our current solution is to join all messages in a conversation into one running text, with newlines separating the messages. The running text also includes 'start message tags': disabled tokens that cannot be annotated and that show some information about the chat participant who sent the message on the next line. (I hope this explanation is clear enough.)
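A minimal sketch of this kind of setup, to make the description concrete. It assumes naive whitespace tokenization and hypothetical `sender` / `text` fields on each message; the helper name `build_task` is made up for illustration. It relies on the `"disabled": True` flag on tokens, which is what makes the header and newline tokens unselectable in the manual NER interface.

```python
def build_task(messages):
    """Join chat messages into one running text with disabled header tokens.

    `messages` is a list of dicts with hypothetical "sender" and "text" keys.
    Tokenization is a plain whitespace split (an assumption for this sketch).
    """
    text = ""
    tokens = []

    def add(tok_text, disabled, ws):
        # Append one token and keep its character offsets in sync with `text`.
        nonlocal text
        start = len(text)
        text += tok_text
        tokens.append({
            "text": tok_text, "start": start, "end": len(text),
            "id": len(tokens), "ws": ws, "disabled": disabled,
        })
        if ws:
            text += " "

    for msg in messages:
        # Header tag showing who sent the message; cannot be annotated.
        add(f"[{msg['sender']}]", disabled=True, ws=False)
        add("\n", disabled=True, ws=False)
        words = msg["text"].split()
        for i, word in enumerate(words):
            add(word, disabled=False, ws=i < len(words) - 1)
        # Newline separating this message from the next header.
        add("\n", disabled=True, ws=False)
    return {"text": text, "tokens": tokens}


task = build_task([
    {"sender": "agent", "text": "Hello there"},
    {"sender": "user", "text": "Hi"},
])
```

Every token's `start` and `end` index back into the joined text, which matters later when pre-annotated spans are attached to the same task.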

However, this scheme complicates parts of the workflow: Prodigy's output gives span offsets within the running text, so we have to convert them to message-level offsets and attach each span to the correct message. We have a script that does this conversion, but we would like to simplify this step, e.g. by showing multiple messages in the interface without using multiple blocks or joining all messages into a single text. Also, when we add pre-annotated entities from our own NER model, the model identifies the entities correctly but the labels are misplaced when we open Prodigy. We suspect this might have something to do with the tokens that we have disabled (?).
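The offset conversion described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: it assumes the character range of each message inside the running text was recorded when the text was built, and the names `to_message_spans` and `message_ranges` are hypothetical.

```python
from bisect import bisect_right


def to_message_spans(spans, message_ranges):
    """Convert running-text span offsets to message-level offsets.

    `message_ranges` is a list of (message_id, start, end) tuples giving each
    message's character range in the running text, sorted by start.
    """
    starts = [start for _, start, _ in message_ranges]
    result = []
    for span in spans:
        # Find the last message that starts at or before the span start.
        i = bisect_right(starts, span["start"]) - 1
        if i < 0:
            raise ValueError(f"span {span} starts before the first message")
        msg_id, m_start, m_end = message_ranges[i]
        if span["end"] > m_end:
            raise ValueError(f"span {span} is not inside a single message")
        result.append({
            "message_id": msg_id,
            "start": span["start"] - m_start,
            "end": span["end"] - m_start,
            "label": span["label"],
        })
    return result


# Running text: "[agent]\nHello there\n[user]\nHi\n"
ranges = [("m1", 8, 19), ("m2", 27, 29)]
spans = [
    {"start": 14, "end": 19, "label": "X"},
    {"start": 27, "end": 29, "label": "Y"},
]
converted = to_message_spans(spans, ranges)
```

Raising on spans that fall outside any message (for example inside a header tag) makes misaligned annotations visible early instead of silently attaching them to the wrong message.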

In the future, we would like to annotate dialogue acts within the conversation (that is, one intent per sentence) and dialogue structure (linking sentences to reconstruct the dialogue flow). The entities + relations recipe could be useful for this task, but our dialogues are not easy to work with in the current recipe. Newline breaks would help, but they do not seem to work in our texts when we wrap them (we're using version 1.10.5).

Any suggestions on how we could improve the way in which we work with these chat dialogues would be very much appreciated. Also we would love to know of any upcoming features which could help in our workflow and dialogue annotation in general.


Hi! If your messages are all separate and your goal is to display multiple of them together, I do think those are the best options – at least, I couldn't think of an alternative way to present the content with the existing or custom interfaces.

If you want to avoid recalculating the offsets, it might be simpler to annotate one message at a time and use plain text / HTML blocks for the context shown before and after it. You can format those however you like, and you could use an html_template to reference specific fields in the data. So your input could be {"before": "...", "after": "...", "text": "..."}.
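Here's a rough sketch of what that could look like. The helper names (`window_tasks`, `recipe_components`) and the `sender` field are made up for illustration, and the dict returned at the end is what a custom recipe function would return. Note that the `ner_manual` block also expects each task to have `"tokens"`, so in a real recipe you'd still need to tokenize the stream.

```python
def window_tasks(messages, context=2):
    """Yield one task per message, with neighbouring messages as context.

    `messages` is a list of dicts with hypothetical "sender" and "text" keys.
    """
    def fmt(msgs):
        return "\n".join(f"{m['sender']}: {m['text']}" for m in msgs)

    for i, msg in enumerate(messages):
        yield {
            "text": msg["text"],  # only this field gets annotated
            "before": fmt(messages[max(0, i - context):i]),
            "after": fmt(messages[i + 1:i + 1 + context]),
            "meta": {"sender": msg["sender"], "index": i},
        }


# pre-wrap keeps the "\n" separators visible without injecting raw HTML
CONTEXT_STYLE = "white-space: pre-wrap; opacity: 0.6"


def recipe_components(dataset, stream, labels):
    """Build the components dict a custom blocks recipe would return."""
    blocks = [
        {"view_id": "html",
         "html_template": f'<div style="{CONTEXT_STYLE}">{{{{before}}}}</div>'},
        {"view_id": "ner_manual"},
        {"view_id": "html",
         "html_template": f'<div style="{CONTEXT_STYLE}">{{{{after}}}}</div>'},
    ]
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {"blocks": blocks, "labels": labels},
    }


chat = [
    {"sender": "agent", "text": "Hello"},
    {"sender": "user", "text": "My order is late"},
    {"sender": "agent", "text": "Sorry to hear that"},
]
tasks = list(window_tasks(chat, context=1))
components = recipe_components("chat_ner", tasks, ["PRODUCT", "DATE"])
```

Since the span offsets now refer to the single message in "text", there's nothing to recalculate afterwards.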

The disabled tokens shouldn't affect the rendered entity spans. Maybe it's an off-by-one error? The token_end index of the span should be inclusive, and maybe also double-check that the spans are provided in the order they occur in the text.
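To rule that out, here's a small sketch of a check you could run on the pre-annotated tasks. It assumes your model gives you character offsets and that each span lines up exactly with token boundaries; the function name `add_token_indices` is just for illustration.

```python
def add_token_indices(spans, tokens):
    """Sort spans by position and add token_start / token_end indices.

    token_end is the index of the LAST token in the span (inclusive),
    not one past it.
    """
    start_to_id = {t["start"]: t["id"] for t in tokens}
    end_to_id = {t["end"]: t["id"] for t in tokens}
    result = []
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] not in start_to_id or span["end"] not in end_to_id:
            raise ValueError(f"span {span} does not align with token boundaries")
        result.append({
            **span,
            "token_start": start_to_id[span["start"]],
            "token_end": end_to_id[span["end"]],
        })
    return result


# Tokens for the text "Call Ines Montani today"
tokens = [
    {"text": "Call", "start": 0, "end": 4, "id": 0},
    {"text": "Ines", "start": 5, "end": 9, "id": 1},
    {"text": "Montani", "start": 10, "end": 17, "id": 2},
    {"text": "today", "start": 18, "end": 23, "id": 3},
]
model_spans = [
    {"start": 18, "end": 23, "label": "DATE"},
    {"start": 5, "end": 17, "label": "PERSON"},
]
fixed = add_token_indices(model_spans, tokens)
```

If this raises for some of your spans, the model's tokenization probably differs from the tokens in the task (for example because of the inserted header tokens), which would also explain labels showing up in the wrong place.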

Prodigy v1.10.6 supports rendering newlines in the relations UI; see the thread "newlines in relations annotation" (#17 by ines).

Also, on the topic of annotating and linking sentences: a similar use case came up in another thread a while ago and I suggested an experimental approach of using symbols/numbers to represent sentences or sentence fragments. So instead of connecting very long blocks with very little visual gain, you'd be looking at the full dialogue and then annotating relationships in an abstract representation of it. See the thread "Using relations interface for large texts" (#4 by ines).

It's possible that this is less relevant for your use case if your texts are fairly short. But it could still be worth experimenting with.

Thank you so much for your thorough reply, Ines :blush:

I'll bring your suggestions to the team and we'll experiment with them. We'll also take a closer look at the issue with the span tokens to identify the real problem.

Thanks again!
