textcat by sentence given context of larger document

mhigginslp · March 1, 2018, 6:07pm

My dataset is a series of conversations between two people. I would like to categorize each sentence of this conversation but I would like to see the context of the conversation when I perform the annotation.
Do you have any suggestions?

ines · March 1, 2018, 8:12pm

Just to make sure I understood the question correctly: You want the classifier to only see one sentence at a time and label that, but in the annotation interface, you also want to see the context?

One idea could be to use the "html" interface and add an "html" property to your task that contains the full text and highlights the sentence you’re currently labelling. You can then store the original sentence text in the "text" property. For example:

{
    "html": "Sentence one. <strong>Sentence two.</strong> Sentence three",
    "text": "Sentence two.",
    "label": "Some label"
}

This should be pretty easy to do programmatically. When you annotate the examples with the "html" interface, Prodigy will render the HTML – but when you train the classifier later on, Prodigy will use only the "text" and the "label", both of which will be preserved in the dataset.
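
As a rough sketch of the "programmatically" part, something like the following could generate one task per sentence, with the full conversation in "html" and the current sentence wrapped in <strong>. The function name build_html_tasks and the assumption that the conversation is already split into a list of sentence strings are mine, not part of Prodigy:

```python
import html
import json


def build_html_tasks(sentences, label):
    """Create one task per sentence. The "html" value shows the whole
    conversation with the current sentence wrapped in <strong>.
    (Hypothetical helper for illustration, not a Prodigy function.)"""
    tasks = []
    for i, sentence in enumerate(sentences):
        parts = []
        for j, other in enumerate(sentences):
            # Escape the raw text so characters like "<" or "&" display literally.
            escaped = html.escape(other)
            parts.append(f"<strong>{escaped}</strong>" if i == j else escaped)
        tasks.append({"html": " ".join(parts), "text": sentence, "label": label})
    return tasks


if __name__ == "__main__":
    conversation = ["Sentence one.", "Sentence two.", "Sentence three."]
    # Print one JSON object per line (JSONL).
    for task in build_html_tasks(conversation, "Some label"):
        print(json.dumps(task))
```

Each printed line is one task, so the output can be redirected to a .jsonl file and used as the source for annotation. For very long conversations, you'd probably want to show only a few sentences on either side rather than the whole text.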

Alternatively, you could also use a custom HTML template and add the context as separate keys to your task (e.g. one for the prefix and one for the suffix). All task properties will become available as template variables. So a template like this…

<h2>{{label}}</h2>
{{before_text}} <strong>{{text}}</strong> {{after_text}}

… can be populated with data like this:

{
    "text": "Sentence two.",
    "before_text": "Sentence one.",
    "after_text": "Sentence three.",
    "label": "Some label"
}
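
Data in this shape could be produced with a small script along these lines. This is only a sketch: build_context_tasks and its window parameter are hypothetical names, and it assumes the conversation has already been split into sentences:

```python
import json


def build_context_tasks(sentences, label, window=1):
    """Create one task per sentence, with up to `window` neighbouring
    sentences stored under "before_text" and "after_text".
    (Hypothetical helper for illustration, not a Prodigy function.)"""
    tasks = []
    for i, sentence in enumerate(sentences):
        # Slices are clamped at the edges, so the first and last
        # sentences simply get an empty prefix or suffix.
        before = " ".join(sentences[max(0, i - window):i])
        after = " ".join(sentences[i + 1:i + 1 + window])
        tasks.append({
            "text": sentence,
            "before_text": before,
            "after_text": after,
            "label": label,
        })
    return tasks


if __name__ == "__main__":
    conversation = ["Sentence one.", "Sentence two.", "Sentence three."]
    # Print one JSON object per line (JSONL).
    for task in build_context_tasks(conversation, "Some label"):
        print(json.dumps(task))
```

Keeping the context in separate keys means "text" stays exactly the sentence you want to classify, and you can widen or narrow the visible context by changing window without touching the template.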

If you want this to be fancier – and if you can be bothered – you could even add some styling to your template to format it more like a conversation. Even chat bubbles or something! (I’ve always wanted to build an interface like this for Prodigy actually, haha.)
