I am currently using Prodigy to classify short texts. However, I would like to classify these short texts sentence by sentence. Instead of viewing and classifying each sentence separately, I was wondering whether it is possible to view text snippet by text snippet and then highlight the positive sentences in that current text snippet (in the same way that you can highlight tokens when you are training a NER model)?
Hi! There are several ways you could solve this and I think it's definitely a good idea to focus on making each sentence a selectable unit (and not highlighting each sentence token-by-token, which would be pretty inefficient).
One simple approach would be to use the choice interface with "choice_style": "multiple" and create one option per sentence, maybe grouped by paragraph or some other sensible unit. Then you can go through them and select all positive sentences, either by clicking on them or by using keyboard shortcuts.
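As a rough sketch of what that could look like, the snippet below builds one choice task per text snippet, with one option per sentence. It assumes the sentences are already split, and the helper name make_choice_task and the "snippet_id" meta key are made up for illustration. The "options" list with "id" and "text" follows the choice task format as I understand it, so it's worth double-checking against the docs for your Prodigy version.

```python
import json


def make_choice_task(sentences, snippet_id=None):
    """Build one choice task with one selectable option per sentence.

    `sentences` is a list of already-split sentence strings (assumption:
    splitting happens elsewhere, e.g. with a sentence segmenter).
    """
    options = [{"id": i, "text": sent} for i, sent in enumerate(sentences)]
    task = {"options": options}
    if snippet_id is not None:
        # Hypothetical meta key, only used to trace the task back to its source
        task["meta"] = {"snippet_id": snippet_id}
    return task


example_task = make_choice_task(
    ["The food was great.", "The service was slow.", "I would come back."],
    snippet_id="review-1",
)
# One JSON object per line is what you would write to a JSONL input file
print(json.dumps(example_task))
```

With "choice_style": "multiple" set in the config, the selected option IDs should then map straight back to sentence indices in the snippet.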
Alternatively, a similar use case was posted on the forum a while ago and they ended up using the manual span interface, feeding in data with a "tokens" property where each sentence is a single "token". This would let you view the sentences in their natural flow, and you could double-click on them to select them, and even assign them different labels. See here for details:
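To make the sentence-as-token idea a bit more concrete, here is a minimal sketch that turns a list of pre-split sentences into a task with a "tokens" property. The helper name make_sentence_token_task is hypothetical, and joining sentences with a single space is a simplifying assumption. The token fields ("text", "start", "end", "id", "ws") follow the pre-tokenized input format as I understand it, so please verify them against the docs.

```python
def make_sentence_token_task(sentences):
    """Build a task where every sentence is treated as one "token".

    Sentences are joined with a single space (assumption), and the
    character offsets of each "token" point into that joined text.
    """
    tokens = []
    offset = 0
    for i, sent in enumerate(sentences):
        is_last = i == len(sentences) - 1
        tokens.append(
            {
                "text": sent,
                "start": offset,
                "end": offset + len(sent),
                "id": i,
                # "ws" marks whether the token is followed by whitespace
                "ws": not is_last,
            }
        )
        # Move past the sentence, plus the joining space if there is one
        offset += len(sent) + (0 if is_last else 1)
    return {"text": " ".join(sentences), "tokens": tokens}


sentence_task = make_sentence_token_task(
    ["The food was great.", "The service was slow.", "I would come back."]
)
```

Because the offsets are computed from the same joined text, each token's "start" and "end" slice out exactly one sentence, which is what lets the manual interface snap a selection to whole sentences.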
Hi Ines!
Thank you for your quick response. Formatting the data so that each sentence is a token (the second option) gave the desired behavior!
However, the layout of the marked sentences now looks quite odd. I read the other post you referred to, but couldn't quite figure out how to adjust the shape of the highlighted areas. Could you possibly help me with this as well?
Thanks in advance!
Glad it worked! And try setting display: block on the sentence "tokens", as described here, to make the selection a block element: