Labeling a coreference task with 1-4 corefs per ~200-300-word paragraph

omadios · February 10, 2022, 2:07pm

Hello, I have used the "live demo app" to see how labeling works for a coreference task, and I have a question about our use case before purchasing Prodigy.

In our case, the first pass is a simple NER task with 2 classes. After that, we need to extract 1-4 coreferences (at most) from each medical report (each report is 100-300 words).

I'd like to know how the GUI works with texts longer than the one shown in the example (scrolling through a "long", wrapped text seems a bit impractical: the corefs can be far apart within those 300 words).

Alternatively, would it be easier to have a maximum of 8 pairs of labels?

I appreciate that the actual ML task will be challenging down the line, but since the data has to be labeled anyway, I thought I'd give it a try, and I want to understand whether Prodigy can make this manual labeling task easy.

Thank you,

Marcello

ines · February 11, 2022, 12:33pm

Hi! Since you already know the potential coref candidates upfront, I think this should definitely be feasible with longer texts: you can take advantage of the named entities (and possibly POS tags) to disable all other tokens and limit which tokens can be selected. We're doing something similar in the built-in coref workflow: https://prodi.gy/docs/dependencies-relations#coref (Alternatively, you could also use rel.manual with custom --disable-patterns.)
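For illustration, here is a minimal sketch of the "disable everything that isn't a candidate" idea as a preprocessing step you could run on the NER output before the coref pass. It assumes Prodigy-style tasks where each token dict has an "id", NER spans carry inclusive "token_start"/"token_end" indices, and the relations UI honours a per-token "disabled" flag. The function name, the "pos" token key and the PATIENT label are hypothetical, so please check the field names against the relations docs before relying on it.

```python
from typing import Any, Dict, FrozenSet, List


def disable_non_candidates(
    task: Dict[str, Any], keep_pos: FrozenSet[str] = frozenset()
) -> Dict[str, Any]:
    """Return a copy of `task` where only candidate tokens stay selectable.

    Candidates are tokens inside a pre-annotated entity span, plus (optionally)
    tokens whose hypothetical "pos" value is in `keep_pos`, e.g. pronouns.
    """
    # Token indices covered by any entity span from the first NER pass.
    selectable = set()
    for span in task.get("spans", []):
        selectable.update(range(span["token_start"], span["token_end"] + 1))

    tokens: List[Dict[str, Any]] = []
    for token in task["tokens"]:
        token = dict(token)  # copy so the input task is left untouched
        is_candidate = token["id"] in selectable or token.get("pos") in keep_pos
        if not is_candidate:
            token["disabled"] = True  # assumed flag read by the relations UI
        tokens.append(token)
    return {**task, "tokens": tokens}


# Usage with a tiny made-up report; "PATIENT" is a placeholder label.
words = ["The", "patient", "reported", "pain", ".", "She", "was", "discharged", "."]
tags = ["DET", "NOUN", "VERB", "NOUN", "PUNCT", "PRON", "AUX", "VERB", "PUNCT"]
example = {
    "text": "The patient reported pain. She was discharged.",
    "tokens": [{"text": w, "id": i, "pos": p} for i, (w, p) in enumerate(zip(words, tags))],
    "spans": [{"token_start": 0, "token_end": 1, "label": "PATIENT"}],
}
result = disable_non_candidates(example, keep_pos=frozenset({"PRON"}))
print([t["text"] for t in result["tokens"] if not t.get("disabled")])
```

With only a handful of selectable tokens per report, the distance between two mentions matters much less, because everything in between is greyed out.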

You can also use the theme settings to make the default annotation card wider and fit more text in a row: https://prodi.gy/docs/api-web-app#theme-sizes

There's obviously still some text to read, but if it's clear what can and can't be connected, I think this can still be very efficient from an annotation perspective.
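As a rough sketch of the theme tweak, the helper below adds a wider card width to a prodigy.json-style config dict. The "custom_theme" / "cardMaxWidth" key names are an assumption based on the theme-sizes docs and should be checked against your installed version; the helper name and the 1500 value are illustrative only.

```python
import json
from typing import Any, Dict


def with_wider_card(config: Dict[str, Any], max_width: int) -> Dict[str, Any]:
    """Return a copy of a prodigy.json-style config with a wider annotation card.

    Assumption: the card width lives under "custom_theme" -> "cardMaxWidth";
    verify the key name against the theme-sizes docs for your Prodigy version.
    """
    # Copy the nested theme dict so the caller's config is not mutated.
    theme = dict(config.get("custom_theme", {}))
    theme["cardMaxWidth"] = max_width
    return {**config, "custom_theme": theme}


# Print the JSON you could merge into prodigy.json (the width is illustrative).
print(json.dumps(with_wider_card({}, 1500), indent=2))
```

A wider card means fewer wrapped lines for a 300-word report, so distant mentions are more likely to be visible on screen at the same time; the trade-off is longer lines to scan.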

Related topic	Replies	Views	Last activity
Annotating dependencies for very long sentences (usage, relations)	7	1413	March 19, 2021
Annotating coreference on NER annotated text (usage, ner, coref)	3	236	May 13, 2024
Best Practices for Segmenting Text into Passages and Applying Multi-label Classification	1	794	September 13, 2023
NER and Coref/Rel advice (usage, relations, coref)	4	757	December 27, 2022
Dynamic choices for binary long-range coreference (usage, custom, coref)	2	650	December 22, 2021