Labeling a coreference task with 1-4 corefs per ~200-300 word paragraph

Hello, I have used the "live demo app" to see how the labeling works for a coreference task and I have a question about our use case before purchasing Prodigy.

In our case, the first pass is a simple NER task with 2 classes. After that, we need to extract 1-4 (maximum) coreferences from each medical report (each report is 100-300 words).

I'd like to know how the GUI works for a text that is longer than the one shown in the example. Scrolling down a "long" wrapped text seems a bit impractical, since the corefs can be far apart within those 300 words.

Alternatively, would it be easier to have a maximum of 8 pairs of labels?

I appreciate that the actual ML task will be challenging down the line, but given that the data will have to be labelled anyway, I thought I'd give it a try, and I want to understand whether Prodigy can make this manual labeling task easy.

Thank you,


Hi! Since you already know the potential coref candidates upfront, I think this should definitely be feasible with the longer texts: you can use the named entities, and possibly the POS tags, to disable all other tokens and limit which tokens can be selected. We're doing something similar in the built-in coref workflow. (Alternatively, you could also use the rel.manual recipe with custom --disable-patterns.)
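To make the --disable-patterns idea a bit more concrete, here is a minimal sketch that writes a patterns file disabling every token that is neither part of one of your entities nor a pronoun. It assumes spaCy-style token patterns with the NOT_IN operator, and one JSON object with a "pattern" key per line. The label names FINDING and ANATOMY and the file name are placeholders, and the exact file schema is worth checking against the recipe documentation for your version.

```python
import json
from pathlib import Path

# Hypothetical label names: replace them with the two classes from your NER pass.
ENTITY_LABELS = ["FINDING", "ANATOMY"]
# Coarse POS tags that should stay selectable even outside an entity.
KEEP_POS = ["PRON"]


def build_disable_patterns(entity_labels, keep_pos):
    """Return token patterns for tokens that should NOT be selectable.

    A token matches (and would be disabled) when its entity type is not one
    of the given labels AND its POS tag is not in the keep list.
    """
    return [
        {
            "pattern": [
                {
                    "ENT_TYPE": {"NOT_IN": list(entity_labels)},
                    "POS": {"NOT_IN": list(keep_pos)},
                }
            ]
        }
    ]


def write_jsonl(path, records):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


patterns = build_disable_patterns(ENTITY_LABELS, KEEP_POS)
write_jsonl("disable_patterns.jsonl", patterns)
```

The resulting file would then be passed to rel.manual via --disable-patterns ./disable_patterns.jsonl. For the entity types to be available, the model used for tokenization needs to predict your two classes, or the entities need to be present in the incoming data.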

You can also use the theme settings to make the default annotation card wider and fit more text in a row. There's obviously still some text to read, but if it's clear what can be connected and what can't, I think this can still be very efficient from an annotation perspective.
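As a rough sketch of the theme change, the snippet below adds a wider card setting to a prodigy.json in the current working directory, keeping any settings that are already there. The "custom_theme" and "cardMaxWidth" names are the settings as I remember them and the value 1400 is only illustrative, so please double-check both against the configuration docs.

```python
import json
from pathlib import Path


def widen_card(config_path, max_width):
    """Set the card width in a Prodigy-style JSON config, keeping other keys."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from scratch.
    config = json.loads(path.read_text(encoding="utf8")) if path.exists() else {}
    # "custom_theme" / "cardMaxWidth" are assumed setting names (see note above).
    config.setdefault("custom_theme", {})["cardMaxWidth"] = max_width
    path.write_text(json.dumps(config, indent=2), encoding="utf8")
    return config


config = widen_card("prodigy.json", 1400)
```

You could of course also just edit the file by hand. The helper is only there to show which key is involved.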