I'm looking for an annotation tool for coreference resolution. Ideally the workflow would be to obtain mention candidates from a model, manually correct the mention labels, and after that perform coreference annotation, both for entities and events. Also there might be overlapping and nested mentions.
Below is a typical example for entity coreference resolution. Note that in the span 'her pay', both 'her' and 'her pay' can be seen as separate mentions.
I see you already have a coref.manual recipe. The issue I ran into in the demo is that the spans for mentions provided by the model can't be modified. I saw that you do provide this capability through rel.manual. This option could work, but I only saw the --add-ents and --add-nps options, so I'm missing the POS tags that you provide in the coref.manual recipe.
Is it possible to manually create a recipe that has tags similar to those of coref.manual but with the --span-label capability? Also, I saw in the Prodigy nightly thread that you now provide an overlapping spans capability. Is this something that could be added to the workflow outlined above?
Yes, the idea behind the coref.manual workflow is that it assumes you're using the part-of-speech tagger's and/or named entity recognizer's output as features or for candidate selection in your coreference model. So being able to edit those pre-defined annotations would be very misleading: you'd be annotating data for a state that you would never actually get to at runtime.
The rel.manual recipe lets you feed in data that contains pre-annotated "spans" – either set by your own process, or created in a separate annotation process (e.g. ner.manual).
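As a rough illustration of what such pre-annotated input could look like, here is a minimal sketch that writes one JSONL record with a "text" and character-offset "spans". The sentence, the MENTION label, the make_task helper and the mentions.jsonl file name are all made up for this example. The span keys (start, end, label) follow Prodigy's usual task format, but it's worth checking them against the docs for your version.

```python
import json


def make_task(text, mentions, label="MENTION"):
    """Build one task dict with character-offset spans (hypothetical helper).

    `mentions` are substrings of `text`, listed in the order they occur.
    """
    spans = []
    cursor = 0
    for mention in mentions:
        # Search from the end of the previous span so the spans stay
        # ordered and non-overlapping. Raises ValueError if not found.
        start = text.index(mention, cursor)
        end = start + len(mention)
        spans.append({"start": start, "end": end, "label": label})
        cursor = end
    return {"text": text, "spans": spans}


# Made-up example sentence with two mention candidates.
task = make_task("Anna asked her manager about a raise.", ["Anna", "her manager"])

# One JSON object per line (JSONL).
with open("mentions.jsonl", "w", encoding="utf8") as f:
    f.write(json.dumps(task) + "\n")
```

A file like this could then be used as the input source for rel.manual, so the mention spans come from your own process rather than from the model's tags.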
Alternatively, you can also provide --patterns that define the tokens and spans to label. For example, one or more tokens tagged as proper noun:
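A minimal sketch of what such a patterns file could look like, assuming spaCy's token-based match pattern syntax, where {"POS": "PROPN", "OP": "+"} matches one or more consecutive proper-noun tokens. The PROPN label and the coref_patterns.jsonl file name are placeholders chosen for this example, not names taken from the recipe.

```python
import json

# Each entry pairs a span label with a spaCy token pattern.
# "OP": "+" means "one or more" of the described token.
patterns = [
    {"label": "PROPN", "pattern": [{"POS": "PROPN", "OP": "+"}]},
]

# Patterns files are JSONL: one JSON object per line.
with open("coref_patterns.jsonl", "w", encoding="utf8") as f:
    for entry in patterns:
        f.write(json.dumps(entry) + "\n")
```

The resulting file is what you would point --patterns at. Further entries, for example for pronouns or noun phrases, can be added to the list in the same way.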
In fact, this is also how the coref.manual recipe does it under the hood: it calls into rel.manual and provides some custom patterns to select the candidate spans. You can check out the implementation by looking at recipes/coref.py in your Prodigy installation (run prodigy stats to find out the local path).
The overlapping spans capability would allow you to annotate overlapping spans in the text, but the relations UI currently only shows one "layer" of spans. Anything more would get really messy and difficult to visualise in a way that's actually helpful. So if you have nested spans and complex relations, it probably makes sense to deal with them separately, or use the relation labels to encode the entity types. I recently posted an example of how this could look for non-contiguous entity spans: