Hi. I'm trying to create a custom co-reference model and I'd like to use neuralcoref to give me a headstart manually annotating my data with co-reference relationships. Is there an easy way to format neuralcoref inferences for prodigy rel recipes? Thanks!
Thanks for your message and welcome to the Prodigy community!
Unfortunately, we don't have an off-the-shelf "correct" recipe for neuralcoref. As the ticket below mentions, the key is to understand the required format (which I think is part of your question):
Also, you may want to look at the dep.correct recipe to see how a "correct" recipe works with the "relations" user interface (note that the link also shows the format for the "relations" UI, which may be helpful too). To view this recipe, run:
python -m prodigy stats
where you should then see the Location: field showing where Prodigy has been installed. In that folder, look for the file recipes/dep.py, where you'll see the dep.correct recipe's implementation.
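If you'd rather locate that folder programmatically, here's a small sketch using only the standard library (the prodigy path in the comment is just an illustration of how you'd apply it):

```python
import importlib.util
from pathlib import Path

def package_dir(package: str) -> Path:
    """Return the directory where an installed package's files live."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(package)
    return Path(spec.origin).parent

# With Prodigy installed, the built-in recipes would then be at e.g.:
# package_dir("prodigy") / "recipes" / "dep.py"
```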
Similarly, you may find the coref.manual recipe to be helpful too, which you can find in recipes/coref.py. Essentially, you'd want to combine the idea of "correcting" the neuralcoref model's predictions with the coref.manual recipe.
If you're able to get a recipe working, feel free to post back and/or share it as a GitHub gist! We would greatly appreciate it. Hope this helps!
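To give a rough idea of the direction, here's a minimal sketch of turning coreference clusters into Prodigy "relations"-style dicts. It assumes you've already extracted each cluster as a list of (token_start, token_end) mention tuples (e.g. from neuralcoref's doc._.coref_clusters); the exact task keys should be double-checked against the "relations" UI docs:

```python
def clusters_to_relations(clusters, label="COREF"):
    """Turn coreference clusters into Prodigy "relations"-style dicts.

    Each cluster is a list of (token_start, token_end) mention tuples
    (token_end inclusive); every mention is linked back to the cluster's
    first mention, since direction doesn't matter for coref scoring.
    """
    relations = []
    for cluster in clusters:
        first = cluster[0]
        for mention in cluster[1:]:
            relations.append({
                "head": first[1],
                "child": mention[1],
                "head_span": {"token_start": first[0], "token_end": first[1], "label": label},
                "child_span": {"token_start": mention[0], "token_end": mention[1], "label": label},
                "label": label,
            })
    return relations

# One cluster: "the doctor" (tokens 0-1) coreferent with "they" (token 6)
rels = clusters_to_relations([[(0, 1), (6, 6)]])
```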
Hey @ryanwesslen, thanks for getting back to me and answering my question. If I do end up creating a custom recipe, I will 100% share it here.
I have a few additional follow-up questions. When using either the coref or rel recipe to annotate coreferences, I'm a little confused about which direction the relationship should go. I read somewhere in the Prodigy documentation that it's not so important with coref because they are simply pairs of references, but wouldn't this have an impact when it comes to training a model? Also, how should you deal with pronouns that link to multiple entities, e.g. "The doctor and nurse saw the patient. They did a great job."? Should you assign a reference from 'doctor' to 'they' and another from 'nurse' to 'they'? In this case, isn't the head/child of the relationship important? I'm not sure which should be the head and which should be the child.
It's a subtle point, but the scoring (including the loss calculation) only cares about whether two mentions are in the same cluster; it doesn't have a notion of direction in relationships. You can think of this as an undirected graph.
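One way to convince yourself of this: if you normalize each (head, child) pair into an undirected edge, annotations drawn in either direction come out identical. A quick sketch:

```python
def as_undirected_edges(relations):
    """Normalize (head, child) pairs so direction is ignored."""
    return {frozenset((r["head"], r["child"])) for r in relations}

# The same coref link annotated in both directions:
forward = [{"head": 0, "child": 5}]
backward = [{"head": 5, "child": 0}]
assert as_undirected_edges(forward) == as_undirected_edges(backward)
```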
For more details, I would highly recommend our recent deep dive post on Neural Coreference Resolution in spaCy:
You would link "the doctor and the nurse" as one mention to "they". But in this case, this means that "the nurse" cannot be linked separately (it can in the data, but the model can't find it). This is the "split heads" issue referenced in the blog post.
More precisely, the problem is treated as a clustering problem over non-overlapping spans in a document. The non-overlapping constraint renders the system incapable of handling the "split antecedent" problem. For example, in "Alice and Bob said they like cheese, but he prefers sushi.", the pronoun "they" refers to "Alice and Bob" and "he" refers to "Bob". However, the span "Bob" is inside "Alice and Bob", so we have to choose to either resolve "they" to "Alice and Bob" or "he" to "Bob". The lack of split antecedent handling is a limitation of many coreference resolution systems, including ours.
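To make the constraint concrete, here's a tiny sketch with character offsets for that sentence, showing why "Bob" and "Alice and Bob" can't both be mentions under a non-overlapping scheme:

```python
def spans_overlap(a, b):
    """True if two (start, end) character spans overlap (end exclusive)."""
    return a[0] < b[1] and b[0] < a[1]

text = "Alice and Bob said they like cheese, but he prefers sushi."
alice_and_bob = (0, 13)   # text[0:13] == "Alice and Bob"
bob = (10, 13)            # text[10:13] == "Bob"

# The spans overlap, so a non-overlapping system must pick one of them.
assert spans_overlap(alice_and_bob, bob)
```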
As the post outlines, the spaCy core team's work on the coref component is still experimental, but it could likely be very helpful for you. Since it's experimental, we haven't fully integrated it into Prodigy yet, but there's a lot of opportunity with a custom recipe. If you have more questions, I would suggest posting on the spaCy discussions forum, as that's where the spaCy core team answers spaCy-specific questions (this forum is for Prodigy-specific questions).
Thanks for this information, it's really helpful!
Also, be on the lookout very soon for an accompanying spaCy coref video tutorial (with code too!).
I just saw a sneak preview and it's an excellent summary of the post!
Just released the new experimental coref video by Edward and team:
Here's the GitHub code too.