Relation extraction model not showing in Prodigy

Hi, we are having trouble with our relation model not showing up in Prodigy. Our NER predictions show up, but no relations are shown. Our model was trained using spaCy and does print out values when tested in the console.

Here is an example:

Should relationships be showing even with low scores? We don't have much data to train with due to time constraints.

Thanks.

What relationship model have you trained? At the moment spaCy offers no built-in support for coreference models, and we don't even have a data structure for it on our Doc objects.

The coref.manual command allows you to specify a spaCy model that will be used to detect parts of speech to help you label. But this model doesn't contain a coreference model to help pre-fill the links.

Looking at the coreference documentation in Prodigy there might be an alternative for your workflow though. You might be able to pre-fill the relationships of interest in your .jsonl file.

Here's an example of such a .jsonl file.

{"text":"The Lemon Drop Kid , a New York City swindler, is illegally touting horses at a Florida racetrack. After several successful hustles, the Kid comes across a beautiful, but gullible, woman intending to bet a lot of money. The Kid convinces her to switch her bet, employing a prefabricated con. Unfortunately for the Kid, the woman \"belongs\" to notorious gangster Moose Moran , as does the money. ","meta":{"source":"CMU Movie Summary Corpus"},"relations":[{"head":3,"child":9,"head_span":{"start":0,"end":18,"token_start":0,"token_end":3,"label":"NP"},"child_span":{"start":21,"end":45,"token_start":5,"token_end":9,"label":"NP"},"color":"#c5bdf4","label":"COREF"},{"head":3,"child":26,"head_span":{"start":0,"end":18,"token_start":0,"token_end":3,"label":"NP"},"child_span":{"start":137,"end":140,"token_start":26,"token_end":26,"label":"ORG"},"color":"#c5bdf4","label":"COREF"}]}

When I place these contents in a pre-filled.jsonl file, I can call the coref.manual recipe via:

prodigy coref.manual coref_movies en_core_web_sm pre-filled.jsonl --label COREF

Which results in a pre-filled interface below.

You might be able to do something similar on your own data as a pre-processing step using your own coref model. That way you can set the thresholds manually too.
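As a rough sketch of that pre-processing step, the snippet below filters scored relations by a threshold and writes one JSON object per line. The "score" key, the 0.5 threshold, and the second low-scoring relation are made up for illustration; how your own model exposes its predictions and scores will differ.

```python
import json

def filter_relations(scored_relations, threshold):
    """Keep relations scoring at or above the threshold, dropping the score key."""
    kept = []
    for rel in scored_relations:
        if rel["score"] >= threshold:
            kept.append({key: value for key, value in rel.items() if key != "score"})
    return kept

def build_task(text, scored_relations, threshold):
    """Build one task in the same shape as the example row above."""
    return {"text": text, "relations": filter_relations(scored_relations, threshold)}

def to_jsonl(tasks):
    """Serialize tasks as JSONL: one JSON object per line."""
    return "".join(json.dumps(task) + "\n" for task in tasks)

text = "The Lemon Drop Kid , a New York City swindler"
kid = {"start": 0, "end": 18, "token_start": 0, "token_end": 3, "label": "NP"}
swindler = {"start": 21, "end": 45, "token_start": 5, "token_end": 9, "label": "NP"}

# Hypothetical model output: the "score" values are invented for this sketch.
scored_relations = [
    {"head": 3, "child": 9, "head_span": kid, "child_span": swindler,
     "label": "COREF", "score": 0.91},
    {"head": 9, "child": 3, "head_span": swindler, "child_span": kid,
     "label": "COREF", "score": 0.12},
]

task = build_task(text, scored_relations, threshold=0.5)
jsonl = to_jsonl([task])
print(jsonl)
```

The resulting string can be saved as a .jsonl file and passed to the recipe in place of pre-filled.jsonl.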

We used this video as our basis

This is actually what we have done now to annotate our data, then trained using that video. The problem we are having is that our model trained via the previous video won't display when we want to annotate further data using our own model.

Hopefully that helps you understand, unless we are misinterpreting the video?

Cheers,
Liam

I may be misunderstanding what you're referring to. So just to double-check, is the issue that you're not able to get the references to render in Prodigy? Even when you try to pre-load the contents into a .jsonl file upfront?

Or, are you having trouble getting the model from the video to generate candidates for such a .jsonl file?

This is correct, sorry for the misunderstanding. This is the issue we are facing. We annotated all our data with entities and relations using Prodigy and trained the model with it. We then wanted to use our trained model to annotate further data, but it doesn't predict or 'render' anything when loaded into Prodigy with unannotated data.

Ah now I see. Prodigy indeed does not recognize the coreference model that you've attached to your spaCy pipeline. You'll need to populate the .jsonl file with this information manually.

You could generate this information yourself in a notebook though. Suppose that a row in your .jsonl file looks like this:

{
	"text":"The Lemon Drop Kid , a New York City swindler, is illegally touting horses at a Florida racetrack. After several successful hustles, the Kid comes across a beautiful, but gullible, woman intending to bet a lot of money. The Kid convinces her to switch her bet, employing a prefabricated con. Unfortunately for the Kid, the woman \"belongs\" to notorious gangster Moose Moran , as does the money. ",
	"relations":[
		{
			"head":3,
			"child":9,
			"head_span":{"start":0,"end":18,"token_start":0,"token_end":3,"label":"NP"},
			"child_span":{"start":21,"end":45,"token_start":5,"token_end":9,"label":"NP"},
			"color":"#c5bdf4",
			"label":"COREF"
		},{
			"head":3,
			"child":26,
			"head_span":{"start":0,"end":18,"token_start":0,"token_end":3,"label":"NP"},
			"child_span":{"start":137,"end":140,"token_start":26,"token_end":26,"label":"ORG"},
			"color":"#c5bdf4",
			"label":"COREF"
		}
	]
}

Then it'll render in the Prodigy interface like this:

[image: the pre-filled relations rendered in the Prodigy interface]

The part listed under the relations key would need to be added by you.
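Here's a minimal sketch of how one of those relations entries could be assembled in a notebook. The helper names make_span and make_relation are hypothetical, and the whitespace split is only a stand-in tokenizer. In practice the offsets should come from the same spaCy tokenizer Prodigy uses (for example token.idx and token.idx + len(token)). Setting head and child to the last token of each span simply mirrors the example row above.

```python
import json

def make_span(tokens, token_start, token_end, label):
    """Build a span dict from inclusive token indices."""
    return {
        "start": tokens[token_start]["start"],
        "end": tokens[token_end]["end"],
        "token_start": token_start,
        "token_end": token_end,
        "label": label,
    }

def make_relation(tokens, head, child, label, color="#c5bdf4"):
    """Build one relations entry; head and child are (token_start, token_end, label)."""
    head_span = make_span(tokens, *head)
    child_span = make_span(tokens, *child)
    return {
        # As in the example row, head/child point at the last token of each span.
        "head": head_span["token_end"],
        "child": child_span["token_end"],
        "head_span": head_span,
        "child_span": child_span,
        "color": color,
        "label": label,
    }

text = "The Lemon Drop Kid , a New York City swindler"

# Stand-in tokenizer: split on single spaces and track character offsets.
tokens = []
offset = 0
for word in text.split(" "):
    tokens.append({"text": word, "start": offset, "end": offset + len(word)})
    offset += len(word) + 1

relation = make_relation(tokens, head=(0, 3, "NP"), child=(5, 9, "NP"), label="COREF")
task = {"text": text, "relations": [relation]}
print(json.dumps(task))
```

For this shortened text, the first relation comes out with the same offsets and token indices as the first entry in the example row.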

The data we annotated using Prodigy already looks like this:

We annotated this using the rel.manual recipe.
We were expecting to pass new unannotated text to prodigy and review the predicted relationships using our model.

Coref is the name of one of our relations; we weren't using coref.manual. Sorry for the confusion.