We have a solution in mind for this, but we haven't had a chance to implement it yet. Currently most people are interested in span identification (like NER) or text classification problems, so annotating trees and graphs hasn't been as much of a priority.
The other problem is that tree- and graph-based problems are often quite different from each other, suggesting different solutions. The main question is: how complicated are your trees? Do they only have a few relations per sentence, or are they quite dense?
If most of the information is in identifying the anchors of the relations (e.g. entity spans), and the relation itself only adds something like a direction (such as which company bought the other), you may find that you can do the relation classification as text classification. For instance, you'd have one text classification label for "A buys B" and a second for "B buys A". I'd suggest this as a good approach so long as you end up with only a few dozen labels. If you can do it this way, you'll also probably get great accuracy: the model performs much better when it gets to predict the whole structure at once, instead of having to compromise and predict the edges individually.
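To make that concrete, here's a small sketch of how the label inventory stays manageable when each direction of each relation type becomes one class. The relation types and label names are invented for illustration:

```python
# A sketch of the whole-structure label scheme described above.
# The relation types and label names are invented for illustration;
# the point is that the full inventory stays at "a few dozen" classes.
from itertools import product

relation_types = ["buys", "sues", "merges_with"]  # your ontology's relations
directions = ["A_B", "B_A"]  # A = first-mentioned entity, B = second

labels = [f"{rel}:{d}" for rel, d in product(relation_types, directions)]
labels.append("NO_RELATION")

print(labels)
# ['buys:A_B', 'buys:B_A', 'sues:A_B', 'sues:B_A',
#  'merges_with:A_B', 'merges_with:B_A', 'NO_RELATION']
```

With n relation types you get 2n + 1 classes, which is well within what a text classifier handles comfortably.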
If you do have complicated relations to predict, then for now I would suggest doing the annotations in a tool like WebAnno or BRAT. As I said, we hope to have support for this in future --- but in the meantime, those free tools should be quite productive.
Thanks @honnibal, good to know that it's on the to-do list for Prodigy! And of course, I assume that if I first create a custom spaCy parser model with a little bit of manually created seed training data, I could technically use dep.teach after that?
Good point about using text classification in the simple cases... I think mine requires some more structure though. Not dense like dependency parsing, but more than a single relation per sentence. Here's an example:
I want to exchange black t-shirts for white jeans... 10 of the former, 5 of the latter
I.e., a sentence can contain multiple products, each with accompanying attributes, plus disjointly placed entities that refer back to specific products mentioned. It seems to me this would be best handled by a relation parser --- do you agree? Or is there a simpler approach?
Bonus question: let's say you have an ontology where your objects are uniquely defined by a set of attributes, but you're only selling one type of object so it's not necessary to specify the root:
"I want two black t-shirts in medium and three white t-shirts, small",
"I want two black in medium and three white, small"
How would you define the relations here? I know that there are two "hidden" t-shirt entities, which have directed relations to the attributes. But the grouping black+medium and white+small isn't really directional. Is there any non-directed relationship in spaCy?
Well, I think you might have luck with text classification here. It might not work, but it's worth a try.
I would code the above with something like IN_ORDER: the attributes match up to the products such that the first attribute matches the first product, and the second attribute matches the second product.
Here's a simple and general coding scheme you could try. Assign the products numbers according to their order of occurrence. So in your example above, you'll have 1. black t-shirts and 2. white jeans. Now assign each attribute the number of the product it applies to. The label for your example input is then 1,2.
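A rough sketch of that encoding in Python (the function name and data shapes are made up for illustration):

```python
# A sketch of the numbering scheme described above. Products are
# numbered 1, 2, ... by order of occurrence; the label is the sequence
# of product numbers for each attribute, also in order of occurrence.

def encode_label(attribute_assignments):
    """Turn a list of 1-based product indices (one per attribute,
    in order of occurrence) into a single classification label."""
    return ",".join(str(i) for i in attribute_assignments)

# "I want to exchange black t-shirts for white jeans...
#  10 of the former, 5 of the latter"
# Products: 1. black t-shirts, 2. white jeans
# Attributes: "10" -> product 1, "5" -> product 2
print(encode_label([1, 2]))  # -> 1,2
```

The same scheme extends to more attributes, e.g. three attributes where the first two modify the second product would be encoded as 2,2,1.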
You're probably not going to have that many products per sentence, which means the range of possible structures to predict might be pretty limited. Maybe this won't work --- but maybe it will.
Another option is to use rules to get all the really easy cases, like cases where the attribute is directly attached to the product in the dependency tree. That might do really well, and then you can think of how to use models to get the more difficult cases only.
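For instance, here's a minimal sketch of that rule. The parse is hand-written so the example is self-contained; in practice the heads and dependency labels would come from spaCy's parser, and the product indices from your entity recognizer:

```python
# Attach an attribute to a product when it is a direct nummod/amod
# child of the product in the dependency tree -- the "really easy"
# cases. Anything left unattached goes to the model.

# (text, dependency label, index of head token) for:
# "I want two black shirts and three white jeans."
tokens = [
    ("I", "nsubj", 1), ("want", "ROOT", 1), ("two", "nummod", 4),
    ("black", "amod", 4), ("shirts", "dobj", 1), ("and", "cc", 4),
    ("three", "nummod", 8), ("white", "amod", 8), ("jeans", "conj", 4),
]
products = [4, 8]  # token indices of the product heads

def easy_attributes(tokens, products):
    """Map each product index to the attributes directly attached to it."""
    out = {i: [] for i in products}
    for text, dep, head in tokens:
        if head in out and dep in ("nummod", "amod"):
            out[head].append(text)
    return out

print(easy_attributes(tokens, products))
# -> {4: ['two', 'black'], 8: ['three', 'white']}
```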
It could be that these approaches really don't work and you do need relation parsing --- but predicting trees is hard work, and the methods for it are designed for hard cases like syntax, where the space of possible trees is enormous. If your space of trees is actually very small, you might do better just predicting over the possible tree shapes directly.