Rule-based or pattern approach for entity relations

Hi Guys,

Can I create a pattern-based approach for entity relation extraction with a pretrained NER model, similar to what we do for NER?

For example, I have 4 entities (Location, Negative, Problem, Time). I would like to create a pattern so that if Negative is in the same line as Problem, it creates a No_Problem relation; likewise, if Problem appears with Time, it creates a Problem_Time relation.
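
The rule described above can be sketched in plain Python, independent of any particular library. This is only an illustration under stated assumptions: the `RULES` table, the `relations_for_line` helper, and the sample entities are hypothetical, and the entities are assumed to have already been extracted by the NER model, one list per line.

```python
from itertools import combinations

# Hypothetical rule table: unordered pair of entity labels -> relation label
RULES = {
    frozenset({"Negative", "Problem"}): "No_Problem",
    frozenset({"Problem", "Time"}): "Problem_Time",
}


def relations_for_line(entities):
    """entities: list of (text, label) tuples found on a single line."""
    relations = []
    # Check every pair of entities that co-occur on the line
    for (text_a, label_a), (text_b, label_b) in combinations(entities, 2):
        relation = RULES.get(frozenset({label_a, label_b}))
        if relation is not None:
            relations.append((text_a, text_b, relation))
    return relations


# Made-up entities, as an NER model might return them for one line
line_entities = [("no", "Negative"), ("leak", "Problem"), ("yesterday", "Time")]
print(relations_for_line(line_entities))
# [('no', 'leak', 'No_Problem'), ('leak', 'yesterday', 'Problem_Time')]
```

Pairs whose labels have no rule (here Negative with Time) are simply skipped.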

I'm sorry if this is in the documentation; it is not so clear to me.

Victor

Never mind, I found it in the documentation :sweat_smile: :sweat_smile:

Hello @vtorres,
I am at the same point as you. Could you maybe post an example of your pattern file in here? Thanks a lot in advance!

Hello, how can I create a pattern file for rel.manual?
For example, I would like to label the entities PERSON and LOCATION with the relation LIVES_IN.
Unfortunately, the following pattern does not work:

{"label":"TEST1","pattern":[{"ENT_TYPE":"PERSON"},{"ENT_TYPE":"LOCATION"}]}

What do I need to change in the .jsonl so that this relation is labeled automatically?

Hi @yllwpr!

Per the docs, rel.manual only accepts span patterns: "Path to patterns file defining spans to be added and merged."

However, an alternative comes from the Prodigy Relations docs on using a custom model (or DependencyMatcher) to pre-highlight:

You don’t need to use spaCy to let a model highlight suggestions for you. Under the hood, the concept is pretty straightforward: if you stream in examples with pre-defined "tokens", "relations" and optional "spans", Prodigy will accept and pre-highlight them. This means you can either stream in pre-labelled data, or write a custom recipe that uses your model to add tokens and relations to your data.
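
To make the quoted format more concrete, here is a rough sketch of one pre-labelled example with "tokens", "spans" and "relations". The field names follow my reading of the Prodigy relations docs, so please double-check them there. The `make_task` helper is hypothetical, and the whitespace split is only a stand-in for real tokenization, which should match whatever tokenizer the recipe uses.

```python
import json


def make_task(text, entity_labels, relation):
    """Build one task dict with "tokens", "spans" and "relations".

    entity_labels: {token_index: label} for single-token entities.
    relation: (head_token_index, child_token_index, label).
    """
    tokens, spans_by_token, offset = [], {}, 0
    words = text.split(" ")  # stand-in tokenizer (assumption)
    for i, word in enumerate(words):
        start, end = offset, offset + len(word)
        tokens.append({"text": word, "start": start, "end": end, "id": i,
                       "ws": i < len(words) - 1})
        if i in entity_labels:
            spans_by_token[i] = {"start": start, "end": end, "token_start": i,
                                 "token_end": i, "label": entity_labels[i]}
        offset = end + 1  # skip the single space after the word
    head, child, label = relation
    relations = [{"head": head, "child": child, "label": label,
                  "head_span": spans_by_token[head],
                  "child_span": spans_by_token[child]}]
    return {"text": text, "tokens": tokens,
            "spans": list(spans_by_token.values()), "relations": relations}


task = make_task("Steve lives in Seattle", {0: "PERSON", 3: "GPE"},
                 (0, 3, "LIVES_IN"))
print(json.dumps(task, indent=2))
```

Writing one such dict per line to a .jsonl file would be one way to stream in pre-labelled data.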

You could also use the DependencyMatcher in place of the custom model inside the add_relations_to_stream() function from the docs:

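def add_relations_to_stream(stream):
    # load_your_custom_model() is a placeholder from the docs, not a real function;
    # a DependencyMatcher could take its place (see below)
    custom_model = load_your_custom_model()
    for eg in stream:
        # One dependency label and one head token index per token
        deps, heads = custom_model(eg["text"])
        eg["relations"] = []
        for i, (label, head) in enumerate(zip(deps, heads)):
            eg["relations"].append({"child": i, "head": head, "label": label})
        yield eg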

I created a dummy example of how to use DependencyMatcher for your use case:

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    {
        "RIGHT_ID": "anchor_lives",
        "RIGHT_ATTRS": {"LEMMA": "live"}
    },
    {
        "LEFT_ID": "anchor_lives",
        "REL_OP": ">",
        "RIGHT_ID": "founded_subject",
        "RIGHT_ATTRS": {"ENT_TYPE": "PERSON", "DEP": "nsubj"},
    },
    {
        "LEFT_ID": "anchor_lives",
        "REL_OP": ">",
        "RIGHT_ID": "lives_prep",
        "RIGHT_ATTRS": {"DEP": "prep"},
    },    
    {
        "LEFT_ID": "lives_prep",
        "REL_OP": ">",
        "RIGHT_ID": "lives_location",
        "RIGHT_ATTRS": {"ENT_TYPE": "GPE"},
    }
]

matcher.add("LIVES_IN", [pattern])
doc = nlp("Steve lives in Seattle.")
matches = matcher(doc)

print(matches)
# Each match is (match_id, token_ids); token_ids[i] corresponds to pattern[i]
for match_id, token_ids in matches:
    for i, token_id in enumerate(token_ids):
        print(pattern[i]["RIGHT_ID"] + ":", doc[token_id].text)

# [(6920517205109732861, [1, 0, 2, 3])]
# anchor_lives: lives
# founded_subject: Steve
# lives_prep: in
# lives_location: Seattle
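
To connect this back to add_relations_to_stream(), here is a small sketch of turning one match into a relation dict. It uses the token ids from the output above as plain data, so it runs without spaCy. The `match_to_relation` helper is hypothetical, and treating the person as "head" and the location as "child" is an assumption; swap them if your annotation scheme points the other way.

```python
def match_to_relation(token_ids, right_ids, head_id, child_id, label):
    """Map one DependencyMatcher match to a {"head", "child", "label"} dict."""
    # token_ids[i] belongs to the pattern dict whose RIGHT_ID is right_ids[i]
    by_name = dict(zip(right_ids, token_ids))
    return {"head": by_name[head_id], "child": by_name[child_id], "label": label}


# Values copied from the output above
right_ids = ["anchor_lives", "founded_subject", "lives_prep", "lives_location"]
token_ids = [1, 0, 2, 3]

relation = match_to_relation(token_ids, right_ids,
                             "founded_subject", "lives_location", "LIVES_IN")
print(relation)
# {'head': 0, 'child': 3, 'label': 'LIVES_IN'}
```

Inside the stream function you would run the matcher on nlp(eg["text"]) and append one such dict per match to eg["relations"]. Depending on your Prodigy version, you may also need to add "tokens" and the "head_span" and "child_span" entries, so check the docs for the exact format.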

Let me know if this makes sense or if you have any questions. If you are able to get this to work, we'd greatly appreciate an example for the community!