hi @yllwpr!
Per the docs, `rel.manual` only accepts span patterns: "Path to patterns file defining spans to be added and merged."
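For reference, a span patterns file is JSONL with one match pattern per line. The sketch below builds one, assuming the usual Prodigy pattern format (a `label` plus either a list of token attribute dicts or an exact-match string). The labels, words and helper name `to_jsonl` are illustrative only:

```python
import json

# Illustrative span patterns: each record has a "label" and a "pattern",
# either a list of token attribute dicts or an exact-match string.
patterns = [
    {"label": "PERSON", "pattern": [{"lower": "steve"}]},
    {"label": "GPE", "pattern": "Seattle"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


print(to_jsonl(patterns), end="")
# {"label": "PERSON", "pattern": [{"lower": "steve"}]}
# {"label": "GPE", "pattern": "Seattle"}
```

Saved as a `.jsonl` file, this is the kind of file `rel.manual` takes as its patterns file. Note that nothing in the format links two spans, so patterns alone can pre-highlight spans but not relations.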
However, the Prodigy Relations docs offer an alternative in the section on pre-highlighting with a custom model, and a DependencyMatcher could fill the model's role:
You don’t need to use spaCy to let a model highlight suggestions for you. Under the hood, the concept is pretty straightforward: if you stream in examples with pre-defined "tokens", "relations" and optional "spans", Prodigy will accept and pre-highlight them. This means you can either stream in pre-labelled data, or write a custom recipe that uses your model to add tokens and relations to your data.
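To make "pre-labelled data" concrete, here is a minimal sketch of one task for "Steve lives in Seattle.". It assumes Prodigy's usual token fields (`text`, `start`, `end`, `id`, `ws`) and span fields (`start`, `end`, `token_start`, `token_end`, `label`), and uses the same `head`/`child`/`label` relation keys as the docs snippet. The regex tokenizer is a naive stand-in for spaCy, and the helper names and `LIVES_IN` label are hypothetical:

```python
import re


def tokenize(text):
    """Naive stand-in for spaCy: words and punctuation with character offsets."""
    tokens = []
    for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text)):
        end = m.end()
        tokens.append({
            "text": m.group(),
            "start": m.start(),
            "end": end,
            "id": i,
            # True if a space follows this token in the original text
            "ws": end < len(text) and text[end] == " ",
        })
    return tokens


def single_token_span(tokens, i, label):
    """Span covering exactly one token, in Prodigy's span format."""
    t = tokens[i]
    return {"start": t["start"], "end": t["end"],
            "token_start": i, "token_end": i, "label": label}


text = "Steve lives in Seattle."
tokens = tokenize(text)
task = {
    "text": text,
    "tokens": tokens,
    "spans": [single_token_span(tokens, 0, "PERSON"),
              single_token_span(tokens, 3, "GPE")],
    # "head" and "child" are token ids (indices into "tokens")
    "relations": [{"head": 0, "child": 3, "label": "LIVES_IN"}],
}
```

Writing tasks like this to a JSONL file is the "stream in pre-labelled data" option; the custom-recipe option builds the same structure on the fly.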
Perhaps you could use the DependencyMatcher in place of the model in the `add_relations_to_stream()` function from the docs:
```python
def add_relations_to_stream(stream):
    # Placeholder from the docs; a DependencyMatcher could take its place
    custom_model = load_your_custom_model()
    for eg in stream:
        # One dependency label and one head index per token
        deps, heads = custom_model(eg["text"])
        eg["relations"] = []
        for i, (label, head) in enumerate(zip(deps, heads)):
            eg["relations"].append({"child": i, "head": head, "label": label})
        yield eg
```
I created a dummy example of how to use DependencyMatcher for your use case:
```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    # Anchor: any form of the verb "live"
    {
        "RIGHT_ID": "anchor_lives",
        "RIGHT_ATTRS": {"LEMMA": "live"},
    },
    # A PERSON entity that is the verb's nominal subject
    {
        "LEFT_ID": "anchor_lives",
        "REL_OP": ">",
        "RIGHT_ID": "lives_subject",
        "RIGHT_ATTRS": {"ENT_TYPE": "PERSON", "DEP": "nsubj"},
    },
    # A preposition attached directly to the verb
    {
        "LEFT_ID": "anchor_lives",
        "REL_OP": ">",
        "RIGHT_ID": "lives_prep",
        "RIGHT_ATTRS": {"DEP": "prep"},
    },
    # A GPE entity attached to that preposition
    {
        "LEFT_ID": "lives_prep",
        "REL_OP": ">",
        "RIGHT_ID": "lives_location",
        "RIGHT_ATTRS": {"ENT_TYPE": "GPE"},
    },
]
matcher.add("LIVES_IN", [pattern])

doc = nlp("Steve lives in Seattle.")
matches = matcher(doc)
print(matches)

# Each token_id corresponds to one pattern dict, in the same order
match_id, token_ids = matches[0]
for node, token_id in zip(pattern, token_ids):
    print(node["RIGHT_ID"] + ":", doc[token_id].text)

# Output:
# [(6920517205109732861, [1, 0, 2, 3])]
# anchor_lives: lives
# lives_subject: Steve
# lives_prep: in
# lives_location: Seattle
```
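To plug the matcher into `add_relations_to_stream()`, each match has to become relation dicts. Below is a minimal sketch, not an official Prodigy or spaCy API: `match_to_relations`, `add_matcher_relations_to_stream` and the `rel_labels` argument (mapping a pair of the pattern's `RIGHT_ID` names to a relation label) are hypothetical. The generator assumes you pass in the `nlp` and `matcher` from the example above, plus a dict mapping each matcher label to its pattern list:

```python
def match_to_relations(token_ids, pattern, rel_labels):
    """Turn one DependencyMatcher match into Prodigy-style relation dicts.

    token_ids[i] is the token matched by pattern[i].
    rel_labels (hypothetical) maps (head RIGHT_ID, child RIGHT_ID) -> label.
    """
    node_to_token = {node["RIGHT_ID"]: tok for node, tok in zip(pattern, token_ids)}
    return [
        {"head": node_to_token[head], "child": node_to_token[child], "label": label}
        for (head, child), label in rel_labels.items()
        if head in node_to_token and child in node_to_token
    ]


def add_matcher_relations_to_stream(stream, nlp, matcher, patterns, rel_labels):
    """Variant of the docs' generator using a DependencyMatcher instead of a model.

    patterns maps each matcher label (e.g. "LIVES_IN") to its pattern list.
    """
    for eg in stream:
        doc = nlp(eg["text"])
        # Overwrite tokens so their ids line up with the matcher's token indices
        eg["tokens"] = [
            {"text": t.text, "start": t.idx, "end": t.idx + len(t.text),
             "id": t.i, "ws": bool(t.whitespace_)}
            for t in doc
        ]
        eg["relations"] = []
        for match_id, token_ids in matcher(doc):
            pattern = patterns[nlp.vocab.strings[match_id]]
            eg["relations"].extend(match_to_relations(token_ids, pattern, rel_labels))
        yield eg
```

With `rel_labels` keyed on the `RIGHT_ID` names of the PERSON-subject and GPE-location nodes, the match `[1, 0, 2, 3]` above yields one relation with head 0 ("Steve") and child 3 ("Seattle"). Pairs whose nodes are not in the pattern are skipped rather than raising.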
Let me know if this makes sense or if you have any questions. If you are able to get this to work, we'd greatly appreciate an example for the community!