I'm trying to develop a model using NER and relation extraction with Prodigy and have a usage question.
To start off, I generated a JSONL dataset containing a few thousand sentences, each pre-labelled with the 3 spans I want to relate together (I retrieved their offsets using PhraseMatcher). I then ran:
$ prodigy rel.manual ner_exp_restr_dep en_core_web_lg ./output.jsonl \
--label HAS_COSTS,IN_YEAR \
--span-label EXPENSE,MONEY,DATE \
--add-ents \
--wrap
and spent about 30 minutes annotating 100 or so examples with relation data. The basic idea is that an EXPENSE span relates to a MONEY span, which relates to a DATE span. After saving this to the DB, I ran:
$ prodigy train rel en ner_exp_restr_dep
which exported to a local directory. I then imported the model with:
import spacy
nlp = spacy.load("./rel/model-last")
doc = nlp("In 2020 we recorded $20 million in impairment charges")
for ent in doc.ents:
    print(ent.text, ent.label_)
# 2020 DATE
# $20 million MONEY
# impairment charges EXPENSE
However, there doesn't seem to be a way to map from EXPENSE -> MONEY -> DATE.
How do I map from one entity to another using the relations extracted in Prodigy? I didn't see anything in the docs about the next steps required.
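To make the goal concrete, here is a minimal sketch of the mapping I'm after, working from the saved annotations rather than from the trained model. It assumes each exported record stores its relations as a list of dicts with `head_span`, `child_span` and `label` keys, and that the head of each relation is the EXPENSE (or MONEY) span. The record below is hand-written for illustration, and `make_span`, `build_links` and `follow_chain` are hypothetical helpers of my own, not Prodigy or spaCy functions.

```python
def make_span(text, phrase, label):
    """Build a span dict with character offsets (hypothetical helper)."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


text = "In 2020 we recorded $20 million in impairment charges"
expense = make_span(text, "impairment charges", "EXPENSE")
money = make_span(text, "$20 million", "MONEY")
date = make_span(text, "2020", "DATE")

# Hand-written record; the key names are an assumption about the export format.
record = {
    "text": text,
    "relations": [
        {"head_span": expense, "child_span": money, "label": "HAS_COSTS"},
        {"head_span": money, "child_span": date, "label": "IN_YEAR"},
    ],
}


def build_links(record):
    """Index relations by the (start, end) offsets of their head span."""
    links = {}
    for rel in record.get("relations", []):
        head, child = rel["head_span"], rel["child_span"]
        key = (head["start"], head["end"])
        links.setdefault(key, []).append((rel["label"], child))
    return links


def follow_chain(record, start_label):
    """Follow relations from every span labelled start_label until they end."""
    text = record["text"]
    links = build_links(record)
    chains = []
    seen_heads = set()
    for rel in record.get("relations", []):
        head = rel["head_span"]
        key = (head["start"], head["end"])
        if head["label"] != start_label or key in seen_heads:
            continue
        seen_heads.add(key)
        chain = [(head["label"], text[head["start"]:head["end"]])]
        visited = {key}
        while key in links:
            # Only the first relation per head is followed in this sketch.
            _, child = links[key][0]
            key = (child["start"], child["end"])
            if key in visited:  # guard against cyclic annotations
                break
            visited.add(key)
            chain.append((child["label"], text[child["start"]:child["end"]]))
        chains.append(chain)
    return chains


for chain in follow_chain(record, "EXPENSE"):
    print(" -> ".join(f"{label}: {span_text}" for label, span_text in chain))
```

What I'd like is the equivalent of `follow_chain` for the `doc` returned by the loaded model, so that each EXPENSE entity can be resolved to its MONEY and DATE entities on unseen text.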