I have already done the NER labeling and would now like to label relationships as well. How can I do this on my existing data? Is it possible to change and correct the entities when labeling the relationships? Is there anything I need to consider? (e.g. save in a new dataset).
hi @yllwpr!
Here are a few links and posts that may help.
First, have you seen the Prodigy documentation on Combining Named Entity Recognition with relation extraction? This may answer a lot of your questions.
It also includes a trick: you can use Prodigy datasets as source inputs (instead of files) if you prefix the dataset name with dataset:, like dataset:ner_rels_ent. You can also add a suffix like reject, accept, or ignore to the dataset in case you want to review only a subset of that Prodigy dataset. I would expect you would only want to use the accept records.
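To make the idea of "only the accept records" concrete, here is a minimal sketch in plain Python of what that subset looks like. It does not call Prodigy at all: the three example tasks, the EXPORTED_JSONL name, and the filter_by_answer helper are all hypothetical, and the only assumption about the data is that each annotated task is a JSON object with "text", "spans", and "answer" fields.

```python
import json

# Hypothetical export of an existing NER dataset: one JSON task per line.
# The field names ("text", "spans", "answer") are assumed for illustration.
EXPORTED_JSONL = """\
{"text": "Apple hired Tim Cook.", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}
{"text": "See you in Paris.", "spans": [], "answer": "reject"}
{"text": "Hmm, not sure.", "spans": [], "answer": "ignore"}
"""


def filter_by_answer(lines, answer="accept"):
    """Yield only the tasks whose "answer" field equals the given value."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        task = json.loads(line)
        if task.get("answer") == answer:
            yield task


# Keep just the accepted tasks, i.e. the ones worth carrying into rel labeling.
accepted = list(filter_by_answer(EXPORTED_JSONL.splitlines()))
for task in accepted:
    print(task["text"], [span["label"] for span in task["spans"]])
```

With the suffix described above you should not need a script like this; it is only meant to show which records end up in the relation annotation step.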
The documentation on rel.manual shows how you can do both ner and rel jointly. Relatedly, here's a post on how to review both (e.g., if you need an adjudicator between different annotators):
As the first link does, it may be best to save your relationship data into a new dataset. Also, since you're considering multiple trainable components, it may be worth learning to use custom spaCy config (config.cfg) files. You can pass a config.cfg file as an argument to your prodigy train command (or alternatively just use spacy train, which is really what prodigy train does). Here's a related post:
That post has a link to several open source projects available on our explosion/projects repo, like the ones on ner and rel. I don't think any single one does both, but perhaps they can help you think about how to structure these projects.
Thanks again for your questions and let us know if you have any further ones!
Thank you for your detailed answer!