Hi! Happy to hear the REL tutorial was useful to you.
The REL tutorial was meant as an example for implementing your own custom trainable component from scratch, and I think the provided implementation for relation extraction should really be taken as a baseline to start from. I can imagine a realistic application would benefit from additional features or a more complex network architecture.
That said, you can take the code from the example project and construct a config file that refers to the relation extraction component and the NER at the same time. You can extend the provided config with an NER component. If you need inspiration on how to define the NER component, you can run
python -m spacy init config -p "ner" ner_config.cfg
and merge that config with the REL one, so you'd have a pipeline including tok2vec, ner and relation_extractor. You can decide whether they should share the same tok2vec layer or not.
If you feed in data that has both the named entities and the relations annotated, it should train both simultaneously. You'll need to use the -c flag on the train command to make sure the custom functions and architectures from the REL code are imported, because these are not built into spaCy. In the example project, this is accomplished by doing -c custom_functions.py.
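To make the -c part concrete, here's a small sketch that assembles the train command as an argument list from Python. The file names (rel_ner.cfg, the training output folder, the data paths) are placeholders I made up, and build_train_command is just an illustrative helper; the flags themselves (--output, --paths.train, --paths.dev and -c) are the regular spacy train options.

```python
import sys
from typing import List


def build_train_command(
    config_path: str,
    output_dir: str,
    train_path: str,
    dev_path: str,
    code_path: str,
) -> List[str]:
    """Hypothetical helper: build the argument list for `spacy train`."""
    return [
        sys.executable, "-m", "spacy", "train", config_path,
        "--output", output_dir,
        # Overrides for the [paths] section of the config.
        "--paths.train", train_path,
        "--paths.dev", dev_path,
        # Imports the custom REL functions and architectures before training.
        "-c", code_path,
    ]


# Placeholder paths, adjust to your own project layout.
cmd = build_train_command(
    "rel_ner.cfg", "training", "data/train.spacy", "data/dev.spacy",
    "custom_functions.py",
)
print(" ".join(cmd))
```

You can copy the printed line into a terminal, or pass cmd to subprocess.run(cmd, check=True) to launch training from a script.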
To define your training data, you could follow the same conventions as the REL example project and store the information in the custom attribute doc._.rel, cf. here: projects/parse_data.py at v3 · explosion/projects · GitHub. After creating the appropriate Doc objects with the gold-standard data (entities + relations), you can serialize them to file with DocBin to create the binary .spacy files that you can feed into the spacy train command.
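As a minimal sketch of that last step: the snippet below builds one Doc with two entities and a relation, then writes it to a .spacy file. The sentence and the EMPLOYS label are invented for illustration. The layout of doc._.rel (a dict keyed by the start token of each entity pair, mapping to label scores) is my reading of the example project's parse_data.py, so please double-check it against the version of the project you're using.

```python
import tempfile
from pathlib import Path

from spacy.tokens import Doc, DocBin, Span
from spacy.vocab import Vocab

# Register the custom attribute that holds the gold-standard relations.
Doc.set_extension("rel", default={}, force=True)

# Build the Doc from pre-tokenized words so the token indices are explicit.
words = ["Acme", "hired", "Jane", "Smith", "."]
doc = Doc(Vocab(), words=words)

# Gold-standard entities: token spans with a label.
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 2, 4, label="PERSON")]

# Gold-standard relations (assumed convention): keys are
# (start token of entity 1, start token of entity 2), values map label -> score.
doc._.rel = {
    (0, 2): {"EMPLOYS": 1.0},
    (2, 0): {"EMPLOYS": 0.0},
}

# store_user_data=True is needed so that doc._.rel is written to the file.
doc_bin = DocBin(store_user_data=True)
doc_bin.add(doc)

out_path = Path(tempfile.gettempdir()) / "train.spacy"
doc_bin.to_disk(out_path)
```

In practice you'd add one Doc per annotated text and write separate files for training and dev data, which you then point to from the config or via the path overrides on the command line.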
I think that's pretty much the general overview. Let me know if you run into specific issues!