I want to convert my .jsonl file, which contains annotated data, to .spacy binary files for training a relation extraction (RE) model. I succeeded in doing this for the NER part with:
prodigy data-to-spacy ./corpus_ner --ner bla --eval-split 0.3 -V
but I cannot find a similar parameter (like "--ner") for RE.
Furthermore, I observed that in the case of the "--ner" parameter a config file was generated. Is the config file customized based on the input text (the .jsonl file) or is it a default one?
At the time of writing, spaCy doesn't natively support relation extraction models. The example that we list on our docs here is meant to be a tutorial on how to set up a custom component, not a guide on a feature in spaCy.
The crux of the issue is that the Doc object in spaCy currently has no support for relationships. That is also why, in turn, the .spacy object does not support them.
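Because relations have no built-in home on the Doc, a custom component has to carry them in a structure of its own. As a rough sketch only, here is how relation annotations could be collected from one line of a .jsonl file using plain Python. The record layout is an assumption made for illustration (a "relations" list whose items have "head" and "child" token indices and a "label"), and relations_from_record is a hypothetical helper, not part of spaCy or Prodigy. Check the field names against your own export before relying on it.

```python
import json


def relations_from_record(line):
    """Return {(head_token, child_token): label} for one JSONL line.

    Assumes each relation has "head", "child" and "label" keys;
    this layout is an assumption, so adjust it to your own data.
    """
    record = json.loads(line)
    pairs = {}
    for rel in record.get("relations", []):
        pairs[(rel["head"], rel["child"])] = rel["label"]
    return pairs


# A made-up record, used only to show the shape of the result.
example_line = json.dumps({
    "text": "Alice works at Acme.",
    "relations": [{"head": 0, "child": 3, "label": "WORKS_AT"}],
})
relations = relations_from_record(example_line)
print(relations)
```

A mapping like this could then be attached to each document by your own conversion script, since data-to-spacy will not do it for you.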
The config file that you see can be changed via the --config flag (docs). If this flag is not set, as in your case, the default settings are generated automatically, as found here.
Thank you for your swift answer. Now I have a better picture of what needs to be done. I have another question for you regarding relation extraction models in spaCy.
Are there any limitations or recommendations regarding the training set (text length, relation length, relations between entities belonging to different sentences)? We obtained better results when each text contains a single sentence (this is also the case in the example from spaCy). I am interested in extracting relations between entities that are in different sentences.
Can you explain how you were able to address the issue?
For relation extraction we need named entities as well, so do we need to run data-to-spacy twice, once for NER and once for RE, or can it be done in a single pass?