ryanwesslen
(Ryan Wesslen)
February 23, 2023, 12:34pm
6
Hi @shakyawar.developer!
Sorry for the confusion. If you annotated relations and entities with rel.manual, the problem is that data-to-spacy doesn't work on that data, because spaCy's Doc object doesn't have built-in support for relations.
At the time of writing, spaCy doesn't natively support relation extraction models. The example that we list on our docs here is meant to be a tutorial on how to set up a custom component, not a guide on a feature in spaCy.
The crux of the issue is that the Doc object in spaCy currently has no support for relationships. That is also why, in turn, the .spacy object does not support them.
The config file that you see can be changed via the --config flag (docs). If this flag is not set, which is y…
I just wrote a similar post with steps for adapting the code from Sofie's relation extraction video (be sure to watch the segments where she describes the code):
Hi @stella !
Yes, this is exactly the setup Sofie does. She explicitly says from the beginning she's going to assume she already has a trained ner component.
Yes! Sofie used Thinc for training. You can see the training code here, and she carefully explains the Thinc model script from 8:11 to 18:30. At around 22:55 she then gives an overview of the TrainablePipe API and how to implement the custom component. You may not need to know all of the details and can luckily leverage a lot of t…
The key is that you need to modify the parse_data.py script, as mentioned here:
Hi @korneliaB ,
Ok, let us step back for a bit. I realized that since you already have the labeled documents in Prodigy, you can export them to .jsonl using the db-out command, then reuse / modify this parse_data.py script to convert the JSONL files into the spaCy format.
It errored out because it expects some labels before the component is initialized. You can see this being done in the main function. So you have to do something like:
python scripts.parse_data path/to/js…
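To make the conversion step a bit more concrete, here is a minimal, stdlib-only sketch of the kind of parsing a modified parse_data.py has to do. It is not Sofie's script. It assumes the db-out export follows the usual rel.manual layout (a "relations" list whose items carry "head_span", "child_span" and "label", with "token_start" inside each span), and the function name parse_relations and the sample record are made up for illustration. The real script additionally builds spaCy Doc objects and writes .spacy files, which is left out here.

```python
import json
from collections import defaultdict


def parse_relations(jsonl_lines):
    """Collect relation labels per (head token, child token) pair.

    Hypothetical helper: the field names below are assumptions about
    what a rel.manual export looks like, not a documented contract.
    """
    parsed = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        example = json.loads(line)
        # Keep only accepted annotations; skip rejected/ignored ones.
        if example.get("answer", "accept") != "accept":
            continue
        rels = defaultdict(dict)
        for rel in example.get("relations", []):
            head = rel["head_span"]["token_start"]
            child = rel["child_span"]["token_start"]
            # 1.0 marks a relation that was annotated as present.
            rels[(head, child)][rel["label"]] = 1.0
        parsed.append({"text": example["text"], "rels": dict(rels)})
    return parsed


# Made-up record in the assumed export format.
sample = json.dumps({
    "text": "Alice works at Acme",
    "answer": "accept",
    "relations": [{
        "label": "WORKS_AT",
        "head_span": {"token_start": 0, "token_end": 0, "label": "PERSON"},
        "child_span": {"token_start": 3, "token_end": 3, "label": "ORG"},
    }],
})
records = parse_relations([sample])
print(records)
```

With a real export you would pass the open file instead of the sample list, e.g. `parse_relations(open(path, encoding="utf8"))`, and then store each "rels" mapping on the matching Doc through a custom extension attribute, since the Doc has no built-in slot for relations.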
Hope this helps!