data-to-spacy for rel_component training

korneliaB · August 9, 2022, 12:57am

Hi,

I am using the following recipe to annotate a dataset for relation extraction:

python -m prodigy rel.manual ner_rel blank:en transcripts/xxx.txt --label LABEL1,LABEL2 --span-label LABEL1,LABEL2

I am trying to use this dataset to train the rel_component using the instructions here:

but I am stuck on how to convert the .jsonl dataset that I get into training.spacy and dev.spacy files.

ljvmiranda921 · August 10, 2022, 12:55am

Hi @korneliaB !

You should be able to use data-to-spacy for this purpose. You can use the --parser parameter to achieve that. Something like this:

prodigy data-to-spacy ./corpus --parser <my-dataset>

korneliaB · August 15, 2022, 5:07pm

Thank you!

It worked, but now I get the following error:

ValueError: [E143] Labels for component 'relation_extractor' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's `initialize` method.

I ran the same code on data I prepared last year in a different annotation tool (UBIAI) and it works just fine, so I am certain there is something wrong with this dataset.

ljvmiranda921 · August 16, 2022, 7:46am

Hi @korneliaB ,

OK, let us step back for a bit. I realized that since you already have the labeled documents in Prodigy, you can export them to .jsonl using the db-out command, then reuse or modify this parse_data.py script to convert the JSONL files into the spaCy format.

It errored out because the component expects some labels to be available before it is initialized. You can see this being done in the main function. So you have to do something like:

python scripts/parse_data.py path/to/json path/to/train.spacy path/to/dev.spacy path/to/test.spacy

If you're using your own dataset, you might need to adjust the parsing process. But a good first step would be to try this script out on your own exported JSONL files.
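
To give a rough idea of what "adjusting the parsing process" could involve, here is a minimal, standard-library-only sketch of pulling relation annotations out of one exported record. It assumes the JSONL was exported with db-out and that each record uses the field names I would expect from rel.manual output ("answer", "spans", "relations", "head_span", "child_span", "token_start"); please check these against your own file. The function names and the sample record are made up for illustration and are not part of Prodigy or parse_data.py.

```python
import json


def load_tasks(path):
    """Read one JSON record per non-empty line from an exported .jsonl file."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def relations_from_task(task, labels):
    """Return {(head_start, child_start): {label: score}} for one record.

    Hypothetical helper: annotated relations get 1.0 and every other
    label/pair combination gets 0.0. Returns None for non-accepted records.
    """
    if task.get("answer") != "accept":
        return None
    # First token index of every annotated span
    span_starts = sorted({span["token_start"] for span in task.get("spans", [])})
    # One (initially empty) entry per ordered pair of spans
    rels = {(head, child): {} for head in span_starts for child in span_starts}
    for rel in task.get("relations", []):
        key = (rel["head_span"]["token_start"], rel["child_span"]["token_start"])
        rels.setdefault(key, {})[rel["label"]] = 1.0
    # Fill in zeros wherever a label was not annotated for a pair
    for scores in rels.values():
        for label in labels:
            scores.setdefault(label, 0.0)
    return rels


# Hand-written sample record (an assumption about the export format)
anna = {"start": 0, "end": 4, "token_start": 0, "token_end": 0, "label": "LABEL1"}
ben = {"start": 9, "end": 12, "token_start": 2, "token_end": 2, "label": "LABEL2"}
sample_line = json.dumps({
    "text": "Anna met Ben",
    "spans": [anna, ben],
    "relations": [
        {"head": 0, "child": 2, "head_span": anna, "child_span": ben, "label": "LABEL1"}
    ],
    "answer": "accept",
})

task = json.loads(sample_line)
rels = relations_from_task(task, ["LABEL1", "LABEL2"])
print(rels[(0, 2)])
```

In a full conversion, a dictionary like this would then be attached to each spaCy Doc and the docs saved to the .spacy files, which is the part parse_data.py handles. If the relation labels never make it into the converted data, I would expect the component to have nothing to initialize its labels from, which matches the E143 error above.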

yllwpr · August 21, 2022, 12:46pm

This is a good article on that. Maybe the linked resources will help you.
