Hi @korneliaB,
Ok, let us step back for a bit. I realized that since you already have the labeled documents in Prodigy, you can export them into .jsonl using the db-out command, then reuse / modify this parse_data.py script to convert the JSONL files into the spaCy format.
The reason it errored out is that it expects some labels before the component is initialized. You can see this being done in the main function. So you have to do something like:
python -m scripts.parse_data path/to/json path/to/train.spacy path/to/dev.spacy path/to/test.spacy
If you're using your own dataset, you might need to adjust the parsing process. But a good first step would be to try this script out on your own exported JSONL files.
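In case it helps when adjusting the parsing, here is a rough sketch of the first half of that job: reading the exported JSONL, collecting the labels up front, and splitting into train/dev/test. It is not the parse_data.py script itself. The function names are made up for illustration, and I'm assuming each exported line has a "text" key, a "spans" list whose items carry a "label", and an "answer" key, so adjust the keys to whatever your export actually contains.

```python
import json
import random


def load_accepted(lines):
    """Parse JSONL lines and keep only the examples marked as accepted."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: a missing "answer" key is treated as accepted.
        if record.get("answer", "accept") == "accept":
            examples.append(record)
    return examples


def collect_labels(examples):
    """Gather every span label so it can be registered before initialization."""
    labels = set()
    for record in examples:
        # Assumption: labels live under "spans"; change this for other tasks.
        for span in record.get("spans", []):
            if "label" in span:
                labels.add(span["label"])
    return sorted(labels)


def split_examples(examples, dev_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle with a fixed seed and split into train, dev and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_ratio)
    n_test = int(len(shuffled) * test_ratio)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```

You would call load_accepted on an open file handle for your export, then pass the result to collect_labels and split_examples. Turning each split into a .spacy file is the part the script handles, so this only covers the steps you are most likely to need to change.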