Hi all,
I got the following error:
✘ Invalid data for component 'ner'
text field required
when running:
prodigy train ner $DATASET_NAME "data_path_model" --output "/results"
The dataset I ingested into the Prodigy database is in IOB format, which I converted to JSON with the following spacy convert command:
python -m spacy convert $DATA_FILE $DATA_OUTPUT_PATH -t json -n 1 -c iob
The prodigy db-in command ran fine and the dataset was ingested into the Prodigy database.
Its format looks like this:
[
  {
    "id": 0,
    "paragraphs": [
      {
        "sentences": [
          {
            "tokens": [
              {
                "orth": "bbb",
                "tag": "-",
                "ner": "O"
              },
              {
                "orth": "aaa",
                "tag": "-",
                "ner": "O"
              },
              ... (continues)
Could you help me correct it please?
Thank you
Hi @JulieSarah ,
It seems that you're using spaCy v2's JSON format rather than Prodigy's JSON format, which is why it's looking for a "text" field. You can convert your dataset into Prodigy's format and try again.
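As a rough sketch of what such a conversion could look like (not an official converter): the function below walks the nested spaCy v2 JSON structure shown in the question and builds one record per sentence with "text", "tokens" and "spans" keys, which is the general shape Prodigy expects for NER tasks. The function names are hypothetical, and joining tokens with a single space is an assumption, since the v2 JSON shown above does not keep the original whitespace.

```python
import json


def sentence_to_task(raw_tokens):
    """Build one Prodigy-style NER task from a list of spaCy v2 JSON tokens."""
    text = ""
    tokens = []
    spans = []
    open_span = None  # entity currently being extended, if any
    for i, tok in enumerate(raw_tokens):
        if i > 0:
            text += " "  # assumption: tokens are separated by one space
        start = len(text)
        text += tok["orth"]
        end = len(text)
        tokens.append({"text": tok["orth"], "start": start, "end": end, "id": i})

        # Tags look like "O", "B-ORG", "I-ORG" (IOB) or "L-ORG", "U-ORG" (BILUO).
        prefix, _, label = tok.get("ner", "O").partition("-")
        if prefix in ("B", "U"):
            open_span = {"start": start, "end": end, "label": label,
                         "token_start": i, "token_end": i}
            spans.append(open_span)
        elif prefix in ("I", "L") and open_span and open_span["label"] == label:
            open_span["end"] = end
            open_span["token_end"] = i
        else:
            open_span = None  # "O", "-" or an inconsistent tag closes the entity
        if prefix in ("L", "U"):
            open_span = None  # BILUO marks the last token explicitly
    return {"text": text, "tokens": tokens, "spans": spans}


def spacy_v2_json_to_tasks(docs):
    """Flatten docs -> paragraphs -> sentences into a list of tasks."""
    return [
        sentence_to_task(sentence["tokens"])
        for doc in docs
        for paragraph in doc["paragraphs"]
        for sentence in paragraph["sentences"]
    ]


def to_jsonl(tasks):
    """Serialize tasks as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(task) for task in tasks)
```

Writing the output of to_jsonl to a .jsonl file and importing that file with prodigy db-in should then give every example the "text" field that the error complains about.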
Thank you, but that does not solve the problem. Even with JSONL I get the same error, since I don't have any text field!
Hi @JulieSarah ,
You need to convert your data into Prodigy's JSONL format. You're passing an incorrect format into Prodigy, which is why you see the error. Since you already have a spaCy file, you can convert it using this script: Load data in spaCy v3's .spacy format
So again, what you can do is:
IOB -> .spacy file (saved as a serialized file using the DocBin class) -> Prodigy JSONL file (using the script provided, which you can edit to suit your needs)
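To illustrate the second arrow, here is a minimal sketch of the .spacy -> JSONL step. It is not the linked script itself. It assumes spaCy v3 is installed (there, spacy convert writes .spacy files by default) and that a blank "en" pipeline is acceptable for loading the docs. The names doc_to_task and spacy_file_to_jsonl are hypothetical.

```python
import json


def doc_to_task(doc):
    """Build a Prodigy-style NER task from a spaCy Doc."""
    tokens = [
        {"text": t.text, "start": t.idx, "end": t.idx + len(t.text), "id": t.i}
        for t in doc
    ]
    spans = [
        {
            "start": ent.start_char,
            "end": ent.end_char,
            "label": ent.label_,
            "token_start": ent.start,
            # spaCy spans are end-exclusive; token_end is assumed inclusive here
            "token_end": ent.end - 1,
        }
        for ent in doc.ents
    ]
    return {"text": doc.text, "tokens": tokens, "spans": spans}


def spacy_file_to_jsonl(spacy_path, jsonl_path, lang="en"):
    """Read a .spacy file (DocBin) and write one JSON task per line."""
    # Imported here so doc_to_task stays usable without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # assumption: only the vocab is needed
    doc_bin = DocBin().from_disk(spacy_path)
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for doc in doc_bin.get_docs(nlp.vocab):
            f.write(json.dumps(doc_to_task(doc)) + "\n")
```

The resulting JSONL file can then be imported with prodigy db-in as before, after which prodigy train should find a "text" field on every example.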