Prodigy NER train error with IOB data converted to JSON annotation data

Hi all,
I got the following error:

✘ Invalid data for component 'ner'
text field required

By running:
prodigy train ner $DATASET_NAME "data_path_model" --output "/results"

The dataset I ingested into the Prodigy db is an IOB file, which I converted to JSON with the following spacy convert command:
python -m spacy convert $DATA_FILE $DATA_OUTPUT_PATH -t json -n 1 -c iob

The prodigy db-in command ran fine and the dataset was ingested into the Prodigy db.

The converted data looks like this:

[
  {
    "id":0,
    "paragraphs":[
      {
        "sentences":[
          {
            "tokens":[
              {
                "orth":"bbb",
                "tag":"-",
                "ner":"O"
              },
              {
                "orth":"aaa",
                "tag":"-",
                "ner":"O"
              }, continuation
Could you help me correct it please? :grin:
Thank you

Hi @JulieSarah ,

It seems that you're using spaCy v2's JSON format rather than Prodigy's JSON format, which is why Prodigy is looking for a "text" field. You can convert your dataset into Prodigy's format and try again.
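To illustrate the difference: a Prodigy NER example is a flat JSON object with a "text" key and, optionally, a "spans" list of character offsets, rather than the nested paragraphs/sentences/tokens structure above. Below is a rough sketch of how the nested structure could be flattened. It is not an official converter: the function names are made up for illustration, and it assumes tokens are separated by a single space (the v2 JSON shown carries no whitespace information) and that "ner" holds IOB or BILUO tags.

```python
import json


def tokens_to_task(tokens):
    """Build one Prodigy-style task dict from a list of spaCy v2 JSON tokens.

    Assumption: tokens are joined with a single space, so the character
    offsets are only valid for the text produced here.
    """
    words, spans = [], []
    offset = 0
    current = None  # the entity span currently being extended, if any
    for tok in tokens:
        word = tok["orth"]
        prefix, _, label = tok.get("ner", "O").partition("-")
        start, end = offset, offset + len(word)
        if prefix in ("B", "U"):
            # B- and U- always open a new entity
            current = {"start": start, "end": end, "label": label}
            spans.append(current)
        elif prefix in ("I", "L"):
            if current is not None and current["label"] == label:
                current["end"] = end  # continue the open entity
            else:
                # stray I-/L- tag: treat it as the start of a new entity
                current = {"start": start, "end": end, "label": label}
                spans.append(current)
        else:
            current = None  # "O" or an empty tag closes any open entity
        if prefix in ("L", "U"):
            current = None  # BILUO end markers close the entity
        words.append(word)
        offset = end + 1  # +1 for the assumed single space
    return {"text": " ".join(words), "spans": spans}


def spacy_v2_json_to_tasks(docs):
    """Yield one task per sentence from the nested spaCy v2 JSON structure."""
    for doc in docs:
        for paragraph in doc["paragraphs"]:
            for sentence in paragraph["sentences"]:
                yield tokens_to_task(sentence["tokens"])


if __name__ == "__main__":
    sample = [{"id": 0, "paragraphs": [{"sentences": [{"tokens": [
        {"orth": "bbb", "tag": "-", "ner": "O"},
        {"orth": "aaa", "tag": "-", "ner": "O"},
    ]}]}]}]
    for task in spacy_v2_json_to_tasks(sample):
        print(json.dumps(task))  # one JSON object per line = JSONL
```

Each printed line is one record of the JSONL file, and the "text" key is the field the error message is asking for.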

Thank you, but that does not solve the problem. Even with JSONL I get the same error, as I don't have any text field!

Hi @JulieSarah ,

You need to convert your data into Prodigy's JSONL format. You're passing the wrong format into Prodigy, which is why you see the error. Since you already have a spaCy file, you can convert it using this script: "Load data in spaCy v3's .spacy format".

So again, what you can do is:

IOB -> .spacy file (saved as a serialized file using the DocBin construct) -> Prodigy JSONL file (using the script provided, which you can edit based on your needs)
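The second arrow in that pipeline could look roughly like the sketch below. This is not the linked script, just a minimal illustration of the same idea, assuming spaCy v3 is installed. The function names, the file paths and the "en" language code are placeholders.

```python
import json


def doc_to_task(doc):
    """Turn one spaCy Doc into a Prodigy-style NER task with "text" and "spans"."""
    return {
        "text": doc.text,
        "spans": [
            {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents
        ],
    }


def docbin_to_jsonl(spacy_path, jsonl_path, lang="en"):
    """Read a .spacy file (a serialized DocBin) and write one task per line."""
    # Imported here so doc_to_task can be reused without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # only the vocab is needed to restore the docs
    doc_bin = DocBin().from_disk(spacy_path)
    with open(jsonl_path, "w", encoding="utf8") as f:
        for doc in doc_bin.get_docs(nlp.vocab):
            f.write(json.dumps(doc_to_task(doc)) + "\n")


if __name__ == "__main__":
    # Hypothetical paths: point these at your own converted file.
    docbin_to_jsonl("./corpus/train.spacy", "./corpus/train.jsonl")
```

With spaCy v3, running spacy convert on the IOB file with -c iob (and without -t json) should produce the .spacy file directly. The resulting JSONL file can then be loaded with prodigy db-in as before.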