Cannot use the ner.gold-to-spacy JSONL output as training data for spacy train

Hi guys, I have a spaCy model with only tag and dep in its pipeline. Now I want to train this model to do NER as well.

So I annotated my examples with Prodigy and used ner.gold-to-spacy with BILUO to export them, so that I can use these examples to train my current model with spaCy.

This is the export result:

["Sampul dari dua singel pertama difoto oleh Emma Summerton pada bulan April 2010 dan tiga gambar lainnya diambil oleh artis yang dirilis untuk mempromosikan album di bulan Juli. Sampul resmi album menunjukkan Perry sedang berbaring telanjang di awan kembang gula, dilukis di atas kanvas oleh Will Cotton dan dirilis pada tanggal 21 Juli melalui webstream langsung.",["O","O","U-CARDINAL","O","U-ORDINAL","O","O","B-PERSON","L-PERSON","O","O","B-DATE","L-DATE","O","U-CARDINAL","O","O","O","O","O","O","O","O","O","O","O","O","U-DATE","O","O","O","O","O","U-PERSON","O","O","O","O","O","B-PRODUCT","L-PRODUCT","O","O","O","O","O","O","B-PERSON","L-PERSON","O","O","O","O","B-DATE","L-DATE","O","O","O","O"]]
["Baler adalah munisipalitas yang terletak di provinsi Aurora, Filipina.",["U-GPE","O","O","O","O","O","O","U-GPE","O","U-GPE","O"]]
["Bagaimana sejarah berdirinya SMAN 1 Pekanbaru yang sudah berusia setengah abad itu?",["O","O","O","B-ORG","I-ORG","L-ORG","O","O","O","O","O","O","O"]]
["Pencurian dan perampokan juga sangat jarang terjadi di wilayah ini.",["O","O","O","O","O","O","O","O","O","O","O"]]
["Dalam lima tahun sejak panggilan of Duty 4 berlangsung, ia telah dipromosikan menjadi kapten di Special Air Service.",["O","U-CARDINAL","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","I-ORG","L-ORG","O"]]
["Laut Sulawesi merupakan tempat bagi banyak spesies ikan dan makhluk bawah air.",["B-LOC","L-LOC","O","O","O","O","O","O","O","O","O","O","O"]]
["Siapa dapat memastikan apakah sebuah kenyataan itu sesungguhnya impian dan sebuah impian itu justru sesungguhnya kenyataan?",["O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O"]]
["Salmon biasa diasap untuk membuat lox, dan beberapa spesies ikan seperti bandeng, hering, makerel, tongkol, tenggiri, dan gabus juga biasa diasap panas.",["U-PRODUCT","O","O","O","O","U-PRODUCT","O","O","O","O","O","O","U-PRODUCT","O","U-PRODUCT","O","U-PRODUCT","O","U-PRODUCT","O","U-PRODUCT","O","O","U-PRODUCT","O","O","O","O","O"]]

Then I used this command to train my current model with spaCy:

spacy train id Final-Gold-standard\ Model/default/ ner-train.jsonl ner-test.jsonl --base-model id_ud-tag-dep-ner-1.0.0/ --pipeline "ner" --n-iter 1000 -ne 10 --use-gpu 0

But it triggers an error like this:

Training pipeline: ['ner']
Starting with base model 'id_ud-tag-dep-ner-1.0.0/'
Counting training words (limit=0)
Traceback (most recent call last):
  File "/opt/tljh/user/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/tljh/user/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/tljh/user/lib/python3.6/site-packages/spacy/__main__.py", line 35, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/opt/tljh/user/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/opt/tljh/user/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/opt/tljh/user/lib/python3.6/site-packages/spacy/cli/train.py", line 206, in train
    n_train_words = corpus.count_train()
  File "gold.pyx", line 177, in train_tuples
ValueError: need more than 1 value to unpack

Can anybody help, please? Or could @ines / @honnibal explain?

Or do I actually have to do the training with the "updating NER in spaCy" method only?
Thank you.

Yes, sorry if this was confusing! ner.gold-to-spacy creates annotations in a format that can be more easily consumed by spaCy – texts and the token-based BILUO tags. We currently don't have a command that directly outputs the JSON training format – also because we're currently working on a new and more intuitive data format that's also more closely aligned with Prodigy's JSONL.
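
To make the exported format concrete: each line of the ner.gold-to-spacy output is a JSON array holding the text and one BILUO tag per token. The following is a minimal sketch in plain Python (no spaCy needed) that parses one such line from this thread and groups the tags into token-level entity spans. The helper name biluo_to_token_spans is an assumption made for illustration and is not part of Prodigy or spaCy. The indices it returns are token positions as produced by whatever tokenizer created the tags, not character offsets.

```python
import json


def biluo_to_token_spans(tags):
    """Group BILUO tags into (first_token, last_token, label) spans.

    Hypothetical helper for illustration only. Indices are token
    positions, not character offsets.
    """
    spans = []
    start = None  # token index where the currently open entity began
    for i, tag in enumerate(tags):
        if tag == "O":  # outside any entity
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":  # single-token entity
            spans.append((i, i, label))
            start = None
        elif prefix == "B":  # first token of a multi-token entity
            start = i
        elif prefix == "L" and start is not None:  # last token closes it
            spans.append((start, i, label))
            start = None
        # "I" tags continue the open entity, so there is nothing to record
    return spans


# One exported line has the shape [text, tags]
line = (
    '["Baler adalah munisipalitas yang terletak di provinsi Aurora, Filipina.",'
    '["U-GPE","O","O","O","O","O","O","U-GPE","O","U-GPE","O"]]'
)
text, tags = json.loads(line)
print(biluo_to_token_spans(tags))
```

This only illustrates what the export contains. It does not produce the JSON training format that spacy train reads.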

OK, now it is clear. One more question: I looked at the cli/converters folder in spaCy's GitHub repository and there's a jsonl2json converter. What is it for? I mean, what kind of JSONL format does it expect? Thank you.

The jsonl2json converter in spaCy should work with Prodigy’s jsonl format. The converter expects very little of the data: just the text key and the spans, which should follow Prodigy’s format. So you should be able to use prodigy db-out and use the resulting jsonl file with spacy convert.
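
As a rough sketch of the input described above: a record in Prodigy's JSONL format carries a "text" key and a list of "spans", each with character offsets "start" and "end" plus a "label". The snippet below builds one such record from a sentence in this thread and serializes it as one JSON object per line. The helper make_record and the choice of example are assumptions for illustration. In practice the records would come from prodigy db-out rather than being built by hand.

```python
import json


def make_record(text, entities):
    """Build a Prodigy-style record from (substring, label) pairs.

    Hypothetical helper: offsets are located with str.find, so only the
    first occurrence of each substring is handled.
    """
    spans = []
    for substring, label in entities:
        start = text.find(substring)
        if start == -1:
            raise ValueError(f"{substring!r} not found in text")
        spans.append(
            {"start": start, "end": start + len(substring), "label": label}
        )
    return {"text": text, "spans": spans}


records = [
    make_record(
        "Laut Sulawesi merupakan tempat bagi banyak spesies ikan dan makhluk bawah air.",
        [("Laut Sulawesi", "LOC")],
    )
]

# JSONL means one JSON object per line
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl)
```

Each span's offsets should slice the entity back out of the text (here text[0:13] is "Laut Sulawesi"), which is a quick sanity check before handing a file of such lines to spacy convert.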