Hi,
I recently upgraded Prodigy to 1.9.3 and am using spaCy 2.2.3. Previously I used the convert command to produce spaCy's training format:
python -m spacy convert -l en -t json -c jsonl prodigy-data.jsonl /spacy_dir
After installing the new version of Prodigy, I used the data-to-spacy command to produce the spaCy format instead:
python -m prodigy data-to-spacy .\spacy_dataset\train-data.json .\spacy_dataset\eval-data.json --lang en --ner data_shuffled_cleaned
data_shuffled_cleaned is the dataset I trained successfully with the prodigy train CLI, reaching 90% overall accuracy. I am now trying to use the same dataset with the spacy train CLI.
So I ran the spacy train command, but I get the error below. Does anyone know why?
python -m spacy train en model train-data.json eval-data.json --pipeline ner -v en_vectors_web_lg --verbose
Traceback (most recent call last):
File "*************/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "************/python3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "*******/python3/lib/python3.6/site-packages/spacy/__main__.py", line 33, in <module>
plac.call(commands[command], sys.argv[1:])
File "***********/python3/lib/python3.6/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "*******/python3/lib/python3.6/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "************/python3/lib/python3.6/site-packages/spacy/cli/train.py", line 230, in train
corpus = GoldCorpus(train_path, dev_path, limit=n_examples)
File "gold.pyx", line 224, in spacy.gold.GoldCorpus.__init__
File "gold.pyx", line 235, in spacy.gold.GoldCorpus.write_msgpack
File "gold.pyx", line 280, in read_tuples
File "gold.pyx", line 545, in read_json_file
File "gold.pyx", line 592, in _json_iterate
OverflowError: value too large to convert to int
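
In case it helps with diagnosis, here is a small check that can be run on the exported files. It is only a sketch built on an unconfirmed assumption: that the overflow is related to the byte size of the JSON file not fitting into a signed 32-bit integer. The helper names (size_exceeds_int32, file_exceeds_int32) are my own and are not part of spaCy or Prodigy; the file names are the ones from the commands above.

```python
import os

# Largest value a signed 32-bit C int can hold.
# ASSUMPTION: the OverflowError is tied to this limit; not confirmed.
INT32_MAX = 2**31 - 1


def size_exceeds_int32(n_bytes):
    """Return True if a byte count does not fit in a signed 32-bit int."""
    return n_bytes > INT32_MAX


def file_exceeds_int32(path):
    """Return True if the file at `path` is larger than INT32_MAX bytes."""
    return size_exceeds_int32(os.path.getsize(path))


if __name__ == "__main__":
    # File names taken from the data-to-spacy command in this post.
    for path in ("train-data.json", "eval-data.json"):
        if not os.path.exists(path):
            print(f"{path}: not found, skipping")
            continue
        size = os.path.getsize(path)
        flag = "exceeds" if size_exceeds_int32(size) else "fits within"
        print(f"{path}: {size} bytes, {flag} the signed 32-bit range")
```

If one of the files does turn out to exceed that range, it would at least narrow down where to look; if both fit, the assumption above is probably wrong.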