Hi Ines,
I'm trying to tag entities in news articles and train a model using Prodigy. (This is just a trial run, so I'm using a minimal number of examples.)
I used the command below:
"prodigy ner.manual ner_v12 en_core_web_sm prodigy_format_ner_input_v12_sample.jsonl --label Role,Department"
This worked for tagging.
Then I tried to train via Prodigy using the command below:
"prodigy train ner ner_v12 en_core_web_sm --init-tok2vec ./tok2vec_cd8_model289.bin --output ./tmp_model --eval-split 0.2"
I'm getting the following error:
Loaded model 'en_core_web_sm'
Created and merged data for 489 total examples
Using 392 train / 97 eval (split 20%)
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 10
Initializing with tok2vec weights ./tok2vec_cd8_model289.bin
Traceback (most recent call last):
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/site-packages/prodigy/__main__.py", line 52, in <module>
controller = recipe(args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 213, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func((args + varargs + extraopts), **kwargs)
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/site-packages/prodigy/recipes/train.py", line 130, in train
load_pretrained_tok2vec(pipe, init_tok2vec, require=True)
File "cython_src/prodigy/util.pyx", line 520, in prodigy.util.load_pretrained_tok2vec
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/site-packages/thinc/neural/_classes/model.py", line 376, in from_bytes
copy_array(dest, param[b"value"])
File "/home/merit/anaconda3/envs/prodigy/lib/python3.7/site-packages/thinc/neural/util.py", line 145, in copy_array
dst[:] = src
ValueError: could not broadcast input array from shape (128) into shape (96)
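If I read the last frame correctly, the failure comes down to the plain NumPy assignment `dst[:] = src` between two arrays of different widths. Below is a minimal sketch that reproduces the same kind of error outside Prodigy. The widths 96 and 128 are taken from the error message; the helper name `try_copy` and the idea that 128 is the width stored in the tok2vec file while 96 is what the loaded model expects are my assumptions, not something the traceback confirms.

```python
import numpy as np

# Hypothetical stand-ins for the two arrays in the traceback.
# Assumption: dst is the parameter array of the loaded model (width 96),
# src is the array read from the pretrained tok2vec file (width 128).
dst = np.zeros(96, dtype="float32")
src = np.ones(128, dtype="float32")


def try_copy(dst, src):
    """Attempt the same slice assignment as the last traceback frame.

    Returns the error message if NumPy refuses the copy, else None.
    """
    try:
        dst[:] = src
    except ValueError as err:
        return str(err)
    return None


message = try_copy(dst, src)
print(message)  # a "could not broadcast input array ..." message

# With matching widths the same assignment succeeds.
print(try_copy(np.zeros(96, dtype="float32"), np.ones(96, dtype="float32")))
```

So my guess is that the tok2vec weights were pretrained with a different width than the one used by the model I'm training on, but I'm not sure how to confirm or fix that.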
Please let me know how to proceed further!