Hi, sorry to spam all these questions.
I found a peculiar error: when my annotated dataset has 1000 samples, loading the trained model fails with a dimension error. Say I have an annotated-data.jsonl with more than 1000 samples. With 999 samples I can load the model fine, but not with 1000. Here are the steps to reproduce:
prodigy drop test
prodigy dataset test "yoyo"
head -n 1000 annotated-data.jsonl > small-data.jsonl
prodigy db-in test small-data.jsonl
prodigy textcat.batch-train test en_core_web_sm --output test -n 1
python -c "import spacy; spacy.load('test')"
Here is the error from the final spacy.load step:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/__init__.py", line 13, in load
return util.load_model(name, **overrides)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/util.py", line 107, in load_model
return load_model_from_path(Path(name), **overrides)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/util.py", line 138, in load_model_from_path
return nlp.from_disk(model_path)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/language.py", line 541, in from_disk
util.from_disk(path, deserializers, exclude)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/util.py", line 483, in from_disk
reader(path / key)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/language.py", line 537, in <lambda>
deserializers[proc.name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
File "spacy/pipeline.pyx", line 170, in spacy.pipeline.BaseThincComponent.from_disk (spacy/pipeline.cpp:11298)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/spacy/util.py", line 483, in from_disk
reader(path / key)
File "spacy/pipeline.pyx", line 163, in spacy.pipeline.BaseThincComponent.from_disk.load_model (spacy/pipeline.cpp:10856)
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 352, in from_bytes
copy_array(dest, param[b'value'])
File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/thinc/neural/util.py", line 48, in copy_array
dst[:] = src
ValueError: could not broadcast input array from shape (128) into shape (64)
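
In case it helps with reproducing this, here is a rough, untested sketch of a script that just loops the commands above over a few slice sizes and checks whether spacy.load succeeds in a fresh process (the file names and the "test" dataset/output names are the same ones I used above):

import subprocess
import sys

def loads_ok(n_samples):
    """Re-run the repro with the first n_samples lines and report whether the trained model loads."""
    # Write a slice of the annotated data (same effect as the head command above).
    with open("annotated-data.jsonl") as f_in, open("small-data.jsonl", "w") as f_out:
        for i, line in enumerate(f_in):
            if i >= n_samples:
                break
            f_out.write(line)
    # Same commands as in the repro steps above.
    subprocess.run(["prodigy", "drop", "test"])
    subprocess.run(["prodigy", "dataset", "test", "yoyo"])
    subprocess.run(["prodigy", "db-in", "test", "small-data.jsonl"])
    subprocess.run(["prodigy", "textcat.batch-train", "test", "en_core_web_sm",
                    "--output", "test", "-n", "1"])
    # Load in a separate interpreter so a failure doesn't affect this script.
    result = subprocess.run([sys.executable, "-c", "import spacy; spacy.load('test')"])
    return result.returncode == 0

for n in (990, 995, 999, 1000, 1001, 1010):
    print(n, "loads OK" if loads_ok(n) else "FAILS to load")

For me this breaks exactly at the 999/1000 boundary, as described above.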