I ran textcat.batch-train with a custom model (word vectors converted from gensim to spaCy) on a dataset of about 320k annotated examples (50% positive, 50% negative). Training took a solid 24 hours, and then the error below was raised before the model was written to the output directory.
How can I fix this so the trained model gets saved?
Command:
nohup python -m prodigy textcat.batch-train followup_report_3M /home/ubuntu/cnn-annotation/InstallPackages/model/pmcmodel/PubMed-and-PMC-w2v-spacy.bin --eval-split 0.2 -n 6 --dropout 0.2 --output followup_report_3M_model_PMC_PUB &
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/prodigy/__main__.py", line 248, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 150, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/prodigy/recipes/textcat.py", line 154, in batch_train
nlp = nlp.from_bytes(best_model)
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/spacy/language.py", line 680, in from_bytes
msg = util.from_bytes(bytes_data, deserializers, {})
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/spacy/util.py", line 501, in from_bytes
msg = msgpack.loads(bytes_data, encoding='utf8')
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/msgpack_numpy.py", line 187, in unpackb
return _unpacker.unpackb(packed, encoding=encoding, **kwargs)
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/msgpack/fallback.py", line 122, in unpackb
unpacker.feed(packed)
File "/home/ubuntu/cnn-annotation/venv/lib/python3.5/site-packages/msgpack/fallback.py", line 291, in feed
raise BufferFull
msgpack.exceptions.BufferFull
Training output:
Loaded model /home/ubuntu/cnn-annotation/InstallPackages/model/pmcmodel/PubMed-and-PMC-w2v-spacy.bin
Using 20% of examples (65254) for evaluation
Using 100% of remaining examples (261016) for training
Dropout: 0.2 Batch size: 10 Iterations: 6
# LOSS F-SCORE ACCURACY
01 1006.154 0.968 0.968
02 823.834 0.968 0.968
03 816.626 0.968 0.968
04 807.455 0.969 0.969
05 799.661 0.970 0.969
06 794.458 0.970 0.970
MODEL USER COUNT
accept accept 31645
accept reject 1169
reject reject 31636
reject accept 804
Correct 63281
Incorrect 1973
Baseline 0.50
Precision 0.96
Recall 0.98
F-score 0.97
Accuracy 0.97
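
For reference, here is a small sketch that recomputes the summary metrics from the MODEL / USER / COUNT table, to confirm that the evaluation itself finished and that only the final model loading step failed. It assumes "accept" is the positive class and that the first column is the model's prediction and the second is my annotation; the variable names are only for illustration.

```python
# Counts copied from the MODEL / USER / COUNT table above.
# Assumption: "accept" is the positive class; the first column is the
# model's prediction and the second is the user's annotation.
tp = 31645  # model accept, user accept
fp = 1169   # model accept, user reject
tn = 31636  # model reject, user reject
fn = 804    # model reject, user accept

total = tp + fp + tn + fn
correct = tp + tn
incorrect = fp + fn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 * precision * recall / (precision + recall)
accuracy = correct / total

print("Evaluated:", total)
print("Correct:", correct, "Incorrect:", incorrect)
print("Precision: {:.2f}".format(precision))
print("Recall:    {:.2f}".format(recall))
print("F-score:   {:.2f}".format(f_score))
print("Accuracy:  {:.2f}".format(accuracy))
```

Rounded to two decimals, these agree with the figures in the log, and the total matches the 65254 evaluation examples, so the numbers reported by the run look internally consistent.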