Hello! I think I am running into a similar issue. I am trying to train a model on hypernym and hyponym annotations that I created with the rel.manual recipe.
I annotated with the following command:
prodigy rel.manual hypernym_NER en_core_web_lg "./datasets/hearst_hypernym_sentences_raw_text_handmade.txt" --label HYPER,HYPO,PATTERN --span-label HYPER,HYPO,PATTERN
Then I ran training, which produced the following output:
(base) karl@karlkruncher:~/PycharmProjects/doctorlingo/test_scripts/cwi-master/CWI_Sequence_Labeller$ prodigy train "./NER_hypernym_model" --parser hypernym_NER --base-model en_core_web_lg --eval-split 0.1 --label-stats --gpu-id 0 -V
Using GPU: 0
/home/karl/anaconda3/lib/python3.8/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
========================= Generating Prodigy config =========================
Auto-generating config with spaCy
Using config from base model
Generated training config
=========================== Initializing pipeline ===========================
[2021-09-29 15:08:42,987] [DEBUG] Replacing listeners of component 'tagger'
[2021-09-29 15:08:45,600] [INFO] Set up nlp object from config
Components: parser
Merging training and evaluation data for 1 components
- [parser] Training: 27 | Evaluation: 2 (10% split)
Training: 27 | Evaluation: 2
Labels: parser (3)
- [parser] HYPER, HYPO, PATTERN
[2021-09-29 15:08:45,616] [INFO] Pipeline: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
[2021-09-29 15:08:45,616] [INFO] Resuming training for: ['parser', 'tok2vec']
[2021-09-29 15:08:45,620] [INFO] Created vocabulary
[2021-09-29 15:08:47,207] [INFO] Added vectors: en_core_web_lg
[2021-09-29 15:08:48,637] [INFO] Finished initializing nlp object
[2021-09-29 15:08:48,638] [INFO] Initialized pipeline components: []
Initialized pipeline
============================= Training pipeline =============================
Components: parser
Merging training and evaluation data for 1 components
- [parser] Training: 27 | Evaluation: 2 (10% split)
Training: 27 | Evaluation: 2
Labels: parser (3)
- [parser] HYPER, HYPO, PATTERN
Pipeline: ['tok2vec', 'tagger', 'parser', 'attribute_ruler',
'lemmatizer', 'ner']
Frozen components: ['tagger', 'attribute_ruler', 'lemmatizer',
'ner']
Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS PARSER DEP_UAS DEP_LAS SENTS_F SCORE
[2021-09-29 15:08:48,652] [DEBUG] [W026] Unable to set all sentence boundaries from dependency parses. If you are constructing a parse tree incrementally by setting token.head values, you can probably ignore this warning. Consider using Doc(words, ..., heads=heads, deps=deps) instead.
[... the same W026 debug message repeated 55 more times, timestamps 15:08:48,652 to 15:08:48,655 ...]
Aborting and saving the final best model. Encountered exception:
KeyError("[E018] Can't retrieve string for hash '16588043228098313248'. This usually refers to an issue with the Vocab or StringStore.")
Traceback (most recent call last):
  File "/home/karl/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/karl/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/karl/anaconda3/lib/python3.8/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 331, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/karl/anaconda3/lib/python3.8/site-packages/plac_core.py", line 367, in __call__
    cmd, result = parser.consume(arglist)
  File "/home/karl/anaconda3/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/karl/anaconda3/lib/python3.8/site-packages/prodigy/recipes/train.py", line 277, in train
    return _train(
  File "/home/karl/anaconda3/lib/python3.8/site-packages/prodigy/recipes/train.py", line 197, in _train
    spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
  File "/home/karl/anaconda3/lib/python3.8/site-packages/spacy/training/loop.py", line 122, in train
    raise e
  File "/home/karl/anaconda3/lib/python3.8/site-packages/spacy/training/loop.py", line 105, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/home/karl/anaconda3/lib/python3.8/site-packages/spacy/training/loop.py", line 203, in train_while_improving
    nlp.update(
  File "/home/karl/anaconda3/lib/python3.8/site-packages/spacy/language.py", line 1122, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
  File "spacy/pipeline/transition_parser.pyx", line 387, in spacy.pipeline.transition_parser.Parser.update
  File "spacy/pipeline/transition_parser.pyx", line 638, in spacy.pipeline.transition_parser.Parser._init_gold_batch
  File "spacy/pipeline/_parser_internals/arc_eager.pyx", line 649, in spacy.pipeline._parser_internals.arc_eager.ArcEager.init_gold
  File "spacy/pipeline/_parser_internals/arc_eager.pyx", line 673, in spacy.pipeline._parser_internals.arc_eager.ArcEager._replace_unseen_labels
  File "spacy/strings.pyx", line 132, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '16588043228098313248'. This usually refers to an issue with the Vocab or StringStore."
I have not tried Atakan's solution yet, since it seems like a lot of work. Has this been looked into since last week?
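
In case it helps narrow this down: the traceback ends in ArcEager._replace_unseen_labels, so the failing lookup seems to involve a parser label. Below is a minimal sketch for checking which relation labels actually occur in the exported annotations. It assumes the dataset was exported to JSONL (for example with prodigy db-out) and that each task stores its relations under a "relations" key with a "label" field; those field names and the helper name count_relation_labels are my assumptions, not something confirmed in this thread, and the sample data is made up.

```python
import json
from collections import Counter


def count_relation_labels(jsonl_lines):
    """Count relation labels in Prodigy-style tasks, one JSON object per line.

    Assumes each task may carry a "relations" list of dicts with a "label"
    key, and an "answer" key ("accept", "reject" or "ignore").
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        # Only count examples that were accepted (missing answer counts as accepted).
        if task.get("answer", "accept") != "accept":
            continue
        for rel in task.get("relations", []):
            counts[rel.get("label")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, standing in for an exported dataset.
    sample = [
        json.dumps({
            "text": "animals such as dogs",
            "answer": "accept",
            "relations": [
                {"head": 0, "child": 3, "label": "HYPER"},
                {"head": 3, "child": 0, "label": "HYPO"},
            ],
        }),
        json.dumps({"text": "skipped", "answer": "reject", "relations": []}),
    ]
    for label, n in count_relation_labels(sample).items():
        print(label, n)
```

To run it on real data, an open file handle can be passed instead of the sample list, since the function only iterates over lines. If a label shows up here that is not among HYPER, HYPO and PATTERN, or if no relations show up at all, that might be a place to start looking.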