Dear Prodigy Support,
I recently got the Prodigy Nightly plan and I wanted to try to use the new features of data-to-spacy (see e.g. here) in order to generate a config.cfg file to use with the new spacy 3.0.
Context
I would like to use the base model en_ner_craft_md from scispacy to train a NER model. Also, in my environment I have installed
- spacy==3.0.5
- scispacy==0.4.0
- en_ner_craft_md==0.4.0
- prodigy==1.11.0a5
Then I ran
prodigy data-to-spacy \
--lang en \
--ner annotations15_EmmanuelleLogette_2020-09-22_raw9_Pathway \
--eval-split 0.1 \
--base-model en_ner_craft_md \
--optimize accuracy \
--verbose tmp
and I got the following output.
ℹ Using base model 'en_ner_craft_md'
============================== Generating data ==============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 134 | Evaluation: 14 (10% split)
Training: 134 | Evaluation: 14
Labels: ner (1)
- [ner] PATHWAY
/usr/local/lib/python3.7/dist-packages/spacy/training/iob_utils.py:142: UserWarning: [W030] Some entities could not be aligned in the text "Electrochemical potential-driven transporters (cla..." with entities "[(187, 196, 'PATHWAY'), (331, 344, 'PATHWAY'), (61...". Use `spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training.
entities=ent_str[:50] + "..." if len(ent_str) > 50 else ent_str,
✔ Saved 134 training examples
tmp/train.spacy
✔ Saved 14 evaluation examples
tmp/dev.spacy
============================= Generating config =============================
ℹ Using config from base model
✔ Generated training config
======================== Generating cached label data ========================
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/casalegn/.local/lib/python3.7/site-packages/prodigy/__main__.py", line 54, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 505, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/home/casalegn/.local/lib/python3.7/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/home/casalegn/.local/lib/python3.7/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/casalegn/.local/lib/python3.7/site-packages/prodigy/recipes/train.py", line 435, in data_to_spacy
nlp = spacy_init_nlp(config, use_gpu=0 if gpu else -1) # ID doesn't matter
File "/usr/local/lib/python3.7/dist-packages/spacy/training/initialize.py", line 57, in init_nlp
train_corpus, dev_corpus = resolve_dot_names(config, dot_names)
File "/usr/local/lib/python3.7/dist-packages/spacy/util.py", line 474, in resolve_dot_names
result = registry.resolve(config[section])
File "/usr/local/lib/python3.7/dist-packages/thinc/config.py", line 723, in resolve
config, schema=schema, overrides=overrides, validate=validate, resolve=True
File "/usr/local/lib/python3.7/dist-packages/thinc/config.py", line 772, in _make
config, schema, validate=validate, overrides=overrides, resolve=resolve
File "/usr/local/lib/python3.7/dist-packages/thinc/config.py", line 825, in _fill
promise_schema = cls.make_promise_schema(value, resolve=resolve)
File "/usr/local/lib/python3.7/dist-packages/thinc/config.py", line 1016, in make_promise_schema
func = cls.get(reg_name, func_name)
File "/usr/local/lib/python3.7/dist-packages/spacy/util.py", line 141, in get
) from None
catalogue.RegistryError: [E893] Could not find function 'specialized_ner_reader' in function registry 'readers'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Available names: prodigy.MergedCorpus.v1, prodigy.NERCorpus.v1, prodigy.ParserCorpus.v1, prodigy.TaggerCorpus.v1, prodigy.TextCatCorpus.v1, spacy.Corpus.v1, spacy.JsonlCorpus.v1, spacy.read_labels.v1, srsly.read_json.v1, srsly.read_jsonl.v1, srsly.read_msgpack.v1, srsly.read_yaml.v1
Questions
- Could you please help me debug the error above?
- I was expecting this command to produce a config.cfg file. And since the command printed "✔ Generated training config", I expected to find this configuration file in the output directory. However, I could not find such a file. Do you know why? Maybe the message "ℹ Using config from base model" means that no config.cfg is going to be generated, and that I can just use the file /usr/local/lib/python3.7/dist-packages/en_ner_craft_md/en_ner_craft_md-0.4.0/config.cfg?
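As a debugging aid for the first question, here is a minimal sketch (not part of Prodigy or spaCy) that lists the @readers entries in a config and flags those missing from the "Available names" line of the error above. The config excerpt is a hypothetical stand-in, since the actual contents of the base model's config.cfg are not shown here, and find_readers and missing_readers are names made up for illustration. It uses only the standard-library configparser, so it may not handle every construct a real spaCy config can contain.

```python
import configparser

# Hypothetical excerpt of a base-model config.cfg; the real file may differ.
SAMPLE_CFG = """
[paths]
train = null
dev = null

[corpora.train]
@readers = "specialized_ner_reader"
file_path = ${paths.train}

[corpora.dev]
@readers = "specialized_ner_reader"
file_path = ${paths.dev}
"""

# Reader names copied from the "Available names" line of the E893 error.
AVAILABLE_READERS = {
    "prodigy.MergedCorpus.v1", "prodigy.NERCorpus.v1",
    "prodigy.ParserCorpus.v1", "prodigy.TaggerCorpus.v1",
    "prodigy.TextCatCorpus.v1", "spacy.Corpus.v1", "spacy.JsonlCorpus.v1",
    "spacy.read_labels.v1", "srsly.read_json.v1", "srsly.read_jsonl.v1",
    "srsly.read_msgpack.v1", "srsly.read_yaml.v1",
}


def find_readers(cfg_text):
    """Map each config section to the reader function it names, if any."""
    # Interpolation is disabled so "${paths.train}" is kept as plain text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cfg_text)
    readers = {}
    for section in parser.sections():
        if parser.has_option(section, "@readers"):
            readers[section] = parser.get(section, "@readers").strip('"')
    return readers


def missing_readers(cfg_text, available):
    """Return only the sections whose reader is not in `available`."""
    found = find_readers(cfg_text)
    return {sec: name for sec, name in found.items() if name not in available}


if __name__ == "__main__":
    for section, name in missing_readers(SAMPLE_CFG, AVAILABLE_READERS).items():
        print(f"[{section}] uses unregistered reader {name!r}")
```

To check the real file, the text of the config.cfg at the path mentioned above could be passed in place of SAMPLE_CFG, for example via pathlib.Path(...).read_text().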
Thank you very much in advance for your kind help!
—Francesco