I'm not able to get this example running: https://github.com/explosion/projects/tree/master/ner-food-ingredients

py -m prodigy train --ner food_data en_vectors_web_lg ---paths.init_tok2vec .\tok2vec_cd8_model289.bin --eval-split 0.2 --output tmp_model

is not working. (I followed one of the discussions where --init_tok2vec was replaced.)

  1. --output is not working
  2. C:\Users\dwu\Documents\ner_Prodigy\ner-food-ingredients>py -m prodigy train --ner food_data en_vectors_web_lg ---paths.init_tok2vec .\tok2vec_cd8_model289.bin --eval-split 0.2
    ℹ Using CPU

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
✔ Generated training config

=========================== Initializing pipeline ===========================
✘ Error parsing config overrides
-paths -> init_tok2vec not a section value that can be overridden

I downloaded tok2vec_cd8_model289.bin and it is in the folder.
The first three steps seem to work, just not the training step.

(These are listed below just for reference.)

  1. Create a phrase list using seed terms. Requires sense2vec and a vectors package.
py -m prodigy sense2vec.teach food_terms ./s2v_reddit_2015_md --seeds "garlic, avocado, cottage cheese, olive oil, cumin, chicken breast, beef, iceberg lettuce"
  2. Convert the phrase list to a match patterns file.
py -m prodigy terms.to-patterns food_terms --label INGRED --spacy-model blank:en > ./food_patterns.jsonl
  3. Label data manually with help from the patterns.
py -m prodigy ner.manual food_data blank:en ./reddit_r_cooking_sample.jsonl --label INGRED --patterns food_patterns.jsonl

Hey @deewuok ,

You might have already seen my answer to the very same question here.

When you override a config setting, you should use double (not triple) dashes. (I think there is a typo in the original answer.) So could you try:

 py -m prodigy train --ner food_data en_vectors_web_lg --paths.init_tok2vec .\tok2vec_cd8_model289.bin --eval-split 0.2 --output tmp_model
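
To see why the extra dash matters, here is a minimal Python sketch of how dotted overrides of this kind can be parsed. It is a simplified illustration and not spaCy's actual implementation: the function name `parse_overrides` and the `KNOWN_SECTIONS` set are assumptions made up for this example.

```python
# Simplified illustration only; this is NOT spaCy's real override parser.
# KNOWN_SECTIONS and parse_overrides are hypothetical names for this sketch.
KNOWN_SECTIONS = {
    "paths", "system", "nlp", "components", "corpora", "training", "initialize",
}


def parse_overrides(args):
    """Turn ["--section.key", "value", ...] into {"section.key": "value"}."""
    overrides = {}
    args = list(args)
    while args:
        opt = args.pop(0)
        if not opt.startswith("--"):
            raise ValueError(f"expected an option starting with '--', got {opt!r}")
        name = opt[2:]  # strip exactly two dashes; a third dash stays in the name
        if "=" in name:
            name, value = name.split("=", 1)
        elif args:
            value = args.pop(0)
        else:
            raise ValueError(f"missing value for {opt!r}")
        section = name.split(".", 1)[0]
        if "." not in name or section not in KNOWN_SECTIONS:
            raise ValueError(f"{name!r} is not a section value that can be overridden")
        overrides[name] = value
    return overrides


# Two dashes: the section is "paths", so the override is accepted.
ok = parse_overrides(["--paths.init_tok2vec", "tok2vec_cd8_model289.bin"])

# Three dashes: the section becomes "-paths", which is not a known section.
try:
    parse_overrides(["---paths.init_tok2vec", "tok2vec_cd8_model289.bin"])
    triple_dash_error = None
except ValueError as err:
    triple_dash_error = str(err)
```

With three dashes the leftover name starts with `-paths`, which matches the "-paths -> init_tok2vec" wording in the error you posted.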

Thanks!

Two new problems. I don't know if this is the best place to post or whether I should start a new thread. (I guess I'll start a new thread for the issue with s2v_reddit_2015_md.tar.gz and 01_Preprocess_Reddit.ipynb, and just state the other issue, about --output, here.)

However, the command line you gave ran, except for --output: it says it can't find it.
It's not a big deal, I'm just letting you know, but it would be nice to know if there is a way to specify it in the future.

py -m prodigy train --ner food_data en_vectors_web_lg --paths.init_tok2vec .\tok2vec_cd8_model289.bin --eval-split 0.2 --output tmp_model
ℹ Using CPU
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\dwu\AppData\Local\Programs\Python\Python311\Lib\site-packages\prodigy\__main__.py", line 50, in <module>
    main()
  File "C:\Users\dwu\AppData\Local\Programs\Python\Python311\Lib\site-packages\prodigy\__main__.py", line 44, in main
    controller = run_recipe(run_args)
                 ^^^^^^^^^^^^^^^^^^^^
  File "cython_src\prodigy\cli.pyx", line 98, in prodigy.cli.run_recipe
  File "cython_src\prodigy\cli.pyx", line 99, in prodigy.cli.run_recipe
  File "C:\Users\dwu\AppData\Local\Programs\Python\Python311\Lib\site-packages\prodigy\recipes\train.py", line 288, in train
    overrides = parse_config_overrides(list(_extra))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dwu\AppData\Local\Programs\Python\Python311\Lib\site-packages\spacy\cli\_util.py", line 108, in parse_config_overrides
    cli_overrides = _parse_overrides(args, is_cli=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dwu\AppData\Local\Programs\Python\Python311\Lib\site-packages\spacy\cli\_util.py", line 127, in _parse_overrides
    raise NoSuchOption(orig_opt)
click.exceptions.NoSuchOption: No such option: --output

(I'm starting the issue about s2v_reddit_2015_md.tar.gz and 01_Preprocess_Reddit.ipynb in a new thread.) Thank you for your help.

Hi @deewuok,

I responded to your question about the input file in the dedicated thread here. Hope it helps!

About the train command:
To start with a tip: you can run the command with the -h flag (for help) to quickly see all the available options. If you do that, you'll see that there is indeed no --output option. The location of the output model is an optional positional argument that should be listed first. So your command should be:

python -m prodigy train ./tmp_model --ner food_data en_vectors_web_lg --eval-split 0.2 --paths.init_tok2vec=.\tok2vec_cd8_model289.bin

Also note that the config overrides (paths.init_tok2vec in your case) should appear at the end.
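
As a rough illustration of why --output is rejected while a dotted override is accepted, here is a small Python sketch. It is not Prodigy's actual parser: the `train-sketch` parser, the `split_args` helper and the rule "leftover options must contain a dot" are assumptions that only mimic the behaviour visible in the traceback, where leftover arguments are handed to the config-override parser.

```python
import argparse

# Hypothetical stand-in for a train-style CLI; NOT Prodigy's real parser.
parser = argparse.ArgumentParser(prog="train-sketch", allow_abbrev=False)
parser.add_argument("output_dir", nargs="?", default=None)  # optional positional, listed first
parser.add_argument("--ner")
parser.add_argument("--eval-split", type=float)


def split_args(argv):
    """Separate recognised options from leftover config overrides."""
    known, extra = parser.parse_known_args(argv)
    for item in extra:
        # Assumption for this sketch: leftover options are only valid
        # if they look like dotted config overrides (section.key).
        if item.startswith("--") and "." not in item:
            raise ValueError(f"No such option: {item}")
    return known, extra


# Output directory given positionally, override placed at the end.
known, extra = split_args([
    "./tmp_model",
    "--ner", "food_data",
    "--eval-split", "0.2",
    "--paths.init_tok2vec=tok2vec_cd8_model289.bin",
])

# "--output" is not a recognised option and has no dot, so it is rejected.
try:
    split_args(["--ner", "food_data", "--output", "tmp_model"])
    error = None
except ValueError as err:
    error = str(err)
```

In this sketch `known.output_dir` picks up `./tmp_model` from the first position, the dotted override is passed through untouched in `extra`, and `--output` triggers the same kind of "No such option" message you saw.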