Problem using prodigy sense2vec.teach

Hello,

When I try to create vectors with the following command:

prodigy sense2vec.teach s2v_software_terms ./assets/RC_2006-11 --seeds "windows server, windows 10, ios 12, android 12, appache kafka 1"

I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/sense2vec/prodigy_recipes.py", line 54, in teach
    s2v = Sense2Vec().from_disk(vectors_path)
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/sense2vec/sense2vec.py", line 342, in from_disk
    self.vectors = Vectors().from_disk(path)
  File "spacy/vectors.pyx", line 616, in spacy.vectors.Vectors.from_disk
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/spacy/util.py", line 1299, in from_disk
    reader(path / key)
  File "spacy/vectors.pyx", line 609, in spacy.vectors.Vectors.from_disk.lambda8
  File "spacy/strings.pyx", line 238, in spacy.strings.StringStore.from_disk
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/srsly/_json_api.py", line 51, in read_json
    file_path = force_path(path)
  File "/home/ms/dev/python/nlp_cyber/spacy/lib/python3.8/site-packages/srsly/util.py", line 24, in force_path
    raise ValueError(f"Can't read file: {location}")
ValueError: Can't read file: assets/RC_2006-11/strings.json

Why is it trying to open a file called strings.json? Maybe that is the problem, but I don't understand where this file should come from.

Thank you in advance for a quick reply, so that we can continue our work.

Software used:
spaCy version 3.2.4
Platform Linux-5.17.0-051700-generic-x86_64-with-glibc2.29
Python version 3.8.10
Prodigy 1.11.7

Best,
Martin

Hi there!

Looking at the recipe definition, I can confirm that the second argument is the path to pretrained sense2vec vectors. Did you use one of the pre-trained vector files found here? If I recall correctly, strings.json is metadata that's packaged with the vectors.

Note that the sense2vec.teach recipe won't create vectors. Rather, it creates a terminology list by leveraging the sense2vec vectors.

Hello,

Thank you for responding. I also used Reddit data, but not the one from your link. The source was mentioned in a tutorial. So does the data have to have a certain internal structure? Can I only use the files you linked to?

Best,
Martin

You need to point to pre-trained sense2vec vectors, which come in a predefined format. You can train these yourself if you want to, but I would generally advise using the ones that the project provides.
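
If it helps, here is a minimal sketch of a sanity check you could run before calling the recipe. It only tests whether the path is a directory that contains strings.json, since that is the file named in the traceback above. The function name looks_like_s2v_vectors is made up for illustration, and a real vectors directory contains more files than this check looks at, so treat a positive result as a hint rather than a guarantee.

```python
from pathlib import Path

# Assumption: "strings.json" is the only file checked because it is the one
# named in the traceback; a real vectors directory holds additional files.
REQUIRED_FILE = "strings.json"


def looks_like_s2v_vectors(path):
    """Return True if `path` is a directory that contains strings.json."""
    directory = Path(path)
    return directory.is_dir() and (directory / REQUIRED_FILE).is_file()


if __name__ == "__main__":
    # Path taken from the command in the original question.
    candidate = "./assets/RC_2006-11"
    if looks_like_s2v_vectors(candidate):
        print(f"{candidate} contains {REQUIRED_FILE}")
    else:
        print(f"{candidate} is not a directory with {REQUIRED_FILE} in it")
```

If the check passes, the same path is what the recipe hands to Sense2Vec().from_disk(...), as shown in the traceback.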

Thank you, now it works.