I came across the terms.train-vectors recipe and tried the following:
prodigy terms.train-vectors ./models raw.json --spacy-model en_vectors_web_lg -la en -ME -MN
My raw.json is a list of dicts with a "text" field. Here is my traceback:
13:06:06 - 'pattern' package not found; tag filters are not available for English
13:06:06 - collecting all words and their counts
Traceback (most recent call last):
  File "/home/haroon/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/haroon/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 331, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 211, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/prodigy/recipes/terms.py", line 99, in train_vectors
    negative=negative,
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/gensim/models/word2vec.py", line 783, in __init__
    fast_version=FAST_VERSION)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/gensim/models/base_any2vec.py", line 759, in __init__
    self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/gensim/models/base_any2vec.py", line 936, in build_vocab
    sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/gensim/models/word2vec.py", line 1591, in scan_vocab
    total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/gensim/models/word2vec.py", line 1560, in _scan_vocab
    for sentence_no, sentence in enumerate(sentences):
  File "/home/haroon/anaconda3/lib/python3.6/site-packages/prodigy/recipes/terms.py", line 33, in __iter__
    for sent in doc.sents:
AttributeError: 'NoneType' object has no attribute 'sents'
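
In case it matters, as far as I can tell raw.json itself is well-formed: a list of dicts, each with a "text" string. This is roughly the sanity check I'm relying on (just a sketch; "text" is the only field I use):

import json

# Quick sanity check on raw.json: it should be a list of dicts,
# each with a non-empty "text" string.
with open("raw.json", encoding="utf8") as f:
    data = json.load(f)

bad = [i for i, eg in enumerate(data)
       if not isinstance(eg, dict)
       or not isinstance(eg.get("text"), str)
       or not eg["text"].strip()]
print(len(data), "records,", len(bad), "without a usable 'text' field")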
Should I be passing in spaCy Doc objects? I'm not sure how I would do that from the command line. Any help is appreciated.
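
One thing I'm unsure about (and it may be unrelated to the NoneType error): as far as I know, en_vectors_web_lg only ships word vectors, with no tagger/parser/sentencizer, so I wouldn't expect doc.sents to work when calling that model directly. A minimal sketch of what I mean:

import spacy

# en_vectors_web_lg provides vectors only, so the pipeline should be empty
# and no sentence boundaries get set.
nlp = spacy.load("en_vectors_web_lg")
print(nlp.pipe_names)

doc = nlp("One sentence. Another sentence.")
# The recipe's failing line is "for sent in doc.sents". Without a parser or
# sentencizer I'd expect this to raise, though my traceback instead shows
# doc itself being None, so maybe that's a separate problem.
for sent in doc.sents:
    print(sent.text)

Is that expected, or does the recipe handle sentence splitting itself?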
Also, the documentation for this recipe says it can be used with a sense2vec model. Would that mean using such a (previously trained) model as the --spacy-model argument?
And finally: am I right in assuming that training my vector model with merged entities and noun phrases will NOT, by itself, make terms.teach ask multi-token questions from my seeds (which is what I am ultimately after)? If I understand correctly, spaCy/Prodigy will tokenize my seed terms on whitespace.
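
To make that last point concrete, what I'm hoping for is that multi-word phrases end up as single entries in the trained vectors. This is how I imagine checking it (hypothetical phrase, and I'm guessing at how the output model stores its keys):

import spacy

# Load the model that terms.train-vectors wrote to ./models (path from my command above).
nlp = spacy.load("./models")

# "machine learning" is just a made-up example phrase, not from my data.
# If -ME/-MN merged noun phrases before training, I'd hope the whole phrase
# has its own vector entry, not just the individual words.
print(nlp.vocab.has_vector("machine learning"))
print(nlp.vocab.has_vector("machine"), nlp.vocab.has_vector("learning"))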