Hello,
I can’t get the `ner.make-gold` recipe to work with `en_vectors_web_lg`, or with a custom model trained on top of `en_vectors_web_lg`.
Other models such as `en_core_web_lg` work perfectly, and `en_vectors_web_lg` itself works fine with other recipes such as `ner.teach`.
Here is the error I got:
```
Traceback (most recent call last):
  File "/home/debian/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/debian/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/debian/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 178, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 55, in prodigy.core.Controller.__init__
  File "/home/debian/anaconda3/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/core.pyx", line 84, in iter_tasks
  File "/home/debian/anaconda3/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 209, in make_tasks
    for doc, eg in nlp.pipe(texts, as_tuples=True):
  File "/home/debian/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 554, in pipe
    for doc, context in izip(docs, contexts):
  File "/home/debian/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 578, in pipe
    for doc in docs:
  File "nn_parser.pyx", line 367, in pipe
  File "cytoolz/itertoolz.pyx", line 1047, in cytoolz.itertoolz.partition_all.__next__
  File "/home/debian/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 557, in <genexpr>
    docs = (self.make_doc(text) for text in texts)
  File "/home/debian/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 550, in <genexpr>
    texts = (tc[0] for tc in text_context1)
  File "/home/debian/anaconda3/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 208, in <genexpr>
    texts = ((eg['text'], eg) for eg in stream)
  File "cython_src/prodigy/components/preprocess.pyx", line 106, in add_tokens
  File "cython_src/prodigy/components/preprocess.pyx", line 40, in split_sentences
  File "doc.pyx", line 528, in __get__
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.
```
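From reading the error, it looks like the problem is that `en_vectors_web_lg` has no parser, so the model can't set sentence boundaries when the recipe tries to split sentences. Following the suggestion in the message itself, I tried adding a `sentencizer` and saving the model back to disk before using it with Prodigy. A minimal sketch of what I mean (using `spacy.blank("en")` here as a stand-in for `en_vectors_web_lg`; the `try`/`except` covers both the spaCy v2 API from the traceback and the newer v3 string-based `add_pipe`):

```python
import spacy

# Stand-in for a vectors-only model with no parser, e.g. en_vectors_web_lg
nlp = spacy.blank("en")

try:
    # spaCy v2 API, as suggested by error E030 in the traceback
    nlp.add_pipe(nlp.create_pipe("sentencizer"))
except Exception:
    # spaCy v3+ takes the component name as a string instead
    nlp.add_pipe("sentencizer")

# Sentence boundaries are now set, so doc.sents no longer raises E030
doc = nlp("This is one sentence. This is another.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

After adding the component, the model could be saved with `nlp.to_disk(...)` and that path passed to `ner.make-gold` instead of the bare model name. Is that the intended workaround, or should the recipe handle parser-less models itself?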
Thank you for your help!