First of all, thanks for the excellent tool.
I'm new to spaCy, Prodigy, and AI as a whole.
When I finished training in Prodigy, I wanted to export a model to use in spaCy, but the command
prodigy data-to-spacy ./export_to_spacy/teste.json --lang pt --ner mydataset --base-model pt_core_news_lg
is showing the following error:
Created and merged data for 43 total examples
Type Total Merged
---- ----- ------
NER 47 43
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/azureuser/prodigy/lib/python3.6/site-packages/prodigy/__main__.py", line 53, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 321, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/home/azureuser/prodigy/lib/python3.6/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/home/azureuser/prodigy/lib/python3.6/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/azureuser/prodigy/lib/python3.6/site-packages/prodigy/recipes/train.py", line 345, in data_to_spacy
json_data = [docs_to_json([doc], id=i) for i, doc in docs]
File "/home/azureuser/prodigy/lib/python3.6/site-packages/prodigy/recipes/train.py", line 345, in <listcomp>
json_data = [docs_to_json([doc], id=i) for i, doc in docs]
File "gold.pyx", line 881, in spacy.gold.docs_to_json
File "doc.pyx", line 652, in sents
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.
I've done a lot of research and still haven't figured out how to fix it.
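From what I understand, E030 means the exported docs have no sentence boundaries, so `doc.sents` fails inside `docs_to_json`. A minimal sketch of what the error message seems to suggest (using a blank `pt` pipeline for illustration; I'm assuming spaCy v2 here, as in my traceback, with the v3 call in the fallback):

```python
import spacy

# Sketch: add a component that sets sentence boundaries,
# so that iterating doc.sents no longer raises E030.
nlp = spacy.blank("pt")
try:
    # spaCy v2 API, as quoted in the error message
    nlp.add_pipe(nlp.create_pipe("sentencizer"))
except Exception:
    # spaCy v3 takes the component name directly
    nlp.add_pipe("sentencizer")

doc = nlp("Primeira frase. Segunda frase.")
print(len(list(doc.sents)))  # sentence boundaries are now set
```

Is something like this the right direction, or does `data-to-spacy` need the base model's parser/sentencizer to be present some other way?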