After training a model on 200 examples, I can't run binary annotation with ner.teach. When I run the command below I get an error. Any ideas what's going wrong? (Searching the forum didn't help, and I'm not sure what's wrong with my dataset.)
prodigy ner.teach ner_st2_skills ./model/model-best ./data.jsonl --label SKILL
Using 1 label(s): SKILL
Traceback (most recent call last):
File "/Users/fed/.pyenv/versions/3.9.2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/fed/.pyenv/versions/3.9.2/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy/recipes/ner.py", line 71, in teach
model = EntityRecognizer(nlp, label=label)
File "cython_src/prodigy/models/ner.pyx", line 340, in prodigy.models.ner.EntityRecognizer.__init__
File "cython_src/prodigy/util.pyx", line 621, in prodigy.util.copy_nlp
File "spacy/vocab.pyx", line 90, in spacy.vocab.Vocab.vectors.__set__
AttributeError: 'NoneType' object has no attribute 'strings'
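For reference, since the error comes from spacy.vocab.Vocab.vectors, this is roughly how I'd check whether the saved model actually has a vectors table (just a quick sketch using spaCy's standard API, with my model path from above):

import spacy

# Load the trained pipeline and look at its vectors table.
nlp = spacy.load("./model/model-best")
print(type(nlp.vocab.vectors))   # expected: spacy.vectors.Vectors
print(nlp.vocab.vectors.shape)   # (0, 0) would mean the table is empty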
My data.jsonl was created from previously labelled data using a PhraseMatcher, like this:
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

for obj in data:
    doc = nlp(obj['text'])
    # Case-insensitive matcher built from the previously labelled skills for this listing
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    matcher.add("SKILL", [nlp.make_doc(cls['value'])
                          for cls in filterSkillsByConfidence(skills[obj['meta']['listingId']])])
    matches = matcher(doc)
    entities = []
    for match_id, start, end in matches:
        entities.append(Span(doc, start, end, label='SKILL'))
    # Drop overlapping spans before assigning entities
    doc.ents = spacy.util.filter_spans(entities)
    # Convert entities into Prodigy-style span dicts
    obj["spans"] = [{"token_start": ent.start,
                     "token_end": ent.end - 1,
                     "start": ent.start_char,
                     "end": ent.end_char,
                     "text": ent.text,
                     "label": ent.label_} for ent in doc.ents]
And whenever I run the following, I get this warning:
poetry run python -m prodigy ner.correct ner_st2_skills ./model/model-best ./data.jsonl --label SKILL
The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.
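I'm not sure if that warning is related to the error above, but I assume I could add a sentencizer to the saved pipeline and re-save it before running the recipes, something like this (untested sketch, standard spaCy v3 API, and the output path is just an example):

import spacy

nlp = spacy.load("./model/model-best")
# Rule-based sentencizer so the pipeline sets sentence boundaries
nlp.add_pipe("sentencizer", first=True)
nlp.to_disk("./model/model-best-sents")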