I'm trying to use the new spacy-nightly training pipeline. How can I convert a Prodigy dataset to a .spacy object? Thanks!
You can use Prodigy's data-to-spacy to convert one or more datasets to a JSON training file and then convert that to .spacy using spacy convert. Also see here:
Under the hood, the .spacy format is just a serialized DocBin object, so for full flexibility, you can also create Doc objects from your existing annotations and add them to a DocBin. See here for details: https://nightly.spacy.io/api/docbin
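As a rough illustration of the DocBin route described above, here is a minimal sketch. It assumes spaCy v3 is installed and that the annotations are dicts with "text" and character-offset "spans", similar to what Prodigy stores for NER. The sample text, the offsets, and the output path mentioned in the comment are made up for the example.

```python
import spacy
from spacy.tokens import DocBin

# A blank English pipeline is enough: only the tokenizer is needed here.
nlp = spacy.blank("en")

# Hypothetical annotations with character offsets (not taken from a real dataset).
annotations = [
    {
        "text": "Acme acquires Initech for $5 million.",
        "spans": [
            {"start": 0, "end": 4, "label": "ORG"},
            {"start": 14, "end": 21, "label": "ORG"},
            {"start": 26, "end": 36, "label": "MONEY"},
        ],
    },
]

doc_bin = DocBin()
for ann in annotations:
    doc = nlp.make_doc(ann["text"])
    ents = []
    for span in ann["spans"]:
        # char_span returns None if the offsets don't line up with token boundaries.
        ent = doc.char_span(span["start"], span["end"], label=span["label"])
        if ent is not None:
            ents.append(ent)
    doc.ents = ents
    doc_bin.add(doc)

# To write a training file instead, call doc_bin.to_disk("./train.spacy")
# (the path is an assumption). Here the data is round-tripped in memory.
data = doc_bin.to_bytes()
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
entities = [(ent.text, ent.label_) for ent in restored[0].ents]
```

Spans whose offsets don't match the tokenization are silently skipped in this sketch; in real data you would probably want to log them instead.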
Hello guys,
I have a problem converting the dataset to .spacy.
I labelled the data with this code:
python -m prodigy rel.manual koText en_core_web_sm koText.json --label INVESTS,RECEIVES,ACQUIRES --span-label ORG,MONEY,PERSON,REGEX --add-ents --wrap
And now I wanted to convert everything to spacy with this code:
python -m prodigy data-to-spacy ./hp --ner allnoTitles --eval-split 0
(The new merged dataset is called allnoTitles)
Now I get this error:
============================== Generating data ==============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1715 | Evaluation: 0 (0% split)
✘ Invalid data for component 'ner'
spans -> 16 -> start field required
spans -> 16 -> end field required
Why does it not work?
Kind regards.
Hi @do12siwu!
It's likely something is wrong with the data. Check out these related posts:
They seemed to find cleaning up annotations helped and then found a related issue dealing with their config file.
Could you try to export the annotations with db-out to .jsonl to inspect them and find the span? Perhaps try to remove it.
Let us know if this helps!
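A minimal sketch of that inspection step, assuming the dataset has been exported with db-out to a JSONL file and that each record may carry a "spans" list. The helper names and the in-memory sample are made up for illustration; the check mirrors the "start field required" / "end field required" error above.

```python
import json

REQUIRED_SPAN_KEYS = ("start", "end")


def read_jsonl(path):
    """Load one JSON object per non-empty line (e.g. a db-out export)."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_invalid_spans(examples):
    """Return (example_index, span_index) pairs for spans lacking start or end."""
    problems = []
    for ex_idx, example in enumerate(examples):
        for span_idx, span in enumerate(example.get("spans") or []):
            if any(key not in span for key in REQUIRED_SPAN_KEYS):
                problems.append((ex_idx, span_idx))
    return problems


def drop_invalid_spans(example):
    """Return a shallow copy of the example keeping only spans with start and end."""
    cleaned = dict(example)
    cleaned["spans"] = [
        span
        for span in example.get("spans") or []
        if all(key in span for key in REQUIRED_SPAN_KEYS)
    ]
    return cleaned


# Tiny in-memory sample instead of a real export (hypothetical data).
sample = [
    {
        "text": "Acme invests $5m",
        "spans": [{"start": 0, "end": 4, "label": "ORG"}, {"label": "MONEY"}],
    },
    {"text": "No spans here"},
]

problems = find_invalid_spans(sample)
cleaned = [drop_invalid_spans(example) for example in sample]
```

With a real export, `read_jsonl` would replace the sample list, and the cleaned records could be written back to JSONL and loaded into a new dataset before running data-to-spacy again.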
Thanks a lot.
I just used this code from Ines.
It worked!
Hi guys,
I converted the dataset to spacy with this code:
python -m prodigy data-to-spacy ./hp --ner allnoTitles_filtered --eval-split 0
And before that, I annotated the dataset with this code:
python -m prodigy rel.manual koText en_core_web_sm koText.json --label INVESTS,RECEIVES,ACQUIRES --span-label ORG,MONEY,PERSON,REGEX --add-ents --wrap
The problem is that the .spacy file contains the span labels ORG, MONEY, PERSON and REGEX, but not the relations INVESTS, RECEIVES and ACQUIRES.
I already checked the documentation.
What am I doing wrong?
Kind regards
Hi @do12siwu!
This post explains that relations annotations aren't part of data-to-spacy: