How to convert prodigy dataset to .spacy object?

I'm trying to use the new spacy-nightly training pipeline. How can I convert a Prodigy dataset to a .spacy object? Thanks.

You can use Prodigy's data-to-spacy to convert one or more datasets to a JSON training file and then convert that to .spacy using spacy convert. Also see here:

Under the hood, the .spacy format is just a serialized DocBin object, so for full flexibility, you can also create Doc objects from your existing annotations and add them to a DocBin. See here for details: https://nightly.spacy.io/api/docbin
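As a rough illustration of the DocBin route, here is a minimal sketch. It assumes annotations with a "text" field and "spans" given as character offsets plus a label; the example sentence, the blank English pipeline, and the output path ./train.spacy are all made up for illustration, so adjust them to your data.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical annotations: text plus character-offset spans with labels.
examples = [
    {
        "text": "Acme Corp invested $5 million in Globex.",
        "spans": [
            {"start": 0, "end": 9, "label": "ORG"},
            {"start": 19, "end": 29, "label": "MONEY"},
            {"start": 33, "end": 39, "label": "ORG"},
        ],
    },
]

# A blank pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
doc_bin = DocBin()

for eg in examples:
    doc = nlp.make_doc(eg["text"])
    ents = []
    for span in eg.get("spans", []):
        # char_span returns None if the offsets don't line up with token boundaries.
        ent = doc.char_span(span["start"], span["end"], label=span["label"])
        if ent is not None:
            ents.append(ent)
    doc.ents = ents
    doc_bin.add(doc)

# Assumed output path; this is the file you would point the training config at.
doc_bin.to_disk("./train.spacy")
```

Spans that don't align with the tokenization are silently skipped in this sketch, so it can be worth logging them if you expect every annotation to survive.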

Hello guys,

I have a problem converting the dataset to .spacy.

I labelled the data with this command:
python -m prodigy rel.manual koText en_core_web_sm koText.json --label INVESTS,RECEIVES,ACQUIRES --span-label ORG,MONEY,PERSON,REGEX --add-ents --wrap

And now I wanted to convert everything to spaCy's format with this command:
python -m prodigy data-to-spacy ./hp --ner allnoTitles --eval-split 0
(The new merged dataset is called allnoTitles)

Now I get this error:
============================== Generating data ==============================
Components: ner
Merging training and evaluation data for 1 components

  • [ner] Training: 1715 | Evaluation: 0 (0% split)
    ✘ Invalid data for component 'ner'

spans -> 16 -> start field required
spans -> 16 -> end field required

Why does it not work?
Kind regards.

Hi @do12siwu!

It's likely something is wrong with the data. Check out these related posts:

In those threads, cleaning up the annotations seemed to help, and a related issue turned out to involve the config file.

Could you try exporting the annotations to a .jsonl file with db-out, so you can inspect them and find the span that is missing its start and end fields? Perhaps try removing it.
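Here is a minimal sketch of that clean-up step, assuming each exported line is a JSON object whose "spans" list should carry "start" and "end" on every entry. The sample lines and the function names are made up for illustration.

```python
import json

def has_offsets(span):
    """A span is usable only if it has both character offsets."""
    return "start" in span and "end" in span

def clean_examples(lines):
    """Drop spans without offsets; report (line number, bad span indices)."""
    cleaned, report = [], []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        eg = json.loads(line)
        spans = eg.get("spans") or []
        bad = [i for i, span in enumerate(spans) if not has_offsets(span)]
        if bad:
            report.append((line_no, bad))
            eg["spans"] = [span for span in spans if has_offsets(span)]
        cleaned.append(eg)
    return cleaned, report

# Hypothetical stand-in for the lines of an exported .jsonl file.
sample = [
    json.dumps({
        "text": "Acme Corp raised $5 million.",
        "spans": [{"start": 0, "end": 9, "label": "ORG"}, {"label": "MONEY"}],
    }),
    json.dumps({"text": "Nothing to fix here.", "spans": []}),
]

cleaned, report = clean_examples(sample)
print(report)  # [(1, [1])]
```

With a real export you would pass the open file instead of the sample list, write each cleaned example back out with json.dumps, and load the result into a new dataset (for example with db-in) before running data-to-spacy again.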

Let us know if this helps!

Thanks a lot.
I just used this code from Ines.

It worked!


Hi guys,

I converted the dataset to .spacy with this command:
python -m prodigy data-to-spacy ./hp --ner allnoTitles_filtered --eval-split 0

And before that, I annotated the dataset with this command:
python -m prodigy rel.manual koText en_core_web_sm koText.json --label INVESTS,RECEIVES,ACQUIRES --span-label ORG,MONEY,PERSON,REGEX --add-ents --wrap

The problem is that the .spacy file contains the span labels ORG, MONEY, PERSON and REGEX, but not the relations INVESTS, RECEIVES and ACQUIRES.

I already checked the documentation.
What am I doing wrong?

Kind regards

Hi @do12siwu!

This post explains that relation annotations aren't part of data-to-spacy:
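Separately from that post, the relations can still be read from the exported annotations. The sketch below assumes each example has a "relations" list whose entries carry a "label" plus "head_span" and "child_span" objects with character offsets. That layout is an assumption, so check it against your own export; the example data and the function name are made up.

```python
import json

def extract_relations(example):
    """Return (head text, relation label, child text) triples for one example."""
    text = example["text"]
    triples = []
    for rel in example.get("relations") or []:
        head, child = rel["head_span"], rel["child_span"]
        triples.append((
            text[head["start"]:head["end"]],
            rel["label"],
            text[child["start"]:child["end"]],
        ))
    return triples

# Hypothetical exported line; the field layout is an assumption.
line = json.dumps({
    "text": "Acme Corp acquires Globex.",
    "relations": [
        {
            "label": "ACQUIRES",
            "head_span": {"start": 0, "end": 9, "label": "ORG"},
            "child_span": {"start": 19, "end": 25, "label": "ORG"},
        }
    ],
})

print(extract_relations(json.loads(line)))
# [('Acme Corp', 'ACQUIRES', 'Globex')]
```

From triples like these you could build whatever training data your relation component expects, for example by storing them in a custom Doc extension before adding the docs to a DocBin.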