SpaCy3 models evaluation on a custom dataset

LubaTovbin · July 6, 2021, 12:48am

Hi guys,

I have three custom datasets for the parser, tagger, and NER that were generated using Prodigy. Now I want to use them to evaluate the parser, tagger, and NER in the en_core_web_trf model. Is there a way to do this without using Prodigy? The data format in my sets looks very different from the one described in spaCy's Example class.

Any guidance is appreciated,
Thank you

ines · July 6, 2021, 1:26am

Hi! You can use the data-to-spacy recipe to export your annotations, which in Prodigy v1.10 will give you a corpus in spaCy v2's JSON format. If you're using spaCy v3, you can run spacy convert to convert it to the binary format used by spacy train: https://spacy.io/api/cli#convert You'll then be able to train and evaluate your model using a transformer-based config.
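For the evaluation step itself, here is a minimal sketch in Python, assuming spaCy v3 and a gold corpus that has already been converted to a .spacy file (the name dev.spacy is only a placeholder, and evaluate_corpus is a hypothetical helper). It reads the file with spacy.training.Corpus, which pairs each gold Doc with an unannotated copy of its text, and scores the pipeline with Language.evaluate. The spacy evaluate CLI command does essentially the same thing.

```python
from pathlib import Path

import spacy
from spacy.language import Language
from spacy.training import Corpus


def evaluate_corpus(nlp: Language, corpus_path: Path) -> dict:
    """Score a pipeline against gold annotations stored in a .spacy file."""
    # Corpus reads the DocBin and yields Example objects: the gold Doc as the
    # reference, and a fresh tokenized copy of its text as the prediction.
    corpus = Corpus(corpus_path)
    examples = list(corpus(nlp))
    # Language.evaluate runs the pipeline over the predicted Docs and lets each
    # component's scorer compare them with the reference Docs.
    return nlp.evaluate(examples)


# Example usage (assumes en_core_web_trf is installed; "dev.spacy" is a placeholder):
# nlp = spacy.load("en_core_web_trf")
# scores = evaluate_corpus(nlp, Path("dev.spacy"))
# print({key: scores.get(key) for key in ("tag_acc", "dep_las", "ents_f")})
```

Which keys appear in the returned dict depends on the components in the pipeline; for the tagger, parser and NER you would typically look at tag_acc, dep_uas/dep_las and ents_p/ents_r/ents_f.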

Btw, under the hood, spaCy v3's binary format is just a collection of annotated Doc objects (which now also makes it much easier to generate it programmatically): https://spacy.io/api/data-formats#binary-training
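As an illustration of generating that format programmatically, here is a sketch that turns NER annotations exported from Prodigy as JSONL into a DocBin. It assumes each line carries Prodigy's usual NER fields ("text", a list of "spans" with character "start"/"end" offsets and a "label", and an "answer") and keeps only accepted examples. The function name and file names are hypothetical, and tagger or parser annotations would need their own handling of the token-level fields.

```python
import json
from typing import Iterable

import spacy
from spacy.tokens import DocBin


def ner_records_to_docbin(lines: Iterable[str], lang: str = "en") -> DocBin:
    """Convert Prodigy-style NER JSONL lines into a DocBin of gold Docs."""
    # A blank pipeline is enough: only its tokenizer is needed to create Docs.
    nlp = spacy.blank(lang)
    doc_bin = DocBin()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        # Only accepted annotations are treated as gold data.
        if record.get("answer") != "accept":
            continue
        doc = nlp.make_doc(record["text"])
        ents = []
        for span in record.get("spans", []):
            ent = doc.char_span(span["start"], span["end"], label=span["label"])
            if ent is None:
                # The character offsets don't map onto token boundaries.
                raise ValueError(f"Misaligned span {span} in {record['text']!r}")
            ents.append(ent)
        doc.ents = ents
        doc_bin.add(doc)
    return doc_bin


# Example usage (hypothetical file names):
# with open("ner_annotations.jsonl", encoding="utf8") as f:
#     ner_records_to_docbin(f).to_disk("ner_eval.spacy")
```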

The upcoming Prodigy v1.11, currently available as a nightly pre-release, will allow you to export your data in spaCy's .spacy format out-of-the-box.

LubaTovbin · July 6, 2021, 8:43pm

Hi Ines,

Thanks for the response!
My data is in JSONL format, and spacy convert doesn't work on JSONL, right?

ines · July 7, 2021, 1:13am

Yes, the spacy convert command expects data in spaCy's JSON format for training, not just the raw annotations you've exported with Prodigy. You can create it using the data-to-spacy command, which will merge your annotations from different datasets and export a corpus in spaCy's format.
