I have three custom datasets for the parser, tagger, and NER that were generated using Prodigy. Now I want to use them to evaluate the parser, tagger, and NER components of the en_core_web_trf model. Is there a way to do this without using Prodigy? The data format in my sets looks very different from the one described for spaCy's Example class.
Hi! You can use the data-to-spacy recipe to export your annotations, which in Prodigy v1.10 will give you a corpus in spaCy v2's JSON format. If you're using spaCy v3, you can then run spacy convert to convert that JSON file to the binary format used by spacy train: https://spacy.io/api/cli#convert You'll then be able to train and evaluate your model using a transformer-based config.
Btw, under the hood, spaCy v3's binary format is just a collection of annotated Doc objects (which now also makes it much easier to generate it programmatically): https://spacy.io/api/data-formats#binary-training
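To illustrate the "collection of annotated Doc objects" idea, here is a minimal sketch for NER only. It assumes each exported record is a dict with "text", "spans" (each with "start", "end" and "label" character offsets) and an optional "answer" key. The helper names spans_to_offsets and build_doc_bin and the output path are hypothetical, not part of spaCy or Prodigy.

```python
def spans_to_offsets(record):
    """Turn the assumed Prodigy-style "spans" list into (start, end, label) tuples."""
    return [(s["start"], s["end"], s["label"]) for s in record.get("spans", [])]


def build_doc_bin(records, output_path, lang="en"):
    """Write NER annotations to a .spacy file (hypothetical helper)."""
    # Imported here so the pure-Python helper above works without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # tokenizer only, no trained components needed
    doc_bin = DocBin()
    for record in records:
        # Assumption: only accepted annotations should end up in the corpus.
        if record.get("answer", "accept") != "accept":
            continue
        doc = nlp.make_doc(record["text"])
        ents = []
        for start, end, label in spans_to_offsets(record):
            # char_span returns None if the offsets don't align with token boundaries.
            span = doc.char_span(start, end, label=label)
            if span is not None:
                ents.append(span)
        doc.ents = ents  # raises an error if spans overlap
        doc_bin.add(doc)
    doc_bin.to_disk(output_path)
    return doc_bin
```

Calling build_doc_bin(records, "./ner_eval.spacy") would then write a file that spaCy v3 can read directly. Tagger and parser annotations would need token-level attributes (tags, heads, dependency labels) set on each Doc as well, which this sketch does not cover.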
The upcoming Prodigy v1.11, currently available as a nightly pre-release, will allow you to export your data in spaCy's .spacy format out-of-the-box.
Yes, the spacy convert command expects data in spaCy's JSON format for training, not just the raw annotations you've exported with Prodigy. You can create it using the data-to-spacy command, which will merge your annotations from different datasets and export a corpus in spaCy's format.
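Once you have a corpus in the .spacy format, evaluating the pretrained pipeline outside of Prodigy could look roughly like the sketch below. It assumes spaCy v3 with en_core_web_trf (and spacy-transformers) installed. The function names evaluate_corpus and pick_scores and the file path are hypothetical, and the score keys are assumed to match what spaCy v3's built-in tagger, parser and NER scorers report.

```python
# Score keys assumed to be reported for the tagger, parser and NER in spaCy v3.
DEFAULT_KEYS = ("tag_acc", "dep_uas", "dep_las", "ents_p", "ents_r", "ents_f")


def pick_scores(scores, keys=DEFAULT_KEYS):
    """Keep only the scores of interest that are present in the results dict."""
    return {key: scores[key] for key in keys if key in scores}


def evaluate_corpus(model_name, corpus_path):
    """Run a trained pipeline over a .spacy corpus and return its scores."""
    # Imported here so pick_scores can be used without spaCy installed.
    import spacy
    from spacy.tokens import DocBin
    from spacy.training import Example

    nlp = spacy.load(model_name)
    doc_bin = DocBin().from_disk(corpus_path)
    examples = []
    for gold in doc_bin.get_docs(nlp.vocab):
        # The predicted side starts as an unannotated Doc made from the raw text;
        # nlp.evaluate runs the pipeline on it and compares against the gold Doc.
        examples.append(Example(nlp.make_doc(gold.text), gold))
    return nlp.evaluate(examples)
```

For example, pick_scores(evaluate_corpus("en_core_web_trf", "./ner_eval.spacy")) would return just the tagger, parser and NER metrics. Scores for components with no gold annotations in the corpus may come back as None. The spacy evaluate command covers the same use case from the command line.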