I haven't been able to find a tutorial on how to use Prodigy-annotated data in Python, e.g. the train.spacy and dev.spacy files (after using data-to-spacy). I just want to create a train_data and test_data list (or whatever) that I can train a spaCy model with (using nlp.update). Is this possible? While I love Prodigy, I'm not a fan of running the training via the terminal.
Hi! Under the hood, the .spacy files are just serialized DocBins, so you can always load them back from disk and get a list of spaCy Doc objects: https://spacy.io/api/docbin#from_disk
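As a minimal sketch of what that looks like in spaCy v3: the function below reads a `.spacy` file with `DocBin().from_disk()`, gets the annotated `Doc` objects back with `get_docs()`, and pairs each one with an unannotated copy of its text to build `Example` objects, which is the input format `nlp.update` expects in v3. The name `load_examples` is hypothetical, and it assumes the `nlp` you pass in uses the same language as the annotated data.

```python
from pathlib import Path
from typing import List, Union

from spacy.language import Language
from spacy.tokens import DocBin
from spacy.training import Example


def load_examples(path: Union[str, Path], nlp: Language) -> List[Example]:
    """Hypothetical helper: load a .spacy file (a serialized DocBin) into Examples.

    Assumes `nlp` has the same language as the data the DocBin was created from.
    """
    doc_bin = DocBin().from_disk(path)
    examples = []
    # get_docs() needs a Vocab to rebuild the Doc objects
    for reference in doc_bin.get_docs(nlp.vocab):
        # predicted side: same text, no annotations; reference side: the gold Doc
        predicted = nlp.make_doc(reference.text)
        examples.append(Example(predicted, reference))
    return examples
```

For example, `train_data = load_examples("train.spacy", nlp)` and `dev_data = load_examples("dev.spacy", nlp)` (file paths assumed) would give you the two lists you were after.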
That said, instead of implementing your own training loop in v3, we'd always recommend going via spaCy's training utilities, because it'd otherwise be very difficult to get good results. There are just a lot of settings you need to get right to achieve optimal performance (batching, dropout, the optimizer and learning rate schedule, evaluation on the dev set, and so on), and you wouldn't want to do all of this manually. If you really don't want to use the CLI, you can always call into spaCy's training helpers from Python yourself.