Hi! I was wondering if it's possible to export `training.jsonl` and `evaluation.jsonl` to the output directory after creating an NER model from scratch. The model I exported contains `meta.json`, `ner/`, `tokenizer` and `vocab/`, and the import works great. Many thanks!
If you're using the `train` recipe without a dedicated evaluation set, and just hold back a random sample for evaluation, Prodigy currently doesn't save the training and evaluation examples out as separate files.
Once you're serious about training and evaluation, you can use a separate Prodigy dataset for your evaluation examples and pass it in as `--eval-id`. This also makes your experiments more stable and repeatable, because you're always evaluating on the same data. You can later export the training and evaluation sets using the `db-out` command.
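If you want the two files to sit next to your model, you could write them yourself once you have the examples loaded (for instance from the JSONL files `db-out` produces). Below is a minimal sketch under that assumption: `write_jsonl`, `read_jsonl` and `export_sets` are hypothetical helpers, not part of Prodigy, and the toy examples only imitate the shape of Prodigy's annotations.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(examples, path):
    """Write one JSON object per line (JSON Lines)."""
    with Path(path).open("w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def export_sets(train_examples, eval_examples, output_dir):
    """Hypothetical helper: save both sets as training.jsonl / evaluation.jsonl."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_path = out / "training.jsonl"
    eval_path = out / "evaluation.jsonl"
    write_jsonl(train_examples, train_path)
    write_jsonl(eval_examples, eval_path)
    return train_path, eval_path


if __name__ == "__main__":
    # Toy examples with character-offset spans; illustrative only.
    train = [{"text": "Apple is in Cupertino", "spans": [{"start": 0, "end": 5, "label": "ORG"}]}]
    dev = [{"text": "Berlin is big", "spans": [{"start": 0, "end": 6, "label": "GPE"}]}]
    # A temporary directory stands in for the model's output directory.
    with tempfile.TemporaryDirectory() as tmp:
        train_path, eval_path = export_sets(train, dev, tmp)
        print(read_jsonl(eval_path) == dev)  # True
```

Because the evaluation examples come from their own dataset here, the exported `evaluation.jsonl` is the same on every run.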
If you use the `data-to-spacy` recipe to convert your dataset to a JSON-formatted training file for spaCy, you can also specify an `--eval-split`, and Prodigy will shuffle the examples and save out two separate files: a training file and an evaluation set (e.g. if you set `--eval-split 0.2`, 20% of examples will become the evaluation set).
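To make the `--eval-split` arithmetic concrete, here is a rough sketch of a shuffled fractional split. It is not Prodigy's own code: the function name `split_examples`, the `seed` parameter and the rounding down via `int()` are assumptions, and Prodigy's exact shuffling and rounding may differ.

```python
import random


def split_examples(examples, eval_split=0.2, seed=0):
    """Shuffle a copy of the examples and hold back `eval_split` of them.

    Hypothetical sketch of the idea behind --eval-split, not Prodigy's implementation.
    Returns (train, evaluation).
    """
    if not 0.0 < eval_split < 1.0:
        raise ValueError("eval_split must be between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split on every run
    n_eval = int(len(shuffled) * eval_split)  # e.g. 10 examples * 0.2 -> 2
    return shuffled[n_eval:], shuffled[:n_eval]


if __name__ == "__main__":
    examples = [{"text": f"example {i}"} for i in range(10)]
    train, dev = split_examples(examples, eval_split=0.2)
    print(len(train), len(dev))  # 8 2
```

With 10 examples and `eval_split=0.2`, two examples end up in the evaluation set and eight in the training set.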
That's awesome - thanks, @ines!