Exporting a NER model with training.jsonl & evaluation.jsonl

Hi! I was wondering if it’s possible to export training.jsonl & evaluation.jsonl to the output directory after creating a NER model from scratch. The model I exported has the following: meta.json; ner/; tokenizer; vocab/, and the import works great. Many thanks!

If you're using the train recipe without a dedicated evaluation set, so that Prodigy just holds back a random sample for evaluation, it currently doesn't save the training and evaluation examples out again as separate files.

Once you're serious about training and evaluation, you can use a separate Prodigy dataset for your evaluation examples, and pass that in as the --eval-id. This also makes your experiments more stable and repeatable, because you're always evaluating on the same data. You can later save out the training and evaluation set using the db-out command.
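If you do keep training and evaluation examples in separate datasets, it can be worth checking that no text ended up in both. Here's a minimal sketch, assuming both datasets were exported to JSONL files and that each example carries the `_input_hash` key Prodigy normally assigns. The function names and file names are only placeholders for illustration, not part of Prodigy.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_overlap(train_examples, eval_examples, key="_input_hash"):
    """Return the key values that occur in both sets of examples.

    Assumption: `key` identifies the input text of an example.
    Examples without the key are ignored.
    """
    train_keys = {eg[key] for eg in train_examples if key in eg}
    eval_keys = {eg[key] for eg in eval_examples if key in eg}
    return train_keys & eval_keys
```

With two exported files, usage could look like `find_overlap(read_jsonl("training.jsonl"), read_jsonl("evaluation.jsonl"))`. An empty set means no input appears in both files.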

If you use the data-to-spacy recipe to convert your dataset to a JSON-formatted training file for spaCy, you can also specify an --eval-split. Prodigy will then shuffle the examples and save out two separate files: a training file and an evaluation set (e.g. with --eval-split 0.2, 20% of the examples become the evaluation set).
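To illustrate what a shuffle-and-split like this does, here's a rough sketch in plain Python. It is not Prodigy's implementation, and `split_examples`, `write_jsonl` and the `seed` argument are names made up for this example. It could also be applied to a JSONL export of your dataset if you want your own training.jsonl and evaluation.jsonl files.

```python
import json
import random
from pathlib import Path


def split_examples(examples, eval_split=0.2, seed=0):
    """Shuffle a copy of the examples and hold back a fraction for evaluation.

    Returns (train_examples, eval_examples). A fixed seed makes the
    split repeatable across runs.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_eval = int(len(examples) * eval_split)
    return examples[n_eval:], examples[:n_eval]


def write_jsonl(path, examples):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")
```

For example, `train, dev = split_examples(examples, eval_split=0.2)` followed by `write_jsonl("training.jsonl", train)` and `write_jsonl("evaluation.jsonl", dev)` would hold back 20% of the examples for evaluation.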

That's awesome - thanks, @ines!
