I'm using the data-to-spacy recipe to convert some datasets into a format for training with spaCy. The --eval-split argument doesn't seem to work, though. No matter what value I specify, I get an even 50/50 split. Is there some trick to getting a different split?
Here's the command I'm running and its output.
prodigy data-to-spacy \
--ner SupplierCatalog_10000-aa-0_0_1,SupplierCatalog_10000-ab-0_0_1,SupplierCatalog_10000-ac-0_0_1,SupplierCatalog_10000-ad-0_0_1,SupplierCatalog_10000-ae-0_0_1 \
--eval-split 0.2 \
SupplierCatalog_10000-0_0_1-train.json \
SupplierCatalog_10000-0_0_1-eval.json
ℹ Starting with language en
Created and merged data for 833 total examples
Type   Total   Merged
----   -----   ------
NER    850     833
Using 417 train / 416 eval (split 50%)
✔ Saved 416 examples to SupplierCatalog_10000-0_0_1-eval.json
✔ Saved 417 examples to SupplierCatalog_10000-0_0_1-train.json
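
To make the expectation concrete, here is a quick sketch of the arithmetic I had in mind. It is not Prodigy's actual code: expected_split is a hypothetical helper, and rounding the eval count down is my assumption, chosen only because it reproduces the 417 / 416 numbers in the output above.

```python
from typing import Tuple


def expected_split(total: int, eval_split: float) -> Tuple[int, int]:
    """Return (n_train, n_eval) for a given eval fraction.

    Hypothetical helper, not part of Prodigy. Assumes the eval count
    is rounded down and the remainder goes to training.
    """
    if not 0.0 <= eval_split <= 1.0:
        raise ValueError("eval_split must be between 0 and 1")
    n_eval = int(total * eval_split)
    return total - n_eval, n_eval


# What the recipe actually did with my 833 merged examples (50% split):
print(expected_split(833, 0.5))  # (417, 416)

# What I expected from --eval-split 0.2:
print(expected_split(833, 0.2))  # (667, 166)
```

So with --eval-split 0.2 I was expecting roughly 667 train / 166 eval rather than 417 / 416.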