How to declare and use validation set in ner.train

saad.moosa · December 12, 2021, 7:55am

Hello,
I am trying to train a model as follows:

!python -m prodigy train ./tmp_model --ner food_annotations --base-model en_core_web_lg --eval-split 0.10 --config config.cfg

I would like a 60/30/10 split for train/validation/evaluation: train the model on 60% of the data, run validation on 30% so that it generalizes, and then evaluate the model on the remaining 10%. Is there any way to define a validation set in this recipe? If not, is there another recipe I can do this with, and how would I do that? I have looked through the documentation and other support questions but haven't found an answer. I may not have phrased my search well enough to find the right previous question, so I thought I would ask.

Thank you for your help!

ljvmiranda921 · December 13, 2021, 12:24am

Hi @saad.moosa

You can do it via the --eval-split setting. Prodigy should automatically do the split.

But if you also want to split off a separate portion for testing/validation and are serious about training, then you might need separate datasets. You can export the data, split it whichever way you like, and even re-import it into new datasets.
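
As a rough illustration of the export-and-split route, here is a minimal sketch. It assumes the annotations have already been exported to a JSONL file (for example with `prodigy db-out food_annotations > food_annotations.jsonl`). The helper names and the 60/30/10 defaults are just placeholders for this example and are not part of Prodigy.

```python
import json
import random
from pathlib import Path


def read_jsonl(path):
    """Load one JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, examples):
    """Write each example as one JSON object per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")


def split_examples(examples, train_frac=0.6, dev_frac=0.3, seed=0):
    """Shuffle a copy of the examples and cut it into train/dev/test.

    The test portion is whatever remains after train and dev,
    so the three parts always add up to the full dataset.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # fixed seed keeps the split reproducible
    n_train = round(len(examples) * train_frac)
    n_dev = round(len(examples) * dev_frac)
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test
```

You would then call `split_examples(read_jsonl("food_annotations.jsonl"))`, write the three parts out with `write_jsonl`, and load each file into its own dataset with `prodigy db-in`. In recent Prodigy versions, `prodigy train` should accept a dedicated evaluation dataset via the `eval:` prefix (for example `--ner food_train,eval:food_dev`, where both dataset names are hypothetical), which leaves the third dataset untouched for a final test. Please check the `train` docs for your version before relying on that.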

myeghaneh · January 12, 2022, 1:59am

I still have some questions about how we can do "cross-validation" with Prodigy, especially for span categorization.

ljvmiranda921 · January 12, 2022, 6:20am

Hi @myeghaneh, spaCy v3 (which Prodigy calls under the hood) doesn't have a built-in cross-validation scheme. Ideally, you'd want a true random sample of your data to test on.
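
If you still want to try cross-validation by hand, one option is to build the folds yourself from exported annotations and run a separate training per fold. The sketch below only covers the fold construction. `k_fold_splits` is a hypothetical helper, not a Prodigy or spaCy function, and it assumes your examples fit in memory as a list.

```python
import random


def k_fold_splits(examples, k=5, seed=0):
    """Yield (train, held_out) pairs so each example is held out exactly once."""
    examples = list(examples)
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    random.Random(seed).shuffle(examples)  # fixed seed keeps folds reproducible
    # Deal the shuffled examples into k folds of near-equal size.
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [eg for j, fold in enumerate(folds) if j != i for eg in fold]
        yield train, held_out
```

Each `(train, held_out)` pair would then need to be saved and trained on separately, and the per-fold scores averaged afterwards. That gets expensive quickly, since it means k full training runs.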
