The --factor option of ner.batch-train appears to be ignored.
$ prodigy ner.batch-train dataset en --output-model model --label LABEL --factor 1
Using 1 labels: LABEL
Loaded model en
Using 50% of accept/reject examples (132) for evaluation
Using 100% of remaining examples (1838) for training
Dropout: 0.2 Batch size: 32 Iterations: 10
BEFORE 0.000
Correct 0
Incorrect 82
Entities 3872
Unknown 1
# LOSS RIGHT WRONG ENTS SKIP ACCURACY
0%| | 0/1838 [00:00<?, ?it/s]
I would expect that with --factor 1, Prodigy would use all of the data for training and none for evaluation.
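To illustrate what I mean, here is a minimal sketch of the split behavior I'd expect, assuming --factor is the fraction of examples used for training (this is my reading of the option, not Prodigy's actual implementation):

```python
def split_examples(examples, factor):
    """Hypothetical split: factor is the training fraction,
    the remainder is held out for evaluation."""
    n_train = int(len(examples) * factor)
    return examples[:n_train], examples[n_train:]

# 1970 total examples, matching the 1838 + 132 in the log above
examples = list(range(1970))
train, evals = split_examples(examples, 1.0)
# With factor 1, nothing should be held out for evaluation
assert len(train) == 1970 and len(evals) == 0
```

Instead, the log shows a 50% evaluation split regardless of the value passed.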
This is version 1.4.0.