Hi,
I am trying to train an NER model using Prodigy.
My data is labelled and imported into Prodigy's database.
I am using prodigy train to train the model.
However, only a fraction of my data seems to be used for training.
See the log below (the "Training: 2241 | Evaluation: 1224" line is the issue):
=========================== Initializing pipeline ===========================
[2021-10-15 19:08:44,570] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 12311 | Evaluation: 1224 (from datasets)
Training: 2241 | Evaluation: 1224
Labels: ner (13)
[2021-10-15 19:08:55,148] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-10-15 19:08:55,148] [INFO] Created vocabulary
[2021-10-15 19:08:55,148] [INFO] Finished initializing nlp object
[2021-10-15 19:09:02,445] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
Initialized pipeline
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 12311 | Evaluation: 1224 (from datasets)
Training: 2241 | Evaluation: 1224
Labels: ner (13)
Pipeline: ['tok2vec', 'ner']
Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE
As you can see, it is only using 2241 examples out of the 12311 I am providing.
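One way to narrow this down is to export the training dataset and count how many examples remain after filtering. The sketch below is only a diagnostic, not a description of what prodigy train does internally: it assumes the gap might come from examples that share the same input text being merged, or from non-accepted answers being dropped, and the log alone does not confirm either. It relies on the "answer" and "_input_hash" fields stored on each task; the function names and the file name are placeholders.

```python
import json
from collections import Counter


def summarize_examples(examples):
    """Count total, accepted, and unique-input examples in a list of task dicts."""
    # Tally every answer value; tasks without an "answer" key are counted as "missing".
    answers = Counter(eg.get("answer", "missing") for eg in examples)
    # Keep only accepted tasks (assumption: other answers are not used for training).
    accepted = [eg for eg in examples if eg.get("answer") == "accept"]
    # Tasks with the same _input_hash were created from the same input text.
    unique_inputs = {eg.get("_input_hash") for eg in accepted}
    return {
        "total": len(examples),
        "by_answer": dict(answers),
        "accepted": len(accepted),
        "unique_accepted_inputs": len(unique_inputs),
    }


def load_jsonl(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage (hypothetical file name, e.g. written by
# `prodigy db-out my_dataset > ner_train.jsonl`, where my_dataset is a placeholder):
# print(summarize_examples(load_jsonl("ner_train.jsonl")))
```

If "unique_accepted_inputs" comes out close to 2241, duplicate input texts would be a plausible explanation; if it stays near 12311, the cause is somewhere else.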
I would really appreciate some help with this.
Thanks