I’m trying to build a model using only annotations from
ner.make-gold. The logic was:
- annotate some gold examples with ner.make-gold
- train a model on those annotations with ner.batch-train
- repeat the first step with the new model, so the suggestions get better (less manual intervention needed) — see the sketch below
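In other words, a loop roughly like this, where the angle-bracketed names are placeholders for whichever model the current iteration uses and produces:

# annotate gold data with the current model's suggestions
prodigy ner.make-gold personal_info_gold_new <current_model> data/jsonl/en_complete_316.jsonl --label "PERSON, EMAIL, BIRTH_DATE, PHONE_NUMBER, SOCIAL_MEDIA"
# retrain on the full gold dataset collected so far
prodigy ner.batch-train personal_info_gold_new <current_model> -o <next_model> --n-iter 10 --eval-split 0.2 --dropout 0.2
# set <current_model> = <next_model> and repeat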
Commands and output:
> prodigy ner.batch-train personal_info_gold_new prodigy_models/personal_info_gold_new2 -o prodigy_models/personal_info_gold_new3 --n-iter 10 --eval-split 0.2 --dropout 0.2 --no-missing
Added 650 annotations
> prodigy ner.batch-train personal_info_gold_new prodigy_models/personal_info_gold_new3 -o prodigy_models/personal_info_gold_new4 --n-iter 10 --eval-split 0.2 --dropout 0.2 --no-missing
So the results kept improving up to this iteration, and then they suddenly started getting worse.
What could be the problem here?
Just to confirm: it looks like you're using one dataset, personal_info_gold_new, and in each run you're updating the model artifact produced in the previous step, right? Can you reproduce the same results if you're always updating the base model (e.g. en_core_web_sm, or whatever else you used)?
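Something along these lines, where every run starts from the same base model instead of from the previous run's output (the run1/run2 output paths are just placeholders):

prodigy ner.batch-train personal_info_gold_new en_core_web_sm -o prodigy_models/run1 --n-iter 10 --eval-split 0.2 --dropout 0.2 --no-missing
prodigy ner.batch-train personal_info_gold_new en_core_web_sm -o prodigy_models/run2 --n-iter 10 --eval-split 0.2 --dropout 0.2 --no-missing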
personal_info_gold_new is the dataset I created to save the gold data I annotate in each iteration…
I started annotating with the model I generated via terms.train-vectors (prodigy_models/resumes_model1), like this:
prodigy ner.make-gold personal_info_gold_new prodigy_models/resumes_model1 data/jsonl/en_complete_316.jsonl --label "PERSON, EMAIL, BIRTH_DATE, PHONE_NUMBER, SOCIAL_MEDIA"
Then for batch-train I used the same model again (from which, I think, only the tokenizer is used):
prodigy ner.batch-train personal_info_gold_new prodigy_models/resumes_model1 -o prodigy_models/personal_info_gold_new --n-iter 10 --eval-split 0.2 --dropout 0.2
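From there, each annotation pass pointed ner.make-gold at the newly trained model, something like:

prodigy ner.make-gold personal_info_gold_new prodigy_models/personal_info_gold_new data/jsonl/en_complete_316.jsonl --label "PERSON, EMAIL, BIRTH_DATE, PHONE_NUMBER, SOCIAL_MEDIA"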
So, if I'm understanding correctly, you're asking whether I can reproduce the same results if I stick with the first model, prodigy_models/resumes_model1, for all the iterations to come?
Yes, exactly. Since you’re always updating with the full gold dataset, the result should be the same.
Yes, that's what I thought too, but I got this:
prodigy ner.batch-train personal_info_gold_new prodigy_models/resumes_model1 -o prodigy_models/personal_info_gold_new4 --n-iter 10 --eval-split 0.2 --dropout 0.2 --no-missing
and now I am confused…