Retrain trained model with new dataset

Mohammad · January 20, 2023, 6:50am

How can I retrain the model with a new dataset to improve the model's score?

koaning · January 20, 2023, 10:04am

Assuming there is new training data, you can rerun the train command to train on more data. Typically this improves the performance of the model.

Note that the "score" of the model also depends on your validation set. If you don't have a separate validation set, Prodigy will automatically generate one during the train procedure. More information on this can be found in the docs.
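As a rough illustration of why the score depends on the held-out data: a split fraction such as 0.25 (the value passed to --eval-split later in this thread) sets about a quarter of the examples aside for scoring. The sketch below is a generic, seeded hold-out split written for illustration only. holdout_split is a hypothetical helper, not Prodigy's internal implementation.

```python
import random


def holdout_split(examples, eval_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for evaluation.

    Illustration only: this is NOT how Prodigy implements --eval-split.
    """
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_eval = int(len(shuffled) * eval_fraction)
    # Return (train, eval): the first n_eval shuffled items are held out.
    return shuffled[n_eval:], shuffled[:n_eval]


train, dev = holdout_split(range(100), eval_fraction=0.25)
print(len(train), len(dev))  # 75 25
```

Because a different shuffle produces a different evaluation set, two scores are only directly comparable when they are computed on the same held-out examples.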

Mohammad · January 20, 2023, 2:02pm

Can you please provide an example of the commands to run to retrain an existing model, or do I have to go back and run spans.manual from scratch? Below are the commands I ran to create and train the model.

python3 -m prodigy spans.manual CV_DOBV1 blank:en ./CV_json_English_Format_4.jsonl --label DOB

python3 -m prodigy train ./CV_DOBV1 --ner CV_DOBV1 --eval-split 0.25

Now I have a new data file, "12.01.23_100_Cv_Format.jsonl", and I want to retrain the model on the new data.

koaning · January 20, 2023, 3:19pm

You can use markdown syntax to highlight your code segments, which makes it easier to read/copy/paste on this forum.

That said, the aforementioned train command can pick up where another model left off. This can be done via --base-model.

python -m prodigy train ... --base-model <path-to-your-model>

Does this not work for you?
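For the workflow described in this thread, one possible sequence is to annotate the new file into a separate dataset with the same spans.manual setup as before, then train with --base-model pointing at the earlier output. The Python sketch below only builds and prints the two commands. The dataset name CV_DOBV2 and the output directory ./CV_DOBV2_model are hypothetical, and passing both datasets to --ner (comma-separated) so the model also sees the original annotations is an assumption about what suits this case, not the only option.

```python
import shlex


def retrain_commands(new_dataset, new_source, old_dataset, base_model,
                     output_dir, label="DOB", eval_split=0.25):
    """Build the Prodigy CLI calls for annotating new data and retraining.

    All dataset names and paths are placeholders; adjust them to your setup.
    """
    # Step 1: annotate the new file into its own dataset.
    annotate = [
        "python3", "-m", "prodigy", "spans.manual",
        new_dataset, "blank:en", new_source, "--label", label,
    ]
    # Step 2: train on old + new annotations, starting from the earlier model.
    train = [
        "python3", "-m", "prodigy", "train", output_dir,
        "--ner", f"{old_dataset},{new_dataset}",  # comma-separated datasets
        "--eval-split", str(eval_split),
        "--base-model", base_model,
    ]
    return annotate, train


annotate, train = retrain_commands(
    new_dataset="CV_DOBV2",                      # hypothetical dataset name
    new_source="./12.01.23_100_Cv_Format.jsonl",
    old_dataset="CV_DOBV1",
    base_model="./CV_DOBV1/model-last",
    output_dir="./CV_DOBV2_model",               # hypothetical output dir
)
for cmd in (annotate, train):
    print(shlex.join(cmd))
```

To run the commands instead of printing them, each list can be passed to subprocess.run(cmd, check=True); the training step only makes sense once the new dataset contains saved annotations.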


Note that you can also use a pretrained spaCy model here, which is a common starting point. You can do that via:

python -m prodigy train ... --base-model en_core_web_lg
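The --base-model value is either an installed pipeline package name (en_core_web_lg here) or a path to a saved pipeline (as with CV_DOBV1/model-last/ in the failing command below). The stdlib-only sketch below is a quick sanity check of which case a value falls into before starting a training run. describe_base_model is a hypothetical helper, and looking for config.cfg and meta.json is a heuristic based on what spaCy writes when it saves a pipeline, not how Prodigy itself resolves the option.

```python
import importlib.util
from pathlib import Path


def describe_base_model(value):
    """Heuristically report how a --base-model value is likely to resolve."""
    path = Path(value)
    if path.is_dir():
        # A pipeline saved by spaCy normally contains these two files.
        missing = [name for name in ("config.cfg", "meta.json")
                   if not (path / name).exists()]
        if missing:
            return f"directory, but missing {', '.join(missing)}"
        return "pipeline directory"
    # Installed pipelines such as en_core_web_lg are importable packages.
    # isidentifier() avoids passing paths or dotted names to find_spec.
    if value.isidentifier() and importlib.util.find_spec(value) is not None:
        return "installed package"
    return "not found"


for candidate in ("CV_DOBV1/model-last/", "en_core_web_lg"):
    print(candidate, "->", describe_base_model(candidate))
```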

Mohammad · January 23, 2023, 9:37am

prodigy train --ner New_model --base-model CV_DOBV1/model-last/

I got the error below:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 364, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/usr/local/lib/python3.8/dist-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.8/dist-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/prodigy/recipes/train.py", line 278, in train
    return _train(
  File "/usr/local/lib/python3.8/dist-packages/prodigy/recipes/train.py", line 198, in _train
    spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
  File "/usr/local/lib/python3.8/dist-packages/spacy/training/loop.py", line 122, in train
    raise e
  File "/usr/local/lib/python3.8/dist-packages/spacy/training/loop.py", line 105, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/usr/local/lib/python3.8/dist-packages/spacy/training/loop.py", line 200, in train_while_improving
    for step, (epoch, batch) in enumerate(train_data):
  File "/usr/local/lib/python3.8/dist-packages/spacy/training/loop.py", line 316, in create_train_batches
    raise ValueError(Errors.E986)
ValueError: [E986] Could not create any training batches: check your input. Are the train and dev paths defined? Is discard_oversize set appropriately?

koaning · January 23, 2023, 9:56am

That's strange.

I'm wondering if there's something wrong with the trained model you're trying to improve upon.

Just to check, does this command run for you?

prodigy train --ner <your-dataset> --base-model en_core_web_md

Note that the en_core_web_md model should be downloaded beforehand, which you can do via:

python -m spacy download en_core_web_md
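E986 means spaCy could not build a single training batch, so besides the base model it can be worth confirming that the dataset passed to --ner (New_model in the failing command) actually holds accepted annotations; a dataset with no accepted examples would leave nothing to batch, which is one plausible cause. Assuming the dataset has been exported with prodigy db-out <dataset> > export.jsonl, the sketch below counts what the export contains. summarize_export and the file name are hypothetical.

```python
import json
from pathlib import Path


def summarize_export(export_path):
    """Count examples, accepted examples, and accepted examples with spans
    in a JSONL export produced by `prodigy db-out`."""
    total = accepted = with_spans = 0
    with Path(export_path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            eg = json.loads(line)
            total += 1
            if eg.get("answer") == "accept":
                accepted += 1
                if eg.get("spans"):  # at least one annotated span
                    with_spans += 1
    return {"total": total, "accepted": accepted,
            "accepted_with_spans": with_spans}


if __name__ == "__main__" and Path("export.jsonl").exists():
    print(summarize_export("export.jsonl"))
```

If total or accepted comes back as 0, the dataset name passed to --ner is worth double-checking before looking further at the base model.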

Related topics
- After NER.correct, how do I train? [ner, spacy, training] · 6 replies · 579 views · June 14, 2023
- NER prodigy train with existing model [usage, ner, spacy, solved] · 7 replies · 844 views · September 28, 2020
- Help updating spaCy v2 model [usage, spacy] · 5 replies · 401 views · December 15, 2021
- Commands for training NER-Model in prodigy [usage, ner, solved, training] · 9 replies · 1141 views · January 9, 2023
- How to reuse the prodigy.db to retrain the older (spacy v2) ner custom model [usage, ner, database, spacy, custom] · 8 replies · 585 views · December 5, 2022
