How do I get the model's predictions on the evaluation dataset, so I can find the failing examples, add more similar data, and retrain the model?
Hi! You should be able to extract that by just running your model over the data: load the model you trained (./parser_model) in spaCy in a script, notebook, etc., run your texts from the dataset through it, and get the predicted dependency labels and heads. Then compare those to the labels and heads in the data. If they're different, the model's prediction was wrong, and you can output the example and see if you can spot patterns, like certain text types or constructions.
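For example, here's a minimal sketch of that comparison. It assumes your evaluation data is stored as (text, annotations) pairs with "heads" as absolute token indices and "deps" as dependency labels, and that the model's tokenization matches the gold annotations — adjust to however your data is actually stored:

```python
import spacy

nlp = spacy.load("./parser_model")   # the model you trained

# Assumed format: (text, {"heads": [...], "deps": [...]}) pairs
eval_data = [
    ("She ate the pizza", {"heads": [1, 1, 3, 1], "deps": ["nsubj", "ROOT", "det", "dobj"]}),
    # ... the rest of your evaluation examples
]

for text, gold in eval_data:
    doc = nlp(text)
    # zip assumes the model produces the same tokens as the gold annotations
    for token, gold_head, gold_dep in zip(doc, gold["heads"], gold["deps"]):
        if token.head.i != gold_head or token.dep_ != gold_dep:
            print(f"Mismatch in: {text!r}")
            print(f"  {token.text!r}: predicted {token.dep_} -> {token.head.text!r}, "
                  f"expected {gold_dep} -> {doc[gold_head].text!r}")
```

Collecting those mismatched examples in a list (instead of just printing them) makes it easier to count how often each construction or text type shows up.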
Yes, that's correct. When you re-train, though, you should start from the base model, en_core_web_lg, with all of your annotations, and train from scratch. That's cleaner than training on top of the already-trained artifact.
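As a rough sketch, a re-training loop with the spaCy v2-style API could look like the following. The data format is just assumed to match the example above, and whether you keep the pretrained parser weights as your starting point (nlp.resume_training()) or re-initialize them (nlp.begin_training()) is up to you:

```python
import random
import spacy
from spacy.util import minibatch

# All of your annotations, including the new examples for the failure cases
train_data = [
    ("She ate the pizza", {"heads": [1, 1, 3, 1], "deps": ["nsubj", "ROOT", "det", "dobj"]}),
    # ...
]

nlp = spacy.load("en_core_web_lg")   # go back to the base model each time
optimizer = nlp.resume_training()    # keep the pretrained weights as the starting point

# Only update the parser; leave the other pipeline components alone
other_pipes = [p for p in nlp.pipe_names if p != "parser"]
with nlp.disable_pipes(*other_pipes):
    for i in range(20):
        random.shuffle(train_data)
        losses = {}
        for batch in minibatch(train_data, size=8):
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
        print(i, losses)

nlp.to_disk("./parser_model")
```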
If you're not doing this already, it might also be a good idea to use a dedicated, separate evaluation set now (instead of just holding back a random 20% of the training examples each run). If you're always evaluating on the same examples, you'll actually be able to properly compare the results between runs and get a better idea of whether your model is improving. (Just make sure you're not accidentally adding any of the mis-predicted evaluation examples to the training data.)
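One low-tech way to do that is to make the split once with a fixed seed, save the evaluation set to disk and reuse that same file for every run. A sketch, assuming the same (text, annotations) format as above and made-up file names:

```python
import json
import random

examples = [
    ("She ate the pizza", {"heads": [1, 1, 3, 1], "deps": ["nsubj", "ROOT", "det", "dobj"]}),
    # ... all of your annotated examples
]

random.seed(0)               # fixed seed so the split is reproducible
random.shuffle(examples)
split = int(len(examples) * 0.8)
train_data, dev_data = examples[:split], examples[split:]

# Save both sets once and always evaluate on the same dev_data.json,
# so scores stay comparable between runs.
with open("train_data.json", "w", encoding="utf8") as f:
    json.dump(train_data, f)
with open("dev_data.json", "w", encoding="utf8") as f:
    json.dump(dev_data, f)

# Quick sanity check that no evaluation text sneaks into the training data
assert not {text for text, _ in train_data} & {text for text, _ in dev_data}
```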