Multiple models or one single model?

santoshbs · February 21, 2021, 4:09pm

I just finished saving a model with one label (based on 600+ manual annotations of text) using the following command:

prodigy train ner my_annotated_data_1 en_vectors_web_lg \
    --init-tok2vec ../tok2vec_cd8_model289.bin \
    --output ./my_model_1 \
    --eval-split 0.2

Having obtained an F-score of >95, I would now like to add a second label through the same steps of ner.manual and prodigy train.

I am not sure whether I should create a separate annotation dataset my_annotated_data_2 with the second label and then either:

- train and save a separate model my_model_2; or
- train and save the same model my_model_1, providing both my_annotated_data_1 and my_annotated_data_2 as comma-separated datasets to the prodigy train recipe.

Not sure which of these is a better practice and would help achieve the most accurate results. Is there a third alternative?

ines · February 22, 2021, 1:12am

Hi! If your goal is to have one pipeline predicting both labels, training a single model on both datasets is definitely the approach I would recommend. The presence or absence of one label can always be relevant for all other labels as well, since the entity recognizer predicts token-based tags, and named entities can't overlap.
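To illustrate the point about token-based tags, here is a minimal sketch in plain Python. It assumes a simplified BIO tagging scheme, which is not necessarily the exact scheme spaCy uses internally, and the function name, the example sentence and the label names LABEL_1 and LABEL_2 are all made up for illustration. Each token gets exactly one tag, so a token claimed by one label is no longer available to the other:

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); hypothetical span format
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Assign one BIO tag per token; fail if two spans claim the same token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(
                    f"token {i} is already tagged {tags[i]}, cannot add {label}"
                )
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


# Illustrative sentence and labels, not taken from the datasets in this thread
tokens = ["Acme", "Corp", "opened", "an", "office", "in", "Berlin"]
tags = spans_to_bio(len(tokens), [(0, 2, "LABEL_1"), (6, 7, "LABEL_2")])
```

Because every token carries a single tag, what the model learns about one label also tells it where the other label cannot be, which is one reason a jointly trained model can benefit from seeing both labels at once.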

It'll definitely be interesting to run different experiments here, though, and to compare the per-label evaluation scores of the joint model to those of models trained separately on only one label at a time. If there's a big difference here, this could point to potential problems and conflicts in the data.
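As a rough sketch of what such a per-label comparison could look like, the snippet below computes precision, recall and F-score per label from sets of gold and predicted spans, using exact span matching. The span format, the function name and the toy data are assumptions for illustration only; this is not the scoring code Prodigy or spaCy uses, and in practice you would take the per-label figures from your own evaluation run.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (doc_id, start, end, label); hypothetical span format
Span = Tuple[int, int, int, str]


def per_label_prf(gold: Set[Span], pred: Set[Span]) -> Dict[str, Dict[str, float]]:
    """Exact-match precision, recall and F-score for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        if span in gold:
            tp[span[3]] += 1  # predicted and correct
        else:
            fp[span[3]] += 1  # predicted but not in gold
    for span in gold - pred:
        fn[span[3]] += 1  # in gold but missed by the model
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        p = tp[label] / n_pred if n_pred else 0.0
        r = tp[label] / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


# Toy data: one LABEL_1 span is predicted in the wrong place
gold = {(0, 0, 2, "LABEL_1"), (0, 6, 7, "LABEL_2"), (1, 3, 4, "LABEL_1")}
pred = {(0, 0, 2, "LABEL_1"), (0, 6, 7, "LABEL_2"), (1, 5, 6, "LABEL_1")}
scores = per_label_prf(gold, pred)
```

Running the same function on the predictions of the joint model and of each single-label model, against the same held-out examples, gives numbers that can be compared label by label.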

A workflow you probably want to avoid is updating the same trained artifact multiple times with different datasets and different labels. This will make the process and the results much harder to reason about, and you're risking forgetting effects at every step.

santoshbs · February 22, 2021, 8:15am

Thank you so much, @ines. I intend to experiment with both the combined model and separate models.
