Is there any documentation about the `model-best` and `model-last` folders created in the output folder after custom NER training? For example, what are the `ner` and `transformer` folders inside them, and what do they refer to?
Hi @arda!
Great question! You're right that there isn't much documentation on this. However, this StackOverflow answer summarizes the difference well:
- `model-best` is the model that got the highest score on the dev set. It is usually the model you would want to use.
- `model-last` is the model trained in the last iteration. You might want to use it if you resume training.
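For example, once training has finished, `model-best` is typically what you'd load for inference. A minimal sketch, assuming your training output directory is `./output` (a hypothetical path):

```python
import spacy

# Load the checkpoint that scored highest on the dev set
# ("./output" is a hypothetical output directory)
nlp = spacy.load("./output/model-best")

doc = nlp("Apple is opening a new office in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])
```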
Since these folders are produced by `spacy train` (which Prodigy's `prodigy train` wraps), there are helpful threads on spaCy's GitHub Discussions forum about resuming training. I'd also suggest reading through those posts to understand the purpose of retraining (i.e., using `model-last`).
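For a rough idea of what resuming from `model-last` looks like in code, here's a minimal sketch (the paths and the training example are hypothetical; in practice you'd usually resume via `spacy train` with an appropriate config instead):

```python
import spacy
from spacy.training import Example

# Load the last checkpoint rather than the best-scoring one
# ("./output" is a hypothetical output directory)
nlp = spacy.load("./output/model-last")
optimizer = nlp.resume_training()

# A hypothetical extra annotation to keep training on
text = "Berlin is lovely in the summer."
annotations = {"entities": [(0, 6, "GPE")]}

example = Example.from_dict(nlp.make_doc(text), annotations)
nlp.update([example], sgd=optimizer)

nlp.to_disk("./output/model-resumed")
```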
Let us know if you have any other questions!
By highest score, do you mean the F-score?
Hi @stefan.bartell,
By default, with only a `ner` component, yes. But this can be configured, and it may require a little explanation.
When you run `prodigy train`, you're really running `spacy train`. To run `spacy train`, you need a spaCy config file. To simplify this, `prodigy train` uses spaCy's default config, very similar to what you'd get from running:
```
python -m spacy init config config.cfg --lang en --pipeline ner
```
This produces the following config file:
```ini
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,1000,2500,2500]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]
```
The relevant part is:
```ini
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
```
For more background, here are spaCy's docs on the training score weights. Essentially, you can modify the weights if you want. For example, if you were only concerned about recall, you could set `ents_r = 1.0` and `ents_f = 0.0`, and spaCy would automatically select as `model-best` the model that maximizes recall instead of F1.
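If you'd rather not edit the config file by hand, recent spaCy versions also let you pass overrides when launching training from Python. A sketch, assuming a `config.cfg` like the one above (the data paths are hypothetical):

```python
# Sketch: run training with overridden score weights, so model-best
# is selected by recall instead of F-score. Requires a spaCy version
# that exposes train() for use from Python.
from spacy.cli.train import train

train(
    "config.cfg",
    output_path="./output",
    overrides={
        "paths.train": "./train.spacy",  # hypothetical paths
        "paths.dev": "./dev.spacy",
        "training.score_weights.ents_f": 0.0,
        "training.score_weights.ents_r": 1.0,
    },
)
```

The same dot-notation overrides also work on the `spacy train` command line, e.g. `--training.score_weights.ents_r 1.0`.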
This part of the docs explains it further:

> At the end of your training process, you typically want to select the best model – but what "best" means depends on the available components and your specific use case. For instance, you may prefer a pipeline with higher NER and lower POS tagging accuracy over a pipeline with lower NER and higher POS accuracy. You can express this preference in the score weights, e.g. by assigning `ents_f` (NER F-score) a higher weight.
Essentially, the answer to your question may change depending on what components you have in your pipeline. For example, if you have both a `ner` and a `textcat` component, then under the hood `prodigy train` starts with a config similar to what you'd get from running:
```
python -m spacy init config config.cfg --lang en --pipeline ner,textcat
```
which instead has training weights like:
```ini
[training.score_weights]
ents_f = 0.5
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
cats_score = 0.5
cats_score_desc = null
cats_micro_p = null
cats_micro_r = null
cats_micro_f = null
cats_macro_p = null
cats_macro_r = null
cats_macro_f = null
cats_macro_auc = null
cats_f_per_type = null
```
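To make the weighting concrete: spaCy combines the weighted metrics into a single score, and that combined score drives the `model-best` selection. A toy illustration with made-up dev-set numbers:

```python
# Made-up dev-set metrics for a ner + textcat pipeline
scores = {"ents_f": 0.82, "cats_score": 0.90}

# Weights from the [training.score_weights] block above
weights = {"ents_f": 0.5, "cats_score": 0.5}

# Weighted sum: 0.5 * 0.82 + 0.5 * 0.90 = 0.86
final_score = sum(weights[k] * scores[k] for k in weights)
print(final_score)  # 0.86
```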
Hope this helps!