Is there any documentation about the `model-best` and `model-last` folders created in the output folder after custom NER training? For example, what are the `ner` and `transformer` folders inside them, and what do they refer to?
Hi @arda!
Great question! You're right that there isn't much documentation on this. However, this StackOverflow answer summarizes the difference well:
- `model-best` is the model that got the highest score on the dev set. It is usually the model you would want to use.
- `model-last` is the model trained in the last iteration. You might want to use it if you resume training.
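For example, once training has finished, `model-best` is typically what you'd load for inference. A minimal sketch, assuming your training output directory is `./output` (a hypothetical path):

```python
import spacy

# Load the checkpoint that scored highest on the dev set
# ("./output" is a hypothetical output directory)
nlp = spacy.load("./output/model-best")

doc = nlp("Apple is opening a new office in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])
```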
Since these folders are produced by `spacy train` (which Prodigy's `prodigy train` wraps), there are helpful threads on spaCy's GitHub Discussions forum about resuming training. I'd also suggest reading through those posts to understand the purpose of retraining (i.e., using `model-last`).
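For a rough idea of what resuming from `model-last` looks like in code, here's a minimal sketch (the paths and the training example are hypothetical; in practice you'd usually resume via `spacy train` with an appropriate config instead):

```python
import spacy
from spacy.training import Example

# Load the last checkpoint rather than the best-scoring one
# ("./output" is a hypothetical output directory)
nlp = spacy.load("./output/model-last")
optimizer = nlp.resume_training()

# A hypothetical extra annotation to keep training on
text = "Berlin is lovely in the summer."
annotations = {"entities": [(0, 6, "GPE")]}

example = Example.from_dict(nlp.make_doc(text), annotations)
nlp.update([example], sgd=optimizer)

nlp.to_disk("./output/model-resumed")
```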
Let us know if you have any other questions!
By highest score, do you mean the F-score?
Hi @stefan.bartell,
By default, with only a `ner` component, yes. But this can be configured, and it may require a little explanation.
When you run `prodigy train`, you're really running `spacy train`. To run `spacy train`, you need a spaCy config file. To simplify this, `prodigy train` uses spaCy's default config, very similar to what you'd get from running:
```
python -m spacy init config config.cfg --lang en --pipeline ner
```
This produces the following config file:
```ini
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,1000,2500,2500]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]
```
The relevant part is:
```ini
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
```
For more background, here are spaCy's docs on the training score weights. Essentially, you can modify the weights if you want. For example, if you were only concerned about recall, you could set `ents_r = 1.0` and `ents_f = 0.0`, and spaCy would automatically select as `model-best` the model that maximizes recall instead of F1.
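If you'd rather not edit the config file by hand, recent spaCy versions also let you pass overrides when launching training from Python. A sketch, assuming a `config.cfg` like the one above (the data paths are hypothetical):

```python
# Sketch: run training with overridden score weights, so model-best
# is selected by recall instead of F-score. Requires a spaCy version
# that exposes train() for use from Python.
from spacy.cli.train import train

train(
    "config.cfg",
    output_path="./output",
    overrides={
        "paths.train": "./train.spacy",  # hypothetical paths
        "paths.dev": "./dev.spacy",
        "training.score_weights.ents_f": 0.0,
        "training.score_weights.ents_r": 1.0,
    },
)
```

The same dot-notation overrides also work on the `spacy train` command line, e.g. `--training.score_weights.ents_r 1.0`.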
This part of the docs explains it further:

> At the end of your training process, you typically want to select the best model – but what "best" means depends on the available components and your specific use case. For instance, you may prefer a pipeline with higher NER and lower POS tagging accuracy over a pipeline with lower NER and higher POS accuracy. You can express this preference in the score weights, e.g. by assigning `ents_f` (NER F-score) a higher weight.
Essentially, the answer to your question may change depending on what components you have in your pipeline. For example, if you have both a `ner` and a `textcat` component, then under the hood `prodigy train` starts with a config similar to what you'd get from running:
```
python -m spacy init config config.cfg --lang en --pipeline ner,textcat
```
which instead has training weights like:
```ini
[training.score_weights]
ents_f = 0.5
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
cats_score = 0.5
cats_score_desc = null
cats_micro_p = null
cats_micro_r = null
cats_micro_f = null
cats_macro_p = null
cats_macro_r = null
cats_macro_f = null
cats_macro_auc = null
cats_f_per_type = null
```
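To make the weighting concrete: spaCy combines the weighted metrics into a single score, and that combined score drives the `model-best` selection. A toy illustration with made-up dev-set numbers:

```python
# Made-up dev-set metrics for a ner + textcat pipeline
scores = {"ents_f": 0.82, "cats_score": 0.90}

# Weights from the [training.score_weights] block above
weights = {"ents_f": 0.5, "cats_score": 0.5}

# Weighted sum: 0.5 * 0.82 + 0.5 * 0.90 = 0.86
final_score = sum(weights[k] * scores[k] for k in weights)
print(final_score)  # 0.86
```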
Hope this helps!