Update NER model with Hugging Face transformer

I'm working on training a 'ner' (named entity recognition) model using the Hugging Face microsoft/biogpt transformer. The initial training phase with my training and development datasets went smoothly, and I got pretty close to the accuracy and F-score I was hoping for.

Now I'm trying to continue training the model on another dataset. From what I've learned (from here), I need to keep the 'transformer' component frozen and only update the 'ner' part. But it's not working. Here's a snippet of my config file:

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 128
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "microsoft/biogpt"
mixed_precision = false
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 64
stride = 32

[components.transformer.set_extra_annotations]
@annotation_setters = "spacy-transformers.null_annotation_setter.v1"

[components.transformer.model.grad_scaler_config]

[components.transformer.model.transformer_config]

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 500
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.components.ner]

[initialize.components.ner.labels]
@readers = "spacy.read_labels.v1"
path = "outDB/labels/ner.json"

[initialize.tokenizer]

If you've got any tips or insights to share, I'd be super grateful!

Hi @shahryary,

Thanks for your question.

Could you post your question on the spaCy GitHub discussions forum? This forum is for Prodigy-specific questions. Your question looks like it's specific to spaCy.

At first glance, I suspect you'll need to add transformer to your frozen_components. I've found a related issue on GitHub about a way to do this:

Be sure to provide your spaCy version (spacy info) and the output of spacy debug config too, as I expect the spaCy team will ask for these right away.
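To illustrate the suggestion above, the relevant part of the [training] block could look roughly like this. Treat it as a sketch rather than a verified fix: the annotating_components line is my assumption based on spaCy's guidance for frozen components that other components listen to.

[training]
# keep the transformer weights fixed; only the ner component gets updated
frozen_components = ["transformer"]
# assumed to be needed here: ner listens to the transformer via
# TransformerListener, so the frozen transformer still has to run and
# set its annotations during training
annotating_components = ["transformer"]

As I understand it, without that second entry the frozen transformer wouldn't run during the update step, so the listener-based ner model would have nothing to learn from.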

Hi there.

Since this thread discusses training NER transformer models, I figured I'd let anyone reading this in the future know that we now have a Prodigy-HF plugin that comes with a hf.train.ner recipe. This recipe allows you to train a transformer model directly. That means that, as opposed to a spaCy pipeline, you'll get a specific model that only does NER. But it has been a common feature request, because a lot of folks seem interested in training on top of very specific pre-trained transformer models that spaCy may not directly support.

If you want to learn more, you can check out our updated docs. If you want to customise the training further, you can also use these recipes as a starting point.