Accessing probabilities in NER

honnibal · November 16, 2017, 1:42pm

Hi,

The answer to this question is a little bit involved. The short answer is this:


import spacy
from collections import defaultdict

# Number of alternate analyses to consider. More is slower, and not necessarily
# better -- you need to experiment on your problem.
beam_width = 16
# This clips solutions at each step. We multiply the score of the top-ranked
# action by this value, and use the result as a threshold. This prevents the
# parser from exploring options that look very unlikely, which saves some
# computation. Accuracy may also improve, because we've trained on the greedy
# objective.
beam_density = 0.0001
nlp = spacy.load('en_core_web_sm')

# `texts` is assumed to be an iterable of strings.
# Run the pipeline without the NER component, then decode entities with a beam.
docs = list(nlp.pipe(texts, disable=['ner']))
beams = nlp.entity.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

for doc, beam in zip(docs, beams):
    # Sum the scores of every parse in the beam that contains each entity.
    entity_scores = defaultdict(float)
    # In spaCy 2.x, get_beam_parses lives on the transition system (nlp.entity.moves).
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score
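
To make the beam_density comment more concrete, here is a minimal sketch of the clipping rule as described: take the best action's score, multiply it by the density, and drop anything below that threshold. The function name `prune_by_density` and the example scores are made up for illustration, and the scores are assumed to be non-negative. spaCy's real implementation lives in its beam-search code and may differ in detail.

```python
def prune_by_density(action_scores, beam_density):
    """Keep only actions scoring at least beam_density * (best score).

    action_scores: dict mapping an action name to a non-negative score.
    This is an illustrative helper, not part of spaCy's API.
    """
    if not action_scores:
        return {}
    threshold = max(action_scores.values()) * beam_density
    return {a: s for a, s in action_scores.items() if s >= threshold}


# Hypothetical scores for the actions available at one step.
scores = {"B-ORG": 0.90, "O": 0.0999, "U-GPE": 0.00005, "B-PERSON": 0.00005}
kept = prune_by_density(scores, beam_density=0.0001)
print(sorted(kept))
```

With a density of 0.0001 the threshold here is 0.00009, so the two very unlikely actions are never explored. A larger density prunes more aggressively.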

Here’s the longer explanation. First, spaCy implements two different objectives for named entity parsing:

The greedy imitation learning objective. This objective asks, “Which of the available actions will introduce no new errors if I perform them from this state?” For instance, if we’ve gotten the first word of an entity wrong, and the next word is also not inside an entity, we’re agnostic about whether the fake entity continues or closes immediately. This makes life easier for the model, because the correct next action doesn’t depend on whether the current state is correct. There’s some notion of “sunk cost”, basically. This greedy imitation learning objective maximises the expected F1 score, but doesn’t do well at giving per-token probabilities: once an entity has begun, we might be very confident that we should continue it, even if beginning it was a mistake. So we can’t get good probabilities out of the transition scores produced for the greedy model.
The global beam-search objective. Instead of optimising the individual transition decisions, the global model asks whether the final parse is correct. To optimise this objective, we build the set of top-k most likely incorrect parses, and top-k most likely correct parses. We assume that the model assigns 0 weight to all parses outside these sets, and use the two sets to estimate the gradient of the loss. We then backprop through all the intermediate states. The beam search allows probabilities over entities to be estimated, because you have multiple analyses. The probability of some entity is then simply the sum of the scores of the parses containing it, normalised by the total score assigned to all parses in the beam.
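
The normalisation described above can be sketched in plain Python, without spaCy. This assumes each parse in the beam is a `(score, ents)` pair with a non-negative score, where `ents` is a list of `(start, end, label)` tuples, the same shape the loop in the short answer iterates over. The function name `entity_probabilities` and the example beam are hypothetical.

```python
from collections import defaultdict


def entity_probabilities(parses):
    """Estimate P(entity) as the score mass of parses containing it,
    divided by the total score mass of all parses in the beam."""
    parses = list(parses)
    total = sum(score for score, _ in parses)
    if total <= 0:
        return {}
    summed = defaultdict(float)
    for score, ents in parses:
        # Count each distinct entity at most once per parse.
        for ent in set(ents):
            summed[ent] += score
    return {ent: s / total for ent, s in summed.items()}


# Hypothetical beam of three parses with unnormalised scores.
beam = [
    (6.0, [(0, 2, "ORG"), (5, 6, "GPE")]),
    (3.0, [(0, 2, "ORG")]),
    (1.0, [(0, 1, "PERSON"), (5, 6, "GPE")]),
]
probs = entity_probabilities(beam)
for ent, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(ent, round(p, 2))
```

An entity that shows up in every parse gets probability 1.0, and one that only appears in a low-scoring parse gets a small value. Anything outside the beam is implicitly treated as having zero weight.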

You can use beam decoding with weights optimised using the greedy procedure. However, without doing any beam updates, the probabilities likely won’t be well calibrated, so the scores may or may not be useful for your application. In Prodigy, we start out with pretrained models that have usually been optimised with the greedy procedure. During annotation, the model is updated using the beam updates, correcting the initial bias.
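
One way to see whether the scores are calibrated well enough for your application is to compare them against some annotated examples. The sketch below is only a rough check and not something Prodigy or spaCy provides: it buckets scored entities by confidence and reports the fraction in each bucket that match the gold annotations. The name `reliability_table` and the data are hypothetical, and scores are assumed to lie in [0, 1].

```python
def reliability_table(scored_entities, gold_entities, n_bins=5):
    """Bucket entities by score and compare mean score with accuracy.

    scored_entities: dict mapping (start, end, label) -> score in [0, 1].
    gold_entities: set of (start, end, label) tuples known to be correct.
    Returns rows of (bin_low, bin_high, mean_score, accuracy, count).
    """
    bins = [[] for _ in range(n_bins)]
    for ent, score in scored_entities.items():
        # Clamp the index so scores of exactly 0.0 or 1.0 land in a valid bin.
        idx = max(0, min(int(score * n_bins), n_bins - 1))
        bins[idx].append((score, ent in gold_entities))
    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_score = sum(s for s, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, mean_score, accuracy, len(items)))
    return rows


# Hypothetical scores and gold annotations for one document.
scored = {
    (0, 2, "ORG"): 0.95,
    (5, 6, "GPE"): 0.90,
    (8, 9, "PERSON"): 0.85,
    (3, 4, "ORG"): 0.15,
}
gold = {(0, 2, "ORG"), (5, 6, "GPE")}
rows = reliability_table(scored, gold)
for row in rows:
    print(row)
```

If the accuracy in a bucket is far from its mean score, the scores are not behaving like probabilities yet, which is what you would expect before any beam updates.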
