Accessing probabilities in NER

I assume that when spaCy runs NER on a document it generates a list of tokens (or token combinations, e.g. ‘United Kingdom’) from that document, each with a probability score / confidence level that the token is an entity of a given type. I imagine that if the probability score is above a certain threshold then the token is marked as an entity of that type and flagged for the user to see; otherwise it remains submerged in a sea of unflagged candidates that spaCy helpfully keeps from the user to avoid confusion / information overload. However, I’m quite interested in seeing what those probability scores are, since I want to compare them with the results from another classifier. Is it possible to access those probability scores somehow?

I realise that this isn’t strictly speaking a Prodigy-related question, but I remember reading somewhere that when running Prodigy it first confronts the user with the cases that it (Prodigy) is most unsure of. I imagine this means that the first examples I’m confronted with for a given text are the ones where Prodigy thinks that for this token there’s a 51% chance it’s an organisation, a 47% chance it’s a product and a 2% chance it’s a person. Or something like that. So, that’s why I’m asking here (also, I didn’t get an answer on Stack Overflow :wink: ).


The answer to this question is a little bit involved. The short answer is this:

from collections import defaultdict

import spacy

# Example input; replace with your own list of strings.
texts = ["Apple opened an office in the United Kingdom."]

# Number of alternate analyses to consider. More is slower, and not
# necessarily better -- you need to experiment on your problem.
beam_width = 16
# This clips solutions at each step. We multiply the score of the top-ranked
# action by this value, and use the result as a threshold. This prevents the
# parser from exploring options that look very unlikely, saving a bit of
# efficiency. Accuracy may also improve, because we've trained on the greedy
# objective.
beam_density = 0.0001
nlp = spacy.load('en_core_web_sm')

# Run the pipeline without NER, then beam-parse the entities separately.
docs = list(nlp.pipe(texts, disable=['ner']))
beams = nlp.entity.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

for doc, beam in zip(docs, beams):
    # Maps (start, end, label) to the summed score of the parses containing it.
    entity_scores = defaultdict(float)
    for score, ents in nlp.entity.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score
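
To read the result: each key in entity_scores is a (start, end, label) triple of token offsets. Here is a minimal sketch of filtering by a confidence threshold. It uses a plain token list and made-up scores in place of a real doc and beam, and it assumes that end is exclusive, as in a spaCy Span. The helper name confident_entities, the threshold of 0.5 and all the numbers are illustrative assumptions, not part of spaCy:

```python
def confident_entities(tokens, entity_scores, threshold=0.5):
    """Return (text, label, score) for entities scoring at or above threshold."""
    results = []
    for (start, end, label), score in entity_scores.items():
        if score >= threshold:
            # Assumes `end` is exclusive, like the end of a spaCy Span.
            results.append((" ".join(tokens[start:end]), label, score))
    # Highest-scoring candidates first.
    return sorted(results, key=lambda item: item[2], reverse=True)


# Hypothetical tokens and scores, standing in for a real doc and beam.
tokens = ["Apple", "opened", "an", "office", "in", "the", "United", "Kingdom"]
entity_scores = {
    (0, 1, "ORG"): 0.82,
    (0, 1, "PRODUCT"): 0.15,
    (6, 8, "GPE"): 0.97,
}

found = confident_entities(tokens, entity_scores)
for text, label, score in found:
    print(f"{text}\t{label}\t{score:.2f}")
```

Lowering the threshold surfaces the competing analyses (here, the PRODUCT reading of "Apple"), which is the part the normal doc.ents output hides.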

Here’s the longer explanation. First, spaCy implements two different objectives for named entity parsing:

  1. The greedy imitation learning objective. This objective asks, “Which of the available actions will introduce no new errors if I perform them from this state?” For instance, if we’ve gotten the first word of an entity wrong, and the next word is also not inside an entity, we’re agnostic about whether the fake entity continues, or closes immediately. This makes life easier for the model, because it means the correct action to take next is not defined by whether the current state is correct. There’s some notion of “sunk cost”, basically. This greedy imitation learning objective maximises the expected F1 score, but doesn’t do well at giving per-token probabilities. Once an entity has begun, we might be very confident that we should continue it. So we can’t get good probabilities out of the transition-scores produced for the greedy model.

  2. The global beam-search objective. Instead of optimising the individual transition decisions, the global model asks whether the final parse is correct. To optimise this objective, we build the set of top-k most likely incorrect parses, and top-k most likely correct parses. We assume that the model assigns 0 weight to all parses outside these sets, and use the two sets to estimate the gradient of the loss. We then backprop through all the intermediate states. The beam search allows probabilities over entities to be estimated, because you have multiple analyses. The probability of some entity is then simply the sum of the scores of the parses containing it, normalised by the total score assigned to all parses in the beam.
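
The normalisation described in point 2 can be sketched in plain Python. This is only an illustration of the arithmetic, not spaCy's internal code: the parses list, its unnormalised scores and the helper name entity_probabilities are all hypothetical.

```python
from collections import defaultdict


def entity_probabilities(parses):
    """Sum the scores of the parses containing each entity, then normalise.

    `parses` is a list of (score, entities) pairs, where each entity is a
    (start, end, label) triple and the scores are non-negative.
    """
    total = sum(score for score, _ in parses)
    if total <= 0:
        raise ValueError("beam has no positive score mass")
    probs = defaultdict(float)
    for score, ents in parses:
        # set() guards against counting an entity twice within one parse.
        for ent in set(ents):
            probs[ent] += score / total
    return dict(probs)


# Three hypothetical analyses of the same sentence, with made-up scores.
parses = [
    (6.0, [(0, 1, "ORG"), (6, 8, "GPE")]),
    (3.0, [(0, 1, "PRODUCT"), (6, 8, "GPE")]),
    (1.0, [(6, 8, "GPE")]),
]
probs = entity_probabilities(parses)
```

With these numbers the total score is 10, so the ORG reading gets 6/10, the PRODUCT reading 3/10, and the GPE entity, which appears in every parse, gets the full mass. Entities that appear in no parse on the beam implicitly get probability 0, which matches the assumption that parses outside the beam carry no weight.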

You can use beam decoding with weights optimised using the greedy procedure. However, without doing any beam updates, the probabilities likely won’t be well calibrated, so the scores may or may not be useful for your application. In Prodigy, we start out with pretrained models that have usually been optimised with the greedy procedure. During annotation the model is updated using the beam updates, which corrects the initial bias.


Hey Matt. Thanks for the quick answer. Didn’t realise this was so involved :-/

And to be clear, the function is nlp.entity.moves.get_beam_parses, right? (You have nlp.entity.get_beam_parses above.)

Also, what is the relationship between this algorithm and prodigy.models.ner.EntityRecognizer.make_best?