Thanks! We’re glad you like it – we’re pretty happy with how it’s shaping up! That said, there’s still lots of tweaking and testing to do, so it’s great to have you trying it out.

The sorting dynamics are something that can particularly benefit from more “playtesting”. It’s possible there are some underlying bugs in the spaCy beam-search code that’s backing the NER probability estimates, too. I’ll explain a little about how this works so you can do some digging, to figure out where the problem might be in your specific case. That will hopefully reveal what knob to twiddle.

To get the confidence of the entities, the sentence is analysed using beam-search to produce `k` different parses – each parse being a list of `(start, end, label)` triples, with each triple describing an entity. To figure out the probability of a particular `(start, end, label)` triple, we normalize the scores, and sum the scores of each parse that contains the entity. So if an entity is in 15/16 parses, and the only parse it’s not in has probability 0.01, we say the entity has probability 0.99. If the entity is in 9 parses whose normalized scores sum to 0.73, we say its probability is 0.73.
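To make the arithmetic concrete, here’s a small standalone sketch of that calculation (an illustration of the idea, not the actual spaCy internals):

```python
# Toy illustration: derive entity probabilities by normalizing parse
# scores and summing the scores of each parse containing the entity.

def entity_probs(parses):
    """parses: list of (score, [(start, end, label), ...]) pairs."""
    total = sum(score for score, _ in parses)
    probs = {}
    for score, entities in parses:
        for ent in entities:
            probs[ent] = probs.get(ent, 0.0) + score / total
    return probs

# Two toy parses: (0, 2, 'ORG') appears in both, so its probability
# is 1.0; (3, 4, 'PERSON') only appears in the 0.75-score parse.
parses = [
    (0.75, [(0, 2, 'ORG'), (3, 4, 'PERSON')]),
    (0.25, [(0, 2, 'ORG')]),
]
print(entity_probs(parses))
```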

An entity that’s in all parses has probability 1.0. So, one possibility is that the model simply isn’t returning much diversity on your data at all. Here’s how you could check that:

```
docs = list(nlp.pipe(texts, disable=['ner']))
beams = nlp.entity.beam_parse(docs, [d.tensor for d in docs], beam_width=32, beam_density=0.001)
for doc, beam in zip(docs, beams):
    for score, parse in nlp.entity.moves.get_beam_parses(beam):
        print(score, [(label, doc[start:end]) for start, end, label in parse])
```

The relevant bits of spaCy we’re using can be found here: https://github.com/explosion/spaCy/blob/develop/spacy/syntax/nn_parser.pyx#L448

and here: https://github.com/explosion/spaCy/blob/develop/spacy/syntax/ner.pyx#L128

The two parameters to adjust here are the `beam_width` and the `beam_density`. The code above says to produce a maximum of 32 analyses, but to prune the beam so that the lowest-ranking analysis has a score at least 0.1% of the top one.
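As a sketch, that pruning rule works roughly like this (this is the rule as described above, not the actual spaCy implementation):

```python
# Assumed pruning rule: keep at most beam_width candidates, and drop
# any whose score falls below beam_density * best_score.

def prune_beam(scored, beam_width=32, beam_density=0.001):
    scored = sorted(scored, reverse=True)[:beam_width]
    best = scored[0]
    return [s for s in scored if s >= beam_density * best]

# With beam_density=0.001, the threshold is 0.001 * 0.9 = 0.0009,
# so the 0.0005 candidate is pruned.
print(prune_beam([0.9, 0.5, 0.002, 0.0005]))
```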

If the model’s probabilities are looking good, then the problem really is in the sorting dynamics. One parameter you could adjust for that is the `bias` parameter. A bias of `0.0` means that the uncertainty is used as the priority to sort the examples. A negative bias will skew towards lower scores, and a positive bias will skew towards higher scores.
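One plausible way to formulate such a priority (a hypothetical illustration – the exact formula used internally may differ) is:

```python
# Hypothetical priority function combining uncertainty with a bias
# term; with bias=0.0 the priority peaks at score=0.5 (maximum
# uncertainty), while a negative bias promotes low-scoring entities
# and a positive bias promotes high-scoring ones.

def priority(score, bias=0.0):
    uncertainty = 1.0 - abs(score - 0.5) * 2.0
    return uncertainty + bias * (score - 0.5)

for s in (0.2, 0.5, 0.8):
    print(s, priority(s, bias=0.0), priority(s, bias=-1.0))
```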

Thanks! Should be fixed now.