Displaying a confidence score next to a user-defined entity

Hi,

I came across one of your posts about generating scores for entities using beam search. I have a couple of questions on this topic:

It gives me a “sum of the scores of the parses containing it” style score for the named entity labels that come with the out-of-the-box spaCy model, such as “LOC”, “PER”, etc.
I built a Prodigy model to identify labels such as “LOCATION”, which marks a field name in some location.

My question is: when I run the beam code, what exactly do 103, 345 and 1109 mean? Are they sums of parse scores or confidence scores? Is there a way I can get a confidence score to rank all the “LOCATION” labels?

103 Trym LOCATION
345 Trym LOCATION
1109 Trym LOCATION
0.9997078684409141 []
0.0001524225082572684 [(0, 1, 'CARDINAL')]
9.968089809345492e-05 [(0, -1, 'DATE')]
3.133614358889278e-05 [(0, -1, 'CARDINAL')]
8.692009146304466e-06 [(0, -1, 'PERSON')]
0.999980689646118 []
1.7380253187471615e-05 [(0, -1, 'DATE')]
9.215215413671059e-07 [(0, -1, 'PERSON')]
7.847656971651813e-07 [(0, -1, 'CARDINAL')]
1.0250886506604742e-07 [(0, -1, 'MONEY')]
9.080580460904744e-08 [(0, -1, 'GPE')]
2.876767626301381e-08 [(0, -1, 'ORG')]
1.4568741310754125e-09 [(0, -1, 'NORP')]
2.742360438007307e-10 [(0, -1, 'FAC')]
0.999980689646118 []

Can you tell me how to get a confidence score for user-defined labels?

Thank you

Could you link the comment? I want to be sure which bit of code is being used.

The scores here are probably “logits” – negative log likelihoods. They might be unnormalised, in which case they’re really just scores, not probabilities.
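For what it's worth, if the scores do turn out to be unnormalised, dividing them by their sum (or exponentiating first if they're log scores, i.e. a softmax) gives you something you can read as probabilities. A minimal sketch, not tied to any particular spaCy API:

import math

def to_probabilities(scores, log_scores=False):
    # If the parser gives log scores, exponentiate before normalising (softmax).
    if log_scores:
        scores = [math.exp(s) for s in scores]
    total = sum(scores)
    # Divide by the total so the values sum to 1 and can be read as probabilities.
    return [s / total for s in scores]

# Example with raw scores like the ones in the output above
print(to_probabilities([0.9997, 0.00015, 9.9e-05]))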

Sure, here’s the link.

Also, here is the code:

text = content
doc = nlp2(text)
for ent in doc.ents:
    print(ent.start_char, ent.text, ent.label_)
    docs = list(nlp.pipe(list(text), disable=['ner']))
(beams, somethingelse) = nlp.entity.beam_parse(docs, beam_width=16, beam_density=0.0001)

print(content)

for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        print(score, ents)

        entity_scores = defaultdict(float)
        for start, end, label in ents:
            # print("here")
            entity_scores[(start, end, label)] += score
    print('entity_scores', entity_scores)

The output:
0.9997078684409141 []
0.0001524225082572684 [(0, 1, 'CARDINAL')]
9.968089809345492e-05 [(0, -1, 'DATE')]
3.133614358889278e-05 [(0, -1, 'CARDINAL')]
8.692009146304466e-06 [(0, -1, 'PERSON')]
entity_scores defaultdict(<class 'float'>, {(0, -1, 'PERSON'): 8.692009146304466e-06})
0.999980689646118 []
1.7380253187471615e-05 [(0, -1, 'DATE')]
9.215215413671059e-07 [(0, -1, 'PERSON')]
7.847656971651813e-07 [(0, -1, 'CARDINAL')]
1.0250886506604742e-07 [(0, -1, 'MONEY')]
9.080580460904744e-08 [(0, -1, 'GPE')]
2.876767626301381e-08 [(0, -1, 'ORG')]
1.4568741310754125e-09 [(0, -1, 'NORP')]
2.742360438007307e-10 [(0, -1, 'FAC')]
entity_scores defaultdict(<class 'float'>, {(0, -1, 'FAC'): 2.742360438007307e-10})

So basically, it shows scores for the built-in entity labels but not for user-defined ones.
Is there a way to include user-defined labels like “LOCATION”?

Thank you

You can see the code for this method here: https://github.com/explosion/spaCy/blob/master/spacy/syntax/ner.pyx#L122

As you can see, it doesn’t treat entities you’ve added any differently from the built-in ones. So if you’re not seeing any entities with your labels, that suggests the beam isn’t assigning any probability to them. You’ll likely have to train more.

Oh okay. I used ner.teach on the data until I had annotated 97% of the samples, then trained it with ner.batch-train, which gave an accuracy of 79%.

Is there any way I can quantify the confidence with which the not-so-well-trained model is predicting the labels? Basically, an estimated rank/score/confidence for each entity label predicted by the entity recognizer.

Thank you

Hmm. Actually that code looks weird. Try this:

from collections import defaultdict

text = content
doc = nlp.make_doc(text)
(beams, somethingelse) = nlp.entity.beam_parse([doc], beam_width=16, beam_density=0.0001)

# Sum the score of every parse in the beam that contains each entity
entity_scores = defaultdict(float)
for score, ents in nlp.entity.moves.get_beam_parses(beams[0]):
    print(score, ents)
    for start, end, label in ents:
        entity_scores[(start, end, label)] += score

print('entity_scores', entity_scores)

Hi Matthew,

Thank you for sharing this piece of code. I have a question about the interpretation of these scores. For example, the following is the output for one of the documents:

Code:

from collections import defaultdict

text = content
doc = nlp.make_doc(text)
(beams, somethingelse) = nlp.entity.beam_parse([doc], beam_width=16, beam_density=0.0001)

entity_scores = defaultdict(float)
for score, ents in nlp.entity.moves.get_beam_parses(beams[0]):
    print(score, ents)
    for start, end, label in ents:
        entity_scores[(start, end, label)] += score

for (start, end, label), value in entity_scores.items():
    if label == 'LOCATION':
        print(start, tokens[start], value)

Output:
[(902, 'HUMBLY', 0.16623281600085096), (999, 'Hinton', 0.16623281600085096), (1627, 'Horndean', 0.16623281600085096), (2067, 'Horndean', 0.16623281600085096), (2712, 'Set', 0.16623281600085096), (3548, 'Horndean', 0.16623281600085096)]

In one of the support board threads, you mentioned that “The probability of some entity is then simply the sum of the scores of the parses containing it, normalised by the total score assigned to all parses in the beam.”
My question is: why are the confidence scores of every label the same? Are they normalised by the beam width (16 in this case)? Is there a way to rank the confidence for each LOCATION before the score is averaged out?

Thank you so much. I’ve been looking at the GitHub pages to understand the concepts in depth. Appreciate all the help!

Hi @Jashmi1,

Sorry I didn’t see this sooner. The reason you’ll see the same score coming up a lot is that there are only a few candidate parses in the beam. So if two entities occur in exactly the same parses of the beam, they’ll receive the same summed score. It’s far from a perfect way to estimate probabilities, but it’s the best solution we have in spaCy at the moment.
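To make that concrete, here is a tiny worked example with made-up parse scores (not real model output): two entities that appear in exactly the same parses of the beam end up with identical summed and normalised scores.

from collections import defaultdict

# Toy beam: (parse score, entities in that parse), with made-up numbers
parses = [
    (0.80, [(0, 2, 'LOCATION'), (5, 6, 'LOCATION')]),
    (0.15, [(0, 2, 'LOCATION'), (5, 6, 'LOCATION')]),
    (0.05, []),
]

entity_scores = defaultdict(float)
total = sum(score for score, _ in parses)
for score, ents in parses:
    for ent in ents:
        # Sum the scores of the parses containing each entity
        entity_scores[ent] += score

# Both entities occur in the same two parses, so both get (0.80 + 0.15) / 1.0 = 0.95
for ent, score in entity_scores.items():
    print(ent, score / total)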