Using spancat for ner confidence scores

Hello!

I am working on training a NER model and am interested in getting confidence scores for the predicted entities.
From what I understand, this is only possible by training a spancat component instead of a NER component.

Is this true?
Can I use the training data I annotated with Prodigy's ner recipes to train a spancat component? Do I need to change it in any way?

Also, I read that the default spancat scorer uses the LinearLogistic layer. Since I do not have overlapping entities, how do I change this to softmax?

Thank you!
Efrat

hi @efratmentel!

Thanks for your questions and welcome to the Prodigy community!

Can you clarify what you mean by "confidence scores"? Do you mean predicted scores to do active learning?

You can use NER components as well. Take a look at this documentation on how to create a custom recipe that uses predicted scores to modify annotation order.
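To illustrate the idea behind score-based ordering: examples whose scores are closest to 0.5 are the ones the model is least certain about, so a custom recipe can surface those first. Prodigy ships sorters such as `prefer_uncertain` for this; the function below is a simplified plain-Python stand-in just to show the logic, not Prodigy's actual implementation.

```python
def prefer_uncertain(scored_stream):
    """Rank (score, example) pairs so the most uncertain come first.

    scored_stream: iterable of (score, example) tuples, where score is a
    float in [0, 1]. Examples with scores closest to 0.5 are ranked first.
    """
    ranked = sorted(scored_stream, key=lambda pair: abs(pair[0] - 0.5))
    return [example for score, example in ranked]


stream = [
    (0.97, {"text": "very confident"}),
    (0.52, {"text": "borderline"}),
    (0.10, {"text": "confidently negative"}),
]
print(prefer_uncertain(stream))
```

Running this prints the "borderline" example first, since 0.52 is closest to 0.5.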

Actually, you should be fine. Check out this helpful past post that describes the details.

Yep - that makes sense. For questions like this, the spaCy discussions forum can be helpful. For example, I found a similar post that suggests the same. As referenced, you could use the softmax layer from thinc in a custom config. If you have more questions, I would suggest replying to that post, as the spaCy developer team can help you there (this forum is more for Prodigy-specific questions).
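In case it helps, here is a rough sketch of what such a config override might look like. The section path and layer names are assumptions based on spaCy v3's spancat defaults and thinc's layer registry, so please double-check them against your installed versions:

```ini
# Default spancat scorer (assumed; verify in your generated config):
[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"

# Possible replacement using thinc's registered softmax layer
# (uncomment in place of the section above):
# [components.spancat.model.scorer]
# @layers = "Softmax.v1"
```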

Hope this helps and let me know if you have any further questions!

Thank you!!

Hello again,

About the "confidence scores" - I am trying to get the predicted scores of the predicted entities so that I can apply a custom threshold to the positive predictions of my model (with my trained ner component).

Following your link https://prodi.gy/docs/named-entity-recognition#active-learning-custom, I still can't seem to understand how to do this. This is the code I am using:

import spacy
import copy
from prodigy.components.loaders import JSONL

nlp = spacy.load('path-to-my-model')

def predict(stream):
    for eg in stream:
        predictions = nlp(eg["text"])
        for score, start, end, label in predictions:
            example = copy.deepcopy(eg)
            example["spans"] = [{"start": start, "end": end, "label": label}]
            print(f'example:{example}, score:{score}')


predict(JSONL('path-to-jsonl-with-test-sentences'))

Which returns the following error:

Traceback (most recent call last):
  File "path-to-venv-folder/venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-284cc292b8b8>", line 1, in <cell line: 1>
    for score, start, end, label in doc:
TypeError: cannot unpack non-iterable spacy.tokens.token.Token object

What am I doing wrong? Any help would be greatly appreciated. Thank you!

Hi @efratmentel ,

Whenever you iterate over a Doc object (i.e., the output of nlp(eg["text"])), you're iterating over Token objects. What you may want to do is something like:

spans_key = "sc"  # usually the case unless you set something different
for eg in stream:
    prediction = nlp(eg["text"])
    for span in prediction.spans[spans_key]:
        # do something

From the span variable, you can access all Span attributes. However, if you want to access the scores, you might want to do something like:

spans_key = "sc"  # usually the case unless you set something different
scores = prediction.spans[spans_key].attrs["scores"]   

Without the need for iterating through each Span.
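For the custom thresholding you mentioned earlier, the remaining step is to pair each span with its score and keep only the spans above your cutoff. In the sketch below the spans and scores are plain tuples and floats so it is self-contained; in a real pipeline they would come from doc.spans[spans_key] and doc.spans[spans_key].attrs["scores"], and the helper name filter_spans_by_score is just something I made up for illustration.

```python
def filter_spans_by_score(spans, scores, threshold=0.5):
    """Keep only predicted spans whose score meets the threshold.

    spans: list of (start, end, label) tuples, as you would read off
        each Span's character offsets and label.
    scores: list of floats, parallel to spans.
    Returns Prodigy-style span dicts with the score attached.
    """
    kept = []
    for (start, end, label), score in zip(spans, scores):
        if score >= threshold:
            kept.append(
                {"start": start, "end": end, "label": label, "score": score}
            )
    return kept


spans = [(0, 5, "PERSON"), (10, 16, "ORG")]
scores = [0.92, 0.31]
print(filter_spans_by_score(spans, scores, threshold=0.5))
```

With a threshold of 0.5, only the PERSON span (score 0.92) survives; the ORG span at 0.31 is dropped.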

Thank you, but unfortunately this still isn't working for me.

  • I can't seem to find "scores" in the span attributes.
  • predictions.spans returns an empty dictionary. The recognised entities can be found in prediction.ents, but not the scores.

What am I missing here?
After loading my trained model (custom ner) I am trying to use it on new sentences, and get a list of the entities it recognises in each sentence, with their predicted scores - the softmax activation output of each entity. Is this possible?

Hi @efratmentel ,

  • predictions.spans returns an empty dictionary. The recognised entities can be found in prediction.ents, but not the scores.

This partially explains why there are no spans detected in your Doc. To clarify: as of now, there's no straightforward way to obtain confidence scores from the ner component. You can only do so via spancat, which means you have to train your model using spancat, not ner.

Okay, thank you very much for your reply!