First, I want to thank you for the great work on Prodigy and spaCy. I've been able to progress much faster on my NER project since I started using them. I now have a trained NER model with 13 labels that performs fairly well overall (F-score > 90%), but for one label, the prediction sometimes only includes a portion of the ground-truth span. However, when I run the same sample through ner.teach, it highlights the correct span. After some digging, I was able to nail down the following situation:
import spacy
from spacy import displacy

nlp = spacy.load(model_path)  # trained NER model
displacy.render(nlp(sample_text), style="ent")
The above shows the incorrect prediction (the entity is only partially highlighted), while here:
import spacy
from spacy import displacy
from prodigy.models.ner import EntityRecognizer

nlp = spacy.load(model_path)  # same trained NER model
model = EntityRecognizer(nlp)  # wrap the pipeline in Prodigy's annotation model
displacy.render(nlp(sample_text), style="ent")
wrapping nlp with EntityRecognizer makes the error go away. I wonder if you could shed some light on this. Thanks!
Hi, and thanks for the kind words – that's great to hear!
I'm surprised that using the EntityRecognizer annotation model here even worked, because what it does is quite specific to Prodigy and the binary annotation process (we probably should have used a better and more descriptive name for it). The annotation model uses the underlying nlp object and the beam parse to suggest entities based on the different possible analyses of the text. These aren't necessarily the most confident predictions. So it generates a stream of binary suggestions of all kinds of entities, potentially conflicting, together with their respective confidence scores. Within the recipe, those suggestions can then be filtered and sorted, e.g. to focus on the most uncertain predictions and skip suggestions with a higher confidence, so you can focus on annotating the examples that result in the most relevant updates to the weights.
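To make that more concrete, here's a simplified sketch of how the annotation model is typically wired up inside a recipe like ner.teach (based on the documented custom recipe pattern, not the exact source; the "DIMENSION" label and the JSONL source are just placeholders for your setup):

import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_uncertain
from prodigy.models.ner import EntityRecognizer

nlp = spacy.load(model_path)                        # trained NER model
model = EntityRecognizer(nlp, label=["DIMENSION"])  # Prodigy annotation model wrapping nlp
stream = JSONL(source)                              # raw examples with a "text" field
stream = model(stream)                              # (score, example) tuples from the beam parse
stream = prefer_uncertain(stream)                   # prefer suggestions with scores close to 0.5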
That likely also explains what you were seeing in ner.teach: the suggestion here may have been one that the model was less confident about, compared to a different analysis that ended up "winning". For instance, for the sentence "i like apple", the model may come up with two possible analyses: ["O", "O", "O"] (no entities at all) and ["O", "O", "U-ORG"] ("apple" is an ORG). Given the current weights, analysis 1 might end up with a higher score, so that's the output you will see at runtime. However, when you run ner.teach, the second analysis may end up with a score of 0.5, so Prodigy will present it to you for annotation. If you accept this suggestion, that's very valuable feedback for the model – in fact, more valuable than if you had accepted a very high-confidence prediction. If you annotate enough of these examples, this can move the model in the right direction, so it's more likely to predict the correct analysis in the future for similar examples.
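If you're curious, you can also inspect those alternative analyses and their scores yourself via the beam parse – roughly like this in spaCy v2.x (a sketch; the exact beam settings don't matter much and this API changed in v3):

from collections import defaultdict
import spacy

nlp = spacy.load(model_path)   # trained NER model (spaCy v2.x)
ner = nlp.get_pipe("ner")

docs = list(nlp.pipe([sample_text], disable=["ner"]))
beams = ner.beam_parse(docs, beam_width=16, beam_density=0.0001)

entity_scores = defaultdict(float)
for score, ents in ner.moves.get_beam_parses(beams[0]):
    # each analysis is a list of (start_token, end_token, label) spans
    for start, end, label in ents:
        entity_scores[(docs[0][start:end].text, label)] += score
print(entity_scores)  # candidate entities and how much beam mass each received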
Thanks so much for the detailed explanation (and sorry for not seeing it earlier). I understand that the Prodigy annotation model may produce different output from the original spaCy model, given how ner.teach is intended to be used. However, what I am trying to understand is why the same spaCy model produces different output after being wrapped by the Prodigy EntityRecognizer class. I rewrote my code snippet, hoping to make it a bit clearer:
import spacy
from prodigy.models.ner import EntityRecognizer

# trained NER model
nlp = spacy.load(model_path)

# same trained NER model, but passed to the Prodigy EntityRecognizer constructor
nlp_wrapped = spacy.load(model_path)
EntityRecognizer(nlp_wrapped)

# feed the same text to both pipeline objects
doc = nlp(sample_text)
doc_wrapped = nlp_wrapped(sample_text)
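(For reference, the token-by-token comparison below was printed with roughly this loop – simplified here, with the *_1 columns taken from doc_wrapped and the *_2 columns from doc:)

for tok, tok_w in zip(doc, doc_wrapped):
    print(tok.i, tok.text, tok_w.ent_iob_, tok.ent_iob_,
          tok_w.ent_type_, tok.ent_type_, tok_w.dep_, tok.dep_)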
And I see differences when comparing attribute values token by token between doc and doc_wrapped. For example (DIMENSION is a label in my trained NER model; the *_1 fields are from doc_wrapped and the *_2 fields are from doc, i.e. not wrapped):
| i  | text | ent_iob_1 | ent_iob_2 | ent_type_1 | ent_type_2 | dep_1    | dep_2    |
|----|------|-----------|-----------|------------|------------|----------|----------|
| 6  | 3.4  | B         | B         | DIMENSION  | DIMENSION  | nummod   | nummod   |
| 7  | cm   | I         | I         | DIMENSION  | DIMENSION  | npadvmod | dobj     |
| 8  | x    | I         | O         | DIMENSION  |            | punct    | punct    |
| 9  | 4.9  | I         | O         | DIMENSION  |            | nummod   | nummod   |
| 10 | cm   | I         | O         | DIMENSION  |            | appos    | dep      |
| 11 | x    | I         | O         | DIMENSION  |            | punct    | punct    |
| 12 | 2.2  | I         | O         | DIMENSION  |            | nummod   | nummod   |
| 13 | cm   | I         | O         | DIMENSION  |            | npadvmod | npadvmod |
The annotations in the training dataset all look like what doc_wrapped produces (that's the main reason I started this experiment/investigation). I do realize this is probably more of a spaCy question than a Prodigy question, but it is related to the use of prodigy.models.ner.EntityRecognizer. I wonder if you could shed more light on what I am seeing. Thanks again!
Oh, I understand what you mean now, sorry! And yes, your analysis seems correct: in the current stable version, the EntityRecognizer constructor adds a sentencizer for sentence segmentation by default – but this is definitely not ideal, and it's actually something we've removed for the upcoming version (currently available as a nightly). It usually doesn't matter much, because the nlp object in the annotation model is updated and later discarded anyway – but there are cases where the sentence segmentation can impact the model's predictions, because the entity recognizer isn't allowed to predict entities across sentence boundaries.
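If you want to verify this on your side, you can simulate what the constructor currently does by adding a sentencizer to the plain pipeline yourself – a rough sketch for spaCy v2.x (in v3 you'd use nlp.add_pipe("sentencizer", first=True) instead):

import spacy

nlp = spacy.load(model_path)                       # plain pipeline
nlp_sentencized = spacy.load(model_path)           # same model, plus a sentencizer
sentencizer = nlp_sentencized.create_pipe("sentencizer")
nlp_sentencized.add_pipe(sentencizer, first=True)  # run it before tagger/parser/ner

doc = nlp(sample_text)
doc_sent = nlp_sentencized(sample_text)
# if the added sentence boundaries fall inside the "3.4 cm x 4.9 cm x 2.2 cm" span,
# doc_sent's entities should look like what you saw for doc_wrapped
print([(ent.text, ent.label_) for ent in doc.ents])
print([(ent.text, ent.label_) for ent in doc_sent.ents])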