I have a question.
Is it possible to show true positives, true negatives, false positives, false negatives, recall and precision of a trained model somewhere within Prodigy?
If not, is there a way you could calculate this yourself?
The prodigy.models.ner.EntryRecognizer.evaluate() method will tell you the accuracy of the model, but doesn't currently return P/R/F scores. The method supports the use case where the gold standard has only entities known to be correct, without necessarily containing all of the correct entities, i.e. the use case where the gold standard has missing values. You should specify the flag no_missing=True if you don't have missing values in your gold standard.
Here's some code to return P/R/F, assuming you have no missing values in your gold standard:
tp = 0.0
fp = 0.0
fn = 0.0
for eg in test_examples:
    doc = nlp(eg["text"])
    # Compare the predicted entity spans against the annotated ("gold") spans
    guesses = set((ent.start_char, ent.end_char, ent.label_) for ent in doc.ents)
    truths = set((span["start"], span["end"], span["label"]) for span in eg["spans"])
    tp += len(guesses.intersection(truths))
    fn += len(truths - guesses)
    fp += len(guesses - truths)
precision = tp / (tp + fp + 1e-10)
recall = tp / (tp + fn + 1e-10)
fscore = (2 * precision * recall) / (precision + recall + 1e-10)
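For context, here's a minimal sketch of how you might load the nlp object and test_examples that the snippet above assumes; the model path and dataset name are hypothetical, and it uses Prodigy's database helper to fetch the annotated examples:

import spacy
from prodigy.components.db import connect

nlp = spacy.load("/path/to/your/trained/model")  # hypothetical path to your trained pipeline
db = connect()                                    # connect to Prodigy's annotation database
examples = db.get_dataset("my_eval_dataset")      # hypothetical dataset name holding the evaluation annotations
# Keep only accepted answers; the snippet above assumes each example has a "spans" key
test_examples = [eg for eg in examples if eg.get("answer") == "accept"]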
Hi @honnibal
I recently started using Prodigy, annotated data and trained a model. Now I am trying to check the accuracy of the model and then calculate precision, recall and F-score for it. I have a few questions.
Should I just pass my model to prodigy.models.ner.EntryRecognizer.evaluate(/path/to/my/model)? (It should be EntityRecognizer? Typo?) I tried that, but got the following error: TypeError: evaluate() takes exactly 2 positional arguments (1 given)
I am also trying to understand what a 'gold standard' is. I have annotated the data and trained the model in spaCy, but what exactly is a gold standard? I am probably missing something obvious here.
@dsnlp Yes, that's EntityRecognizer, so definitely a typo. This class is Prodigy's built-in annotation model, i.e. the wrapper that takes care of scoring the examples, updating a model with (incomplete) annotations and so on.
You can find more details on the API, how to initialize the model and what arguments the methods take in your PRODIGY_README.html. The EntityRecognizer is initialized with a loaded nlp object, and you can then call the evaluate method on a list of examples.
@ines Thanks. Unfortunately, I do not have access to the documentation, as the installation was handled by someone else. Is there a minimal working example you could point me to?
@dsnlp Ah, that sucks. You should definitely get that README, since it includes all the detailed API docs. Any way you can contact the person who received the Prodigy installer? Otherwise, if you have the order ID (starting with #EX), email us and we can re-send it.
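In the meantime, here's a minimal sketch of the usage described above, assuming a trained spaCy pipeline on disk (the path is hypothetical) and a list of annotated examples:

import spacy
from prodigy.models.ner import EntityRecognizer

nlp = spacy.load("/path/to/your/model")   # load the trained pipeline; evaluate() doesn't take a path
model = EntityRecognizer(nlp)             # initialize the annotation model with the loaded nlp object
accuracy = model.evaluate(test_examples)  # test_examples: list of annotated example dicts
# As noted earlier in the thread, pass no_missing=True if your gold standard has no missing values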
@dsnlp Apologies if this isn't answering the right question, but: in machine learning parlance, the term "gold standard" really just means the reference annotations, the "correct answer" you're trying to predict. There are unfortunately lots of these little terms of art in machine learning and NLP. I think one of the best practical discussions of evaluation in ML is in this short primer by Andrew Ng: https://www.mlyearning.org/