In pos.teach, a score in the bottom right corner indicates the confidence in the highlighted tag. Could you let me know how this is computed?
I'm asking because the docs describe Token.prob as a log probability, but its value is very different from the score shown in pos.teach.
I’d like to use this score for my own recipes.
Token.prob is the unigram log probability of the word, i.e. it reflects the word's frequency, not the tagger's confidence. To get the tag probabilities from spaCy's tagger, you would do:
doc = nlp.make_doc(eg['text'])
tagger = nlp.get_pipe('tagger')
# Run the tagger's tok2vec layer to get the token vectors,
# then apply its softmax layer to get per-tag probabilities.
token_vectors = tagger.model.tok2vec([doc])
scores = tagger.model.softmax(token_vectors)
You can read more details in the Tagger implementation here: https://github.com/explosion/spaCy/blob/master/spacy/pipeline.pyx
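To illustrate how a per-token confidence can be read off the softmax output, here's a minimal sketch using a hand-made NumPy array standing in for the scores the tagger returns (the shape and values are hypothetical, just one row of tag probabilities per token):

```python
import numpy as np

# Hypothetical softmax scores for a 3-token doc over 4 tags,
# standing in for the array produced by tagger.model.softmax.
scores = np.array([
    [0.05, 0.85, 0.05, 0.05],
    [0.10, 0.20, 0.60, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

# The best tag per token is the argmax of each row, and the
# confidence is the probability assigned to that tag.
best_tags = scores.argmax(axis=1)
confidences = scores[np.arange(len(scores)), best_tags]
print(best_tags)    # [1 2 0]
print(confidences)  # [0.85 0.6  0.25]
```

The third token illustrates the low-confidence case: the probability mass is spread evenly, so the top tag only gets 0.25.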
Perfect. Just what I needed.
@honnibal, could you please provide a similar code chunk to get the probabilities for the dep labels of the parser?
I've been looking through the spaCy code but haven't been able to figure it out.
Getting probabilities out of the parser is much harder, because of the way the model's objective works. There have been a few discussions about this on the issue tracker: https://github.com/explosion/spaCy/issues?utf8=✓&q=probabilities . I think this thread is the most comprehensive discussion of the issue: https://github.com/explosion/spaCy/issues/881