The ner.batch-train recipe returns an accuracy score, but no F1 score. Additionally, while I can retrieve “right” and “wrong” counts from EntityRecognizer.evaluate(), I don’t see the specific error types that would let me calculate either F1 scores or confusion matrices between categories. Is this information accessible anywhere? If not, can I at least get at the predictions themselves? Thanks!
Unfortunately we don’t report those statistics currently. It’s a good idea, though, and we’ll add it to our list for an upcoming release. In the meantime, you should be able to calculate the accuracies yourself. The best solution would be to group the examples by entity type and then make repeated calls to the evaluate() method. This should let you calculate per-entity-type accuracies pretty easily, albeit a bit inefficiently (it’ll re-parse the documents N times for N entity types).
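If you can get at the predictions directly, you can also skip the repeated evaluate() calls and compute per-label precision, recall and F1, plus a simple label confusion count, in a single pass. This is a swapped-in alternative to the approach above, not something Prodigy provides. It is a minimal sketch that assumes you have already collected gold and predicted entities for each document as (start, end, label) tuples with consistent offsets; the function names per_label_prf and label_confusion are hypothetical. A prediction counts as correct only if its boundaries and label both match exactly.

```python
from collections import Counter


def per_label_prf(gold_docs, pred_docs):
    """Return {label: (precision, recall, f1)} using exact span + label matches.

    gold_docs and pred_docs are parallel lists; each item is a list of
    (start, end, label) tuples for one document (assumed input format).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        for span in pred_set & gold_set:  # correct predictions
            tp[span[2]] += 1
        for span in pred_set - gold_set:  # predicted but not in gold
            fp[span[2]] += 1
        for span in gold_set - pred_set:  # in gold but missed
            fn[span[2]] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f)
    return scores


def label_confusion(gold_docs, pred_docs):
    """Count (gold_label, predicted_label) pairs for spans with identical boundaries.

    Gold spans with no same-boundary prediction are counted as (label, "O");
    predictions with no same-boundary gold span are counted as ("O", label).
    """
    matrix = Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        gold_by_bounds = {(s, e): label for s, e, label in gold}
        pred_by_bounds = {(s, e): label for s, e, label in pred}
        for bounds, g_label in gold_by_bounds.items():
            matrix[(g_label, pred_by_bounds.get(bounds, "O"))] += 1
        for bounds, p_label in pred_by_bounds.items():
            if bounds not in gold_by_bounds:
                matrix[("O", p_label)] += 1
    return matrix


if __name__ == "__main__":
    gold = [[(0, 5, "PERSON"), (10, 16, "ORG")]]
    pred = [[(0, 5, "PERSON"), (10, 16, "GPE")]]
    print(per_label_prf(gold, pred))
    print(label_confusion(gold, pred))
```

With spaCy, one way to build the predicted spans for a text is [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]; the gold spans would come from your annotated examples in the same character-offset format. Because the confusion count only compares spans whose boundaries match exactly, boundary errors show up as "O" rows and columns rather than as label confusions.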
Thanks for the response @honnibal. Glad to know I wasn’t missing something obvious.