Did somebody come up with a convenient recipe command to view all the examples where the NER model predicts incorrectly? I'd like to see if there's some systematically wrong behavior - I could imagine a lot of people have wanted to do that as well at some point.
I know I can just write a small recipe that prints to the console in some convenient way but maybe somebody already made that (or similar)?
@explosion: do you recommend using Prodigy for this kind of task - no feedback needed, just evaluation? Or do you recommend using some other tool like Streamlit or similar (in case print-to-terminal does not suffice)?
Just to make sure I understand the question correctly, you mean on the evaluation data, right? This should be pretty straightforward to implement, because you'd just need to run your trained model over the examples and then compare the predicted doc.ents to the annotated spans (even something as basic as comparing the (start, end, label) of the entity spans). Or, if you wanted it to be fancier, you could also check whether it's a false positive/negative or whether just the label is wrong.
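Something like this, as a minimal sketch – it assumes your evaluation data is in JSONL with the usual "text" and "spans" (start/end/label) format, and the model/file paths are just placeholders:

```python
import json
import spacy

# Sketch: compare predicted entities against the annotated spans and print
# the examples where they don't match. Paths and the JSONL format are
# assumptions – adjust to however your evaluation data is stored.
nlp = spacy.load("./your_trained_model")

with open("eval.jsonl", encoding="utf8") as f:
    examples = [json.loads(line) for line in f]

for eg in examples:
    doc = nlp(eg["text"])
    predicted = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
    gold = {(span["start"], span["end"], span["label"]) for span in eg["spans"]}
    if predicted != gold:
        print(eg["text"])
        print("  missed / wrong label:", gold - predicted)   # false negatives
        print("  spurious predictions:", predicted - gold)   # false positives
```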
Using Prodigy could work here, especially if you want to click through the examples – you could even use blocks and a text input and then leave some notes for yourself on the particularly interesting examples. Or add multiple choice options to categorize the types of mistakes (kinda similar to the evaluation recipe I built for the image captioning tutorial). If you have the start/end/label data, rendering the examples in Prodigy will be super easy and it's probably one of the quickest ways to get something onto the screen.
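Here's a rough sketch of what such a blocks recipe could look like – the recipe name, argument names and mistake categories are made up for illustration, and the block/config keys are from memory, so double-check them against the custom interfaces docs. It expects a JSONL file of examples that already have "text" and "spans" (start/end/label) filled in with the model's mistakes:

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "ner.review-mistakes",
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("JSONL file with examples and predicted spans", "positional", None, str),
)
def review_mistakes(dataset, source):
    # Hypothetical categories for the types of mistakes
    options = [
        {"id": "false_positive", "text": "False positive"},
        {"id": "false_negative", "text": "False negative"},
        {"id": "wrong_label", "text": "Right span, wrong label"},
    ]

    def get_stream():
        for eg in JSONL(source):
            eg["options"] = options
            yield eg

    blocks = [
        {"view_id": "ner"},                    # render the spans read-only
        {"view_id": "choice", "text": None},   # categorize the mistake
        {"view_id": "text_input", "field_id": "notes", "field_label": "Notes"},
    ]
    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "blocks",
        "config": {"blocks": blocks, "choice_style": "multiple"},
    }
```

You'd then run it like any other custom recipe, e.g. `prodigy ner.review-mistakes your_dataset mistakes.jsonl -F recipe.py` (file names here are placeholders), and click through the examples, ticking the mistake categories and leaving notes as you go.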