I have been using Prodigy for a couple of weeks now and finding it extremely useful and intuitive. The ability to train spaCy models directly is also a really nice feature.
I am, however, a little stuck in my process of annotating and validating the model's predictions. My flow looks like this:
- Create a few terms, use sense2vec to enrich that list
- Make a few annotations
- Train model
- Iterate with more annotations and further training.
However, I'm finding it quite hard to evaluate the model beyond the top-line metrics from training. If this were a scikit-learn model, I would load the ground-truth labels, score the data with the model, and then explore the records where the two disagree. That's usually helpful for understanding whether the model is actually picking up additional positive examples I didn't label.
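With scikit-learn that kind of check is just a few lines. A rough sketch of what I mean, with a toy classifier and made-up data purely as stand-ins:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for real data: texts with binary gold labels
texts = ["great service", "awful support", "really helpful",
         "very slow", "loved it", "waste of money"]
gold = [1, 0, 1, 0, 1, 0]

# Fit a throwaway classifier just so there are predictions to compare
vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, gold)
pred = clf.predict(X)

# Put gold labels and predictions side by side and inspect the mismatches
df = pd.DataFrame({"text": texts, "gold": gold, "pred": pred})
mismatches = df[df["gold"] != df["pred"]]
print(mismatches)
```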
I can't seem to find a straightforward way of doing a similar analysis in either Prodigy or spaCy. So my eyeball-eval flow looks like:
- db-export dataset with gold annotations
- load the trained spaCy model
- iterate over the exported annotations, run each "text" through the model to get spaCy Doc objects, and extract the predicted entities
- extract the gold entities from the annotations
- load both the gold entities and the model predictions into a dataframe
Then I can inspect the results and see where the model is predicting additional valid examples, which examples it finds hard to match, and so on. The script I ended up with looks roughly like the sketch below.
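This is roughly the script (the paths are placeholders, and the assumption that each exported example keeps its gold entities under "spans" and has an "answer" field is just how my export looks):

```python
import json

import pandas as pd
import spacy

# Placeholders for my setup: the JSONL produced by the db export step
# (one JSON object per line) and the trained pipeline directory
EXPORT_PATH = "ner_gold.jsonl"
MODEL_PATH = "./output/model-best"

nlp = spacy.load(MODEL_PATH)

rows = []
with open(EXPORT_PATH, encoding="utf8") as f:
    for line in f:
        eg = json.loads(line)
        if eg.get("answer") != "accept":
            continue
        # Gold entities come from the annotation's "spans"
        gold = {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
        # Predicted entities come from running the text through the model
        doc = nlp(eg["text"])
        pred = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
        rows.append({
            "text": eg["text"],
            "missed": sorted(gold - pred),   # annotated but not predicted
            "extra": sorted(pred - gold),    # predicted but not annotated
        })

df = pd.DataFrame(rows)
# Keep only the examples where gold and predicted spans disagree
disagreements = df[(df["missed"].map(len) > 0) | (df["extra"].map(len) > 0)]
print(disagreements.head(20))
```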
However, this seems like a rather involved way of carrying out the process. Maybe there's a better way?
Any help is really appreciated!