After using your solution for about 4 months, here are some features that would be nice improvements.
- Docs for previous versions
- Training for smaller (non-transformer) NER models (yeah, transformers are nice, but they are too big or overkill for some use cases)
- Cross-validation training for smaller models
- More stats (e.g. number of elements per tag)
- Confusion matrix for NER
- Visualization of errors for NER (or other tasks)
Hi! Thanks for these suggestions. Some quick comments:
I'm not 100% sure what you mean by this? We typically recommend using spaCy's CNN-based pipelines for training with Prodigy, and it's also the default configuration you get out-of-the-box. Transformer-based pipelines are a nice add-on if you want to squeeze out the final percent of accuracy, but especially during development, what you typically care about most is whether your model is learning. So I agree that transformer embeddings are often overkill here, and spaCy provides good alternatives optimised for CPU.
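On the "number of elements per tag" idea from the list above: a minimal sketch of how such counts could be computed by hand, assuming the annotations are available as JSONL lines where each example carries a "spans" list with a "label" on every span and an "answer" field. The function name `count_labels` and the sample data are made up for illustration and are not part of Prodigy or spaCy.

```python
import json
from collections import Counter


def count_labels(jsonl_lines):
    """Count annotated spans per label across accepted examples.

    Hypothetical helper: expects an iterable of JSON strings, one example each.
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        # Assumption: examples without an "answer" field count as accepted.
        if example.get("answer", "accept") != "accept":
            continue
        for span in example.get("spans") or []:
            counts[span["label"]] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    '{"text": "Apple hired Tim Cook", "spans": [{"start": 0, "end": 5, "label": "ORG"}, {"start": 12, "end": 20, "label": "PERSON"}], "answer": "accept"}',
    '{"text": "Google is in California", "spans": [{"start": 0, "end": 6, "label": "ORG"}, {"start": 13, "end": 23, "label": "GPE"}], "answer": "accept"}',
    '{"text": "Nothing here", "spans": [], "answer": "reject"}',
]

label_counts = count_labels(sample)
for label, n in label_counts.most_common():
    print(label, n)
```

To run this on a real export, the lines of the exported file can be passed in instead of `sample`, for example `count_labels(open(path, encoding="utf8"))` with `path` pointing at your own file.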