New feature ideas

After using your solution for about 4 months, I have a few ideas for features that would be nice improvements.

  • Docs for previous versions
  • Training for smaller (non-transformer) NER models (transformers are nice, but too big or overkill for some use cases)
  • Cross validation training for smaller models
  • More stats (e.g. number of elements per tag)
  • Confusion matrix for NER
  • Visualization of NER (or other) errors
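
To make the "number of elements per tag" idea concrete, here is a minimal sketch of the kind of statistic meant. It assumes annotations are available as a list of Prodigy-style task dicts with a "spans" list (each span carrying a "label") and an optional "answer" key. The function name count_spans_per_label and the sample data are hypothetical, made up for illustration only.

```python
from collections import Counter


def count_spans_per_label(examples):
    """Count annotated spans per label (hypothetical helper, not a Prodigy API)."""
    counts = Counter()
    for eg in examples:
        # Assumption: only accepted examples count; a missing "answer" is treated as accepted.
        if eg.get("answer", "accept") != "accept":
            continue
        for span in eg.get("spans", []):
            counts[span["label"]] += 1
    return counts


# Made-up sample data in the assumed task format.
examples = [
    {
        "text": "Apple hired Tim Cook",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 12, "end": 20, "label": "PERSON"},
        ],
        "answer": "accept",
    },
    {
        "text": "Google is in California",
        "spans": [
            {"start": 0, "end": 6, "label": "ORG"},
            {"start": 13, "end": 23, "label": "GPE"},
        ],
        "answer": "accept",
    },
    {
        "text": "Nothing here",
        "spans": [{"start": 0, "end": 7, "label": "ORG"}],
        "answer": "reject",  # skipped by the counter
    },
]

for label, n in count_spans_per_label(examples).most_common():
    print(f"{label}: {n}")
```

Something along these lines, computed per dataset, would already help spot under-represented tags before training.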

Hi! Thanks for these suggestions :slightly_smiling_face: Some quick comments:

I'm not 100% sure what you mean by this? We typically recommend using spaCy's CNN-based pipelines for training with Prodigy, and that's also the default configuration you get out of the box. Transformer-based pipelines are a nice add-on if you want to squeeze out the final percent of accuracy, but especially during development, what you typically care about most is whether your model is learning. So I agree that transformer embeddings are often overkill here, and spaCy provides good alternatives optimised for CPU.