Prodigy vs. free alternatives

Hello all. This is my first post to this forum. I am seriously considering buying Prodigy, but, as I'm sure you know, there are a number of free and open source tools out there that do similar things. Can someone here help me justify this purchase over those other options? Thanks.

I cannot comment on any other tools, because I've always been a Prodigy user and have never really given other tools a proper spin. That said, here are some personal reasons why I've always appreciated Prodigy.

  1. Prodigy has a great UI. The annotation interfaces come with text that's clear. It provides keyboard shortcuts out of the box and also supports many non-English languages. The UI is opinionated, but it's always felt right.
  2. Prodigy is programmatic. I'm totally free to customise the annotation experience with any machine learning trick that can be written in Python. That also meant that when I wrote doubtlab (a tool to find bad labels) it was super easy to get it working in Prodigy. Same with bulk labelling, explained here:
  3. Prodigy plays nice with the spaCy stack. Once you've annotated your data you merely need to run prodigy train to train a performant pipeline that can do both text classification and entity detection in one go.
  4. Prodigy is flexible. It's pretty easy to re-use components to get a labelling interface that's just right for your use-case. You can re-use all the existing text/audio/image interfaces or create your own via HTML, and you can still re-use all the built-in sanity checks that Prodigy provides. I've used it for plenty of non-text use-cases that leverage scikit-learn and it remains a simple workflow. I don't know how many other tools properly support this, but it feels rather unique. Here's an example that I made for data deduplication, just to give an example.
  5. Prodigy comes with a good support forum (you know, this one :wink:) where people who work on Prodigy can answer questions for you.
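
To make the "programmatic" point a bit more concrete, here is a rough sketch of the kind of Python trick I mean. It does not import Prodigy at all: it only builds a stream of task dicts with a "text" key, which is the shape Prodigy expects for text tasks. The names prefer_uncertain, toy_score and margin are made up for this illustration, and toy_score is just a stand-in for whatever real model you would plug in.

```python
from typing import Callable, Dict, Iterable, Iterator


def prefer_uncertain(
    texts: Iterable[str],
    score: Callable[[str], float],
    margin: float = 0.2,
) -> Iterator[Dict[str, object]]:
    """Yield annotation tasks only for texts the model is unsure about.

    A text counts as uncertain when its score lies within `margin` of 0.5.
    (Hypothetical helper for illustration, not part of any library.)
    """
    for text in texts:
        p = score(text)
        if abs(p - 0.5) <= margin:
            # Each task is a plain dict with a "text" key plus free-form metadata.
            yield {"text": text, "meta": {"score": round(p, 3)}}


def toy_score(text: str) -> float:
    """Stand-in for a real model: the fraction of words that look positive."""
    positive = {"great", "good", "love"}
    words = text.lower().split()
    return sum(w in positive for w in words) / len(words) if words else 0.5


texts = ["great good love", "good but slow", "terrible awful bad", "love it or not"]
tasks = list(prefer_uncertain(texts, toy_score))

for task in tasks:
    print(task)
```

In a custom recipe, a generator like this would typically be what you hand over as the stream, so only the examples the model is unsure about end up in front of the annotator.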

You're talking to a Prodigy user who later became a Prodigy developer, so feel free to take my opinion with a grain of salt. Still, Prodigy was such a productivity booster when I first used it as a data science consultant that I can genuinely recommend it to folks. It really helps to have an annotation tool in your toolbelt, whether to quickly bootstrap a dataset for ML or to confirm the data quality of pre-existing datasets.


Thanks for such a thorough answer. I'm sure you'd have gotten lots of points if this discussion were on Stack Overflow...
If there still is a Stack Overflow...
