Hello all. This is my first post to this forum. I'm seriously considering buying Prodigy, but as I'm sure you know, there are a number of free and open-source tools out there that do similar things. Can someone here help me justify this purchase over those other options? Thanks.
I cannot comment on any other tools, because I've always been a Prodigy user and have never really given other tools a proper spin. That said, here are some personal reasons why I've always appreciated Prodigy.
- Prodigy has a great UI. The text in the annotation interfaces is clear, keyboard shortcuts work out of the box, and many non-English languages are supported too. The UI is opinionated, but it has always felt right.
- Prodigy is programmatic. I'm totally free to customise the annotation experience with any machine learning trick that can be written in Python. That also meant that when I wrote doubtlab (a tool to find bad labels), it was super easy to get it working in Prodigy. The same goes for bulk labelling, explained here:
- Prodigy plays nicely with the spaCy stack. Once you've annotated your data, you merely need to run `prodigy train` to train a performant pipeline that can do both text classification and entity detection in one go.
- Prodigy is flexible. It's pretty easy to reuse components to get a labelling interface that's just right for your use case. You can reuse all the existing text, audio and image interfaces, or create your own via HTML, and still keep all the built-in sanity checks that Prodigy provides. I've used it for plenty of non-text use cases that leverage scikit-learn, and it remains a simple workflow. I don't know how many other tools properly support this, but it feels rather unique. Here's one I made for data deduplication, just to give an example.
- Prodigy comes with a good support forum (you know, this one) where the people who work on Prodigy can answer your questions.
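To make the "programmatic" point a bit more concrete, here's a minimal sketch of the kind of stream a custom recipe could feed to the annotation UI. It only uses the standard library: `uncertainty_stream`, `score_fn` and `toy_score` are hypothetical names, and the task shape (dicts with `"text"`, `"label"` and a `"meta"` dict) is an assumption based on Prodigy's JSON task format. In an actual recipe you'd return a generator like this as the `"stream"` from a function decorated with `@prodigy.recipe`.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def uncertainty_stream(
    texts: Iterable[str],
    score_fn: Callable[[str], float],  # hypothetical model: returns P(label) in [0, 1]
    label: str,
) -> Iterator[Dict]:
    """Yield Prodigy-style tasks, most uncertain examples first.

    Assumption: tasks are plain dicts with "text", "label" and "meta" keys.
    """
    scored: List[Dict] = []
    for text in texts:
        p = score_fn(text)
        scored.append({"text": text, "label": label, "meta": {"score": round(p, 3)}})
    # Examples where the model is closest to 0.5 are usually the most informative to label.
    scored.sort(key=lambda task: abs(task["meta"]["score"] - 0.5))
    yield from scored


if __name__ == "__main__":
    # Toy stand-in for a real model: keyword-based "probability" of a complaint.
    def toy_score(text: str) -> float:
        hits = sum(word in text.lower() for word in ("broken", "refund", "late"))
        return min(1.0, 0.2 + 0.3 * hits)

    docs = [
        "My order arrived late and broken, I want a refund.",
        "Thanks, everything works great!",
        "The package was late.",
    ]
    for task in uncertainty_stream(docs, toy_score, label="COMPLAINT"):
        print(task["meta"]["score"], task["text"])
```

Sorting a list keeps the idea visible, but it consumes the whole input first; for a long or endless stream, Prodigy's own sorters (such as `prefer_uncertain`) are the more natural fit.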
You're talking to a Prodigy user who later became a Prodigy developer, so feel free to take my opinion with a grain of salt. But back when I first used Prodigy as a data science consultant, I found it such a productivity booster that I can genuinely recommend it to folks. It really helps to have an annotation tool in your toolbelt to quickly bootstrap a dataset for ML or to check the data quality of pre-existing datasets.
Thanks for such a thorough answer. I'm sure you'd have gotten lots of points if this discussion were on Stack Overflow.
If there still is a Stack Overflow...