Just came across this really cool project that uses Prodigy to label audio snippets from the Syntax.fm podcast and then trains a scikit-learn model to predict who is speaking. The repo includes the whole pipeline: Prodigy recipes, config, and notebooks.
Twitter thread with more details: