💫 Ideas for Prodigy plugins 💫

india-kerle · February 23, 2024, 10:55am

Hi all!

We're crowdsourcing ideas for open-source Prodigy plugins! As a reminder, plugins are recipes that are separated out into their own packages because they require a third-party library. We've built plugins like:

Prodigy PDF: Recipes that allow you to label PDFs
Prodigy HF: Recipes that allow you to interact with the Hugging Face stack
Prodigy Whisper: Recipes that leverage OpenAI's Whisper model for audio transcription

...and many more.

What labelling use cases do you have that would benefit from a Prodigy integration with a third-party Python library? What would be your dream Prodigy plugin?

strickvl · March 4, 2024, 11:40am

I've been meaning to write a Prodigy integration for ZenML for a while. It would be a nice addition to our supported annotators, but that's the other way round, so I'm not sure if that's what you were asking.

dedupedude · April 1, 2024, 9:10pm

Hi,

I joined recently and am quite new to Prodigy. Great tool.

I would love to see native support for information retrieval, entity resolution, and similar tasks where we annotate pairs of records rather than classify single records.

Here is a rather simple example of entity resolution: "Beyond basic recipes with Prodigy by Explosion AI" by Kabir Khan on Medium.

An interesting integration beyond just concatenating pairs into simple texts would be candidate selection: for example, using faiss (GitHub: facebookresearch/faiss, "A library for efficient similarity search and clustering of dense vectors") to cluster somewhat similar records and then draw pairs from those clusters.
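A minimal, library-free sketch of the candidate-selection idea, assuming every record has already been embedded as a dense vector (the toy records, vectors, and helper names are hypothetical). It swaps clustering for a simpler variant of the same idea, nearest-neighbour blocking: exact k-NN by cosine similarity picks each record's closest neighbours, duplicate pairs are merged, and each pair becomes a Prodigy-style task dict. The "text"/"meta" keys follow Prodigy's general task format, but the pair layout is only an illustration; faiss would replace the brute-force scan once the corpus grows.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pairs(vectors: List[Sequence[float]], k: int = 1) -> List[Tuple[int, int, float]]:
    """Pair every record with its k most similar records (exact, brute-force search).

    Returns unique (i, j, score) tuples with i < j, most similar pairs first.
    """
    pairs: Dict[Tuple[int, int], float] = {}
    for i, u in enumerate(vectors):
        # O(n^2) overall: this inner scan is the part faiss would speed up.
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(vectors) if j != i),
            reverse=True,
        )
        for score, j in scored[:k]:
            # (3, 1) and (1, 3) are the same candidate pair, so store it once.
            pairs[(min(i, j), max(i, j))] = score
    return sorted(((i, j, s) for (i, j), s in pairs.items()), key=lambda p: -p[2])


def to_tasks(records: List[str], pairs: List[Tuple[int, int, float]]) -> List[dict]:
    """Turn candidate pairs into Prodigy-style task dicts (hypothetical layout)."""
    return [
        {
            "text": f"{records[i]}\n---\n{records[j]}",
            "meta": {"left_id": i, "right_id": j, "score": round(score, 3)},
        }
        for i, j, score in pairs
    ]


if __name__ == "__main__":
    # Toy records with made-up 3-d "embeddings"; real ones would come from a model.
    records = [
        "Acme Corp, Berlin",
        "ACME Corporation, Berlin",
        "Globex Inc, Springfield",
        "Globex Incorporated, Springfield",
    ]
    vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
    for task in to_tasks(records, candidate_pairs(vectors, k=1)):
        print(task["meta"], task["text"].replace("\n", " | "))
```

With k=1 the toy data yields two candidate pairs, (Acme, ACME) and (Globex, Globex Incorporated), instead of all six possible pairings; raising k trades annotation volume for recall.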

Best,
Paul

india-kerle · April 2, 2024, 10:44am

@dedupedude,

Thanks for the suggestion and great blogpost!

We have some similar-ish plug-ins like prodigy-ann and prodigy-lunr that let you query your examples to find the most relevant subset for annotation, but they don't fully satisfy the use case you're describing. I've added an issue on this for the team to discuss.

dedupedude · April 4, 2024, 11:36am

The two plug-ins are indeed an interesting starting point. Thanks for sharing.

Re prodigy-ann: you should consider switching to faiss (GitHub: facebookresearch/faiss, "A library for efficient similarity search and clustering of dense vectors"), built by Facebook Research, which also covers HNSW. I'm not sure how far you get with hnswlib, but faiss offers:

Many more indexing techniques than just HNSW, including good old exact (brute-force) KNN with different metrics, which should be the preferred option when the number of documents is small (e.g., <10k).
GPU support (CUDA on Linux only).
Probably the most established package in this domain (27.7k GitHub stars as of this writing).
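To make the exact-KNN point concrete, here is a sketch of brute-force k-NN search in plain Python with three metrics. It scores the query against every stored vector, which is what makes the result exact; faiss's flat indexes (e.g. IndexFlatL2, IndexFlatIP) perform the same exhaustive comparison in optimized code, whereas HNSW indexes trade some recall for speed. The function name and metric strings are hypothetical, not a faiss API; cosine is handled the way it is commonly done with faiss, by normalizing vectors and then using the inner product.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale to unit length; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def exact_knn(
    index: List[Sequence[float]], query: Sequence[float], k: int, metric: str = "l2"
) -> List[Tuple[int, float]]:
    """Exact k-NN: score the query against every stored vector, then keep the top k.

    metric="l2"     -> squared Euclidean distance, smaller is closer
    metric="ip"     -> inner product, larger is closer
    metric="cosine" -> inner product of unit-normalized vectors, larger is closer
    Returns (vector_id, score) pairs, best first.
    """
    if metric == "l2":
        scored = [(sum((a - b) ** 2 for a, b in zip(query, v)), i) for i, v in enumerate(index)]
        best = heapq.nsmallest(k, scored)
    elif metric in ("ip", "cosine"):
        if metric == "cosine":
            # Cosine similarity is just the inner product after L2-normalization.
            query = _normalize(query)
            index = [_normalize(v) for v in index]
        scored = [(sum(a * b for a, b in zip(query, v)), i) for i, v in enumerate(index)]
        best = heapq.nlargest(k, scored)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return [(i, score) for score, i in best]
```

The metric choice matters: for the stored vectors [0, 0], [1, 0], [0, 3], [5, 5] and the query [1, 1], "l2" returns [1, 0] as the nearest neighbour, while "ip" and "cosine" both return [5, 5].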
