This forum is the official place to discuss Prodigy, our new tool for efficient machine teaching and annotation. Feel free to ask questions, report bugs, or share your results and custom recipes.
The forum is powered by Discourse and supports both Markdown and BBCode for formatting, as well as image uploads and syntax highlighting. You can log in with your Twitter or GitHub account, or register with your email address (which won’t be shown publicly).
For more details and the API reference, see the PRODIGY_README.html available for download with Prodigy.
Installation instructions – How to install Prodigy on your platform, set up the database and change the default settings.
First steps guide – How to get started with Prodigy, including sample data sets for testing.
Prodigy Cookbook – Quick references for various common use cases.
Recipes – Handy overview of all built-in recipes with examples and available arguments.
Live APIs – List of available, built-in API loaders to stream in real-world content.
Web application – How to annotate with Prodigy and customise the look and feel of the front-end.
Text Classification – How to use Prodigy to train and evaluate a new text classification model.
Entity Recognition – How to use Prodigy to improve the entity recognition of spaCy’s default English model.
Computer Vision – How to use Prodigy to annotate object detection and image segmentation data, and bootstrap an image classifier.
Custom Recipes – An intro to custom recipes, including an example of a customer sentiment annotation project using the
Raw data sets
To start annotating, you need a source of examples. You can either load in your own data, or use one of the sample datasets below.
news_headlines.jsonl (19.5 KB)
200 headlines from stories about Silicon Valley from The New York Times.
github_issues.jsonl (115.3 KB)
830 GitHub issue titles for search queries related to documentation and instructions.
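Each line of these files is a standalone JSON object (the JSONL format), with the raw input under a `"text"` key. As a minimal sketch, assuming that `"text"` layout, the files can be read with nothing but the standard library:

```python
import json

def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Each record is a plain JSON object, e.g.:
record = json.loads('{"text": "Silicon Valley startup raises funding"}')
print(record["text"])
```

Because every line is independent, the files can be inspected, filtered, or split with ordinary line-based tools before loading them into Prodigy.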
Annotated data sets
If you want to test Prodigy with pre-annotated data, you can download one of the datasets we’ve created for the text classification and NER workflows and import them into a new Prodigy dataset.
github_textcat_docs.jsonl (222.9 KB)
reddit_product.jsonl (659.4 KB)
insults_seeds_reddit.jsonl (22.5 KB)
insults_reddit.jsonl (229.1 KB)
To import one of the downloaded files, first create a new dataset, then load the annotations into it:

```
prodigy dataset new_dataset "A new dataset"
prodigy db-in new_dataset /path/to/annotations.jsonl
```
We’re looking forward to your feedback!