Welcome! This forum is the official place to discuss Prodigy, our annotation tool for efficient machine teaching. Feel free to ask questions, report bugs, or share your results and custom recipes. For an overview of discussions around annotation and training strategies for NLP projects and beyond, check out the best practices tag. For usage examples and installation instructions, see the documentation.
The forum is powered by Discourse and supports both Markdown and BBCode for formatting, as well as image uploads and syntax highlighting. You can log in with your Twitter or GitHub account, or register with your email address (which won't be shown publicly).
Getting started
For more details and the API reference, see the PRODIGY_README.html file, which is available for download with Prodigy.
- Installation instructions: How to install Prodigy on your platform, set up the database and change the default settings.
- First steps guide: How to get started with Prodigy, including sample data sets for testing.
- Prodigy Cookbook: Quick references for various common use cases.
- Recipes: Handy overview of all built-in recipes with examples and available arguments.
- Live APIs: List of available built-in API loaders to stream in real-world content.
- Web application: How to annotate with Prodigy and customise the look and feel of the front-end.
- Text Classification: How to use Prodigy to train and evaluate a new text classification model.
- Entity Recognition: How to use Prodigy to improve the entity recognition of spaCy's default English model.
- Computer Vision: How to use Prodigy to annotate object detection and image segmentation data, and bootstrap an image classifier.
- Custom Recipes: An intro to custom recipes, including an example of a customer sentiment annotation project using the choice interface.
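The choice interface shows a text together with a list of options to pick from. The following rough sketch builds a stream of sentiment tasks and assumes a task format of a `text` field plus an `options` list of `{"id", "text"}` objects. The three labels and the `make_choice_tasks` helper are hypothetical. This is not the recipe from the linked example.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical label set for a customer sentiment project
OPTIONS: List[Dict[str, str]] = [
    {"id": "positive", "text": "Positive"},
    {"id": "neutral", "text": "Neutral"},
    {"id": "negative", "text": "Negative"},
]


def make_choice_tasks(texts: Iterable[str]) -> Iterator[dict]:
    """Yield one choice task per text, each offering the same options.

    Assumes the task shape {"text": ..., "options": [{"id": ..., "text": ...}]}.
    """
    for text in texts:
        # Copy the option dicts so tasks don't share mutable state
        yield {"text": text, "options": [dict(o) for o in OPTIONS]}


if __name__ == "__main__":
    for task in make_choice_tasks(["The delivery was fast."]):
        print(json.dumps(task))
```

A custom recipe would typically hand a generator like this to the front-end as its stream. Generating tasks lazily keeps memory use flat for large inputs.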
Raw data sets
To start annotating, you need a source of examples. You can either load in your own data, or use one of the sample datasets below.
news_headlines.jsonl (19.5 KB)
200 headlines from stories about Silicon Valley from The New York Times.
github_issues.jsonl (115.3 KB)
830 GitHub issue titles for search queries related to documentation and instructions.
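To load in your own data instead, one common approach is to write one JSON object per line with a `text` field, which is the shape used for plain-text tasks. The following minimal sketch uses only the Python standard library. The helper names `to_jsonl_lines` and `write_jsonl` and the file name `my_texts.jsonl` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List


def to_jsonl_lines(texts: Iterable[str]) -> List[str]:
    """Turn raw strings into JSONL lines of the form {"text": ...}.

    Empty or whitespace-only strings are skipped so no blank tasks are created.
    """
    lines = []
    for text in texts:
        text = text.strip()
        if text:
            # ensure_ascii=False keeps non-ASCII characters readable in the file
            lines.append(json.dumps({"text": text}, ensure_ascii=False))
    return lines


def write_jsonl(texts: Iterable[str], path: Path) -> int:
    """Write one JSON object per line to `path` and return the number written."""
    lines = to_jsonl_lines(texts)
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    raw = ["First example text", "", "Second example text"]
    print(write_jsonl(raw, Path("my_texts.jsonl")))  # 2
```

The resulting file has the same one-object-per-line layout as the sample files above, so it can be used as a source in the same way.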
Annotated data sets
If you want to test Prodigy with already annotated data, you can download one of the datasets we've created for the text classification and NER workflows and import it into a new Prodigy dataset.
github_textcat_docs.jsonl (222.9 KB)
reddit_product.jsonl (659.4 KB)
insults_seeds_reddit.jsonl (22.5 KB)
insults_reddit.jsonl (229.1 KB)
prodigy dataset new_dataset "A new dataset"
prodigy db-in new_dataset /path/to/annotations.jsonl
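Before importing, you may want to check what an annotated file contains. Annotated tasks usually carry an `"answer"` field set to `"accept"`, `"reject"` or `"ignore"`. The following small sketch tallies those values. The `count_answers` helper is hypothetical, and tasks without an `"answer"` key are reported as `"missing"`.

```python
import json
from collections import Counter
from typing import Iterable


def count_answers(lines: Iterable[str]) -> Counter:
    """Count the "answer" values in JSONL annotation lines.

    Assumes each non-empty line is a JSON object; tasks without an
    "answer" key are counted under "missing".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        task = json.loads(line)
        counts[task.get("answer", "missing")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_answers.py /path/to/annotations.jsonl
    with open(sys.argv[1], encoding="utf-8") as f:
        for answer, n in count_answers(f).most_common():
            print(f"{answer}\t{n}")
```

This reads the file line by line, so it also works on the larger downloads without loading them fully into memory.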
Video tutorials
Changelog
All Support Forum Tags
We're looking forward to your feedback!