Welcome to the Prodigy support forum!


(Matthew Honnibal) #1

Welcome! :wave: This forum is the official place to discuss Prodigy, our annotation tool for efficient machine teaching. Feel free to ask questions, report bugs, or share your results and custom recipes. For an overview of discussions around annotation and training strategies for NLP projects and beyond, check out the best practices tag. For usage examples and installation instructions, see the documentation.

(Ines Montani) #14

The forum is powered by Discourse and supports both Markdown and BBCode for formatting, as well as image uploads and syntax highlighting. You can log in with your Twitter or GitHub account, or register with your email address (which won’t be shown publicly).

Getting started

For more details and the API reference, see the PRODIGY_README.html available for download with Prodigy.

  • Installation instructions – How to install Prodigy on your platform, set up the database and change the default settings.
  • First steps guide – How to get started with Prodigy, including sample data sets for testing.
  • Prodigy Cookbook – Quick references for various common use cases.
  • Recipes – Handy overview of all built-in recipes with examples and available arguments.
  • Live APIs – List of available, built-in API loaders to stream in real-world content.
  • Web application – How to annotate with Prodigy and customise the look and feel of the front-end.
  • Text Classification – How to use Prodigy to train and evaluate a new text classification model.
  • Entity Recognition – How to use Prodigy to improve the entity recognition of spaCy’s default English model.
  • Computer Vision – How to use Prodigy to annotate object detection and image segmentation data, and bootstrap an image classifier.
  • Custom Recipes – An intro to custom recipes, including an example of a customer sentiment annotation project using the choice interface.

Raw data sets

To start annotating, you need a source of examples. You can either load in your own data, or use one of the sample datasets below.

news_headlines.jsonl (19.5 KB)
200 headlines from stories about Silicon Valley from The New York Times.

github_issues.jsonl (115.3 KB)
830 GitHub issue titles for search queries related to documentation and instructions.
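
If you'd rather load in your own data, here is a minimal sketch of writing it in the same newline-delimited JSON layout as the sample files. It assumes each line is a JSON object with a "text" field; that key, the helper names `write_jsonl` and `read_jsonl`, and the file name are assumptions for illustration, so check the documentation for the fields your recipe expects.

```python
import json
from pathlib import Path


def write_jsonl(texts, path):
    """Write one JSON object per line, each with a "text" field (assumed key)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            # ensure_ascii=False keeps non-ASCII characters readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Yield one parsed object per non-empty line."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical file name and example texts
    out = write_jsonl(["First example text", "Second example text"], "my_data.jsonl")
    for record in read_jsonl(out):
        print(record["text"])
```

Reading the file back line by line is a quick way to catch a stray comma or an unescaped quote before you point a recipe at it.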

Annotated data sets

If you want to test Prodigy with already annotated data, you can download one of the datasets we’ve created for the text classification and NER workflows and import it into a new Prodigy dataset using the two commands listed after the files.

github_textcat_docs.jsonl (222.9 KB)
reddit_product.jsonl (659.4 KB)
insults_seeds_reddit.jsonl (22.5 KB)
insults_reddit.jsonl (229.1 KB)

prodigy dataset new_dataset "A new dataset"
prodigy db-in new_dataset /path/to/annotations.jsonl
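
Before importing a downloaded file, it can help to take a quick look at what's in it. The sketch below counts the records per decision, assuming each line is a JSON object with an "answer" field (for example "accept" or "reject"). The field name and the helper `summarize_annotations` are assumptions for illustration, not part of Prodigy's API.

```python
import json
from collections import Counter


def summarize_annotations(path):
    """Count records per value of the (assumed) "answer" field in a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
            # Records without the field are grouped under a placeholder key
            counts[record.get("answer", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Same placeholder path as in the commands above
    for answer, count in summarize_annotations("/path/to/annotations.jsonl").items():
        print(answer, count)
```

If many records end up under `<missing>`, the file is probably raw data rather than annotations.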

Video tutorials

:clipboard: Changelog

:label: All Support Forum Tags

We’re looking forward to your feedback!
