Prodigy Roadmap

enhancement
meta

(Ines Montani) #1

This roadmap should give you an overview of what’s next for Prodigy and other ideas worth exploring. Feel free to ask questions, or submit requests and suggestions in the comments.

:clipboard: See here for the changelog.


Recipes, Components and Interfaces

  • :white_check_mark: v1.5.0 manual image annotation: image segments (square and polygon shapes)
  • :white_check_mark: v1.5.0 allow plugging in recipes, databases and loaders via Python entry points
  • :white_check_mark: v1.5.0 validate stream using JSON schemas before starting the server and while Prodigy is running, and output detailed messages if tasks don’t have the expected format
  • add built-in solutions for pseudo-rehearsal and data augmentation
  • add more features to manual image annotation interface: editing shapes, undo/redo, fully tested touch screen support and more
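The stream-validation feature above can be sketched in plain Python. This is a minimal illustration only, assuming a made-up single-field schema; Prodigy’s actual validation uses full JSON schemas per interface, and none of the names below are part of its API.

```python
import json

# Hypothetical minimal schema: the one field a text-annotation task would
# need. Prodigy's real validation covers the full task format per interface.
REQUIRED_FIELDS = {"text": str}

def validate_task(task):
    """Return a list of human-readable problems with one stream task."""
    problems = []
    for field, field_type in REQUIRED_FIELDS.items():
        if field not in task:
            problems.append(f"missing required field: {field!r}")
        elif not isinstance(task[field], field_type):
            problems.append(f"field {field!r} should be {field_type.__name__}")
    return problems

def validate_stream(lines):
    """Check JSONL lines before serving them, yielding (line_no, problems)."""
    for i, line in enumerate(lines, 1):
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            yield i, [f"invalid JSON: {exc}"]
            continue
        if not isinstance(task, dict):
            yield i, ["task is not a JSON object"]
            continue
        problems = validate_task(task)
        if problems:
            yield i, problems

# Example: one valid task, one missing "text", one broken line.
stream = ['{"text": "Hello world"}', '{"label": "GREETING"}', "not json"]
errors = list(validate_stream(stream))
# errors -> [(2, ["missing required field: 'text'"]), (3, ["invalid JSON: ..."])]
```

Running a check like this up front is what lets the server fail fast with a detailed message instead of erroring mid-annotation.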

Models

  • add support for sense2vec models
  • :soon: built-in wrappers for scikit-learn, PyTorch and TensorFlow / Keras: these will be available via spaCy’s machine learning library, Thinc. You can already test the experimental PyTorch integration in the latest release.

Prodigy Annotation Manager

Status: :eight_spoked_asterisk: active development
Public beta: September 2018 (earlier for existing Prodigy users)

  • set up large annotation projects with multiple annotators, perform quality control and check concordance
  • add-on product and library that integrates with Prodigy, orchestrates recipes and tasks, and manages your cluster
  • admin console for settings, statistics and annotator management
  • full data privacy, support for internal and external networks

Database and Corpus Management

Status: :eight_pointed_black_star: planning

  • separate open-source library for managing and reconciling annotation layers
  • integrates seamlessly with Prodigy – but can also be used standalone!
  • out-of-the-box support for training spaCy models and adapters for other libraries
  • possible add-on: web UI for viewing data and annotations

Documentation

  • :white_check_mark: v1.4.0 “Prodigy Cookbook” with quick solutions for various problems
  • produce more end-to-end video tutorials like our text classification with Prodigy video
    • Improving spaCy’s NER model on your data
    • Manual NER annotation
    • Image segmentation and object detection
    • A/B evaluation
    • Data curation and manual annotation (e.g. image selection and preference)

#4

When could we expect the release of the built-in wrappers for PyTorch and TensorFlow/Keras?


(Ines Montani) #5

In a recent release of Thinc, we quietly shipped the first version of a PyTorch wrapper so we can start testing it :slightly_smiling_face: The wrappers will be open-source so they can evolve and be updated quickly. It also means that Prodigy won’t have to ship with a bunch of super specific code that may have to change often, and it allows you to reuse them across applications, without having to depend on Prodigy.
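The decoupling described above follows a standard adapter pattern. The sketch below is purely illustrative and is not Thinc’s actual API: a wrapper exposes one uniform interface, so the consumer never imports the underlying framework directly, and the wrapper can evolve independently.

```python
class ModelWrapper:
    """Uniform predict/update interface an annotation tool could rely on,
    regardless of which ML framework backs the model. Hypothetical names,
    not Thinc's real wrapper API."""

    def __init__(self, predict_fn, update_fn):
        self._predict = predict_fn
        self._update = update_fn

    def predict(self, examples):
        return self._predict(examples)

    def update(self, examples, labels):
        return self._update(examples, labels)

# A toy stand-in for a framework model: "scores" texts by length parity.
def toy_predict(examples):
    return [len(text) % 2 for text in examples]

_state = {"updates": 0}

def toy_update(examples, labels):
    _state["updates"] += 1
    return _state["updates"]

model = ModelWrapper(toy_predict, toy_update)
scores = model.predict(["abc", "abcd"])  # -> [1, 0]
```

Because only `ModelWrapper` is visible to the consumer, the framework-specific code (here `toy_predict`/`toy_update`, in reality a PyTorch or Keras model) can ship and change as a separate open-source package.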


(Samuel Pouyt) #6

For the “Prodigy Annotation Manager”, do you need help? Beta testing, developing some parts? I’m asking because we’re launching an annotation project for our medical data. I have tested Prodigy by adding two entities, and it all worked out. I can generate my models etc. So I’m ready to move on to the next step :wink:

Sam


(Motoki Wu) #7

I’d be happy to be an early tester of the annotation manager as well.

We have a multi-instance Prodigy setup using pm2, but it’s very clunky :slight_smile:


(Ines Montani) #8

@idealley @plusepsilon Thanks a lot! We’re not quite yet at a stage where it’s ready to be tested by others – but there’ll definitely be an alpha/beta program exclusively for existing users :blush: