Use cases demo + clarifications for Business

We are building an intelligent automation solution powered by image detection (and image segmentation) for a company [B2B use case]. They presently don't have a lot of digital data, and we have manually annotated the data they already have.

The solution needs to improve over time, and we are recommending that the client buy a license and use it as part of their solution [with review screens/pop-ups prompting for active learning annotation feedback and admin-triggered retraining of the neural network]. Could you provide us with any material on use cases where this has been done before? This will help me sell the idea both to the internal team and to the customer.

Before we go to the client with the solution, we're concerned about what happens going forward:

  • how much clarity we have into the internal mechanisms
    • how to improve later, in case there are performance issues
  • what kind of support will be available
    • which parts are available as an SDK vs. open source

Could you direct me to these two pieces of information, if they already exist?

Also, one more add-on question: is Prodigy modular? Can I replace backend algorithms, say:

  • the active learning algorithm
  • the detection of which data points to annotate
  • the trigger for automatic retraining after batch annotation

with our own custom solutions?

Hi! If you haven't seen it yet, you might find this example of using Prodigy with a TensorFlow image model in the loop useful: Integrating Tensorflow's Object Detection API with Prodigy

The project tag on the forum also has some examples, blog posts and papers showing things others built with Prodigy: Topics tagged project

There are two aspects here that you want to distinguish between:

  1. the annotation tool you use to script your annotation workflows
  2. the machine learning library you use and the models you train

Prodigy is a tool for point 1: it lets you run and script annotation workflows, from data loading and preprocessing, to example selection, all the way to showing examples in the UI. What you do with that data, how you use it, and what results you achieve depends on your application and the model you train.

So Prodigy can give you the building blocks for writing workflows for image annotation with a model in the loop and for running experiments. But the modelling part is up to you: you'll have to experiment with different ways to make your model in the loop sensitive enough to small updates, figure out which method for selecting examples works best, and decide how best to calculate progress and whether more data is needed (for NLP tasks, we typically use an estimate of when the loss will hit 0).
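To illustrate the loss-based progress estimate mentioned above, here's a minimal sketch of the general idea (a simple linear extrapolation, not Prodigy's actual implementation):

```python
def estimate_progress(losses):
    """Estimate annotation progress by extrapolating when the loss hits 0.

    losses: per-batch loss values recorded so far.

    Fits a least-squares line through (batch index, loss) and projects the
    batch at which the fitted line reaches 0. Returns a value in [0, 1].
    This is an illustrative sketch of the idea, not Prodigy's internals.
    """
    n = len(losses)
    if n < 2:
        return 0.0  # not enough points to fit a trend yet
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope >= 0:
        return 0.0  # loss isn't decreasing: no sensible estimate
    intercept = mean_y - slope * mean_x
    zero_at = -intercept / slope  # batch index where the line hits 0
    return max(0.0, min(1.0, n / zero_at))
```

For example, with losses `[1.0, 0.8, 0.6, 0.4]` the fitted line hits 0 at batch 5, so after 4 batches the estimate is 0.8. In practice you'd want something more robust (e.g. smoothing noisy losses), but the shape of the calculation is the same.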

But from what you describe in your post, it sounds like that's exactly the type of stuff you've been working on, right?

We publish extensive API docs, describing the built-in recipes, Python components, web app and more.

Prodigy ships with some components that are compiled with Cython, but we include the source of the database and server, as well as all recipe scripts that ship with Prodigy. There's even an open-source repo with various recipe scripts and examples: GitHub - explosion/prodigy-recipes: 🍳 Recipes for the Prodigy, our fully scriptable annotation tool.

If by performance you mean your model's performance, that's up to you. If it turns out that your model still makes certain mistakes, maybe you want to set up a new annotation workflow that focuses on those types of examples and asks the user to correct the model's predictions. Or maybe you want to write an error analysis workflow to really pinpoint what the biggest problems are. (I show something similar towards the end of my custom recipes video, around 33:10.)

See this forum :smiley: You can browse the tags here: Prodigy Support

There's actually very little "secret magic" going on here and you typically have three components in your recipe:

  1. a function that takes a stream of examples and assigns scores to them using a model
  2. a function that takes a scored stream and decides what to send out for annotation (based on the score or some other metric) – see the built-in sorters for examples
  3. a function that takes batches of annotated examples and updates a model, if needed

All of these are things you can implement yourself, and mix and match in your recipe.
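The three components above can be sketched in plain Python. This is a toy illustration of the pattern, not Prodigy's API: the "model" is just a score lookup, and the uncertainty sorter mimics what Prodigy's built-in sorters do:

```python
def score_stream(stream, model):
    """Component 1: assign a score to each incoming example using the model.

    `model` is any callable mapping an example dict to a float in [0, 1];
    in a real recipe this would wrap your object detection model.
    """
    for eg in stream:
        eg = dict(eg)
        eg["score"] = model(eg)
        yield eg


def prefer_uncertain(scored, threshold=0.15):
    """Component 2: decide what to send out for annotation. Here we only
    pass along examples the model is unsure about (score close to 0.5),
    similar in spirit to Prodigy's built-in sorters."""
    for eg in scored:
        if abs(eg["score"] - 0.5) < threshold:
            yield eg


def update(model_state, answers):
    """Component 3: update the model with a batch of annotated answers.
    As a stand-in for real training, this just counts accepted answers."""
    model_state["accepted"] += sum(
        1 for a in answers if a.get("answer") == "accept"
    )
    return model_state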

While Prodigy comes with built-in training recipes for running quick experiments with spaCy (for NLP), you can also implement your own training. In fact, you have to if you're working with images and custom models. That part is entirely separate and depends on the model you use: you have access to the annotated data from Python via the database API, and can then use it to train your model however you like, in a separate process, integrated into your custom solution and triggered by any action.
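As a sketch of that handoff, here's a helper that turns annotated tasks into training pairs for an external pipeline. The field names (`image`, `spans` with `points` polygons, `answer`) follow Prodigy's image task format, but you should check them against your own exported data; the function itself is just an illustration:

```python
def to_training_examples(tasks):
    """Convert Prodigy-style annotated tasks into (image, boxes) pairs.

    In a real setup the tasks would come from the database API, e.g.:
        from prodigy.components.db import connect
        tasks = connect().get_dataset("my_dataset")
    Each returned box is (label, x_min, y_min, x_max, y_max), computed
    from the polygon points of each span.
    """
    data = []
    for task in tasks:
        if task.get("answer") != "accept":
            continue  # skip rejected / ignored annotations
        boxes = []
        for span in task.get("spans", []):
            xs = [point[0] for point in span["points"]]
            ys = [point[1] for point in span["points"]]
            boxes.append((span["label"], min(xs), min(ys), max(xs), max(ys)))
        data.append((task["image"], boxes))
    return data
```

From there you'd feed the pairs into whatever training loop your object detection framework expects, on whatever schedule (or trigger) your solution defines.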

Hope this answered your questions and good luck with your project :smiley: :raised_hands:
