Use cases demo + clarifications for Business

Hi! If you haven't seen it yet, you might find this example of using Prodigy with a TensorFlow image model in the loop useful: Integrating Tensorflow's Object Detection API with Prodigy

The project tag on the forum also has some examples, blog posts and papers showing things others built with Prodigy: Topics tagged project

There are two aspects here that you want to distinguish between:

  1. the annotation tool you use to script your annotation workflows
  2. the machine learning library you use and the models you train

Prodigy is a tool for point 1: it lets you script and run annotation workflows, from data loading and preprocessing, to example selection, all the way to showing examples in the UI. What you do with that data, how you use it, and what results you achieve depends on your application and the model you train.

So Prodigy can give you the building blocks for writing image annotation workflows with a model in the loop and for running experiments. But the modelling part is up to you, and you'll have to experiment with different ways to make your model in the loop sensitive enough to small updates, which method for selecting examples works best, and how to best calculate the progress and decide whether more data is needed (for NLP tasks, we typically use an estimate of when the loss will hit 0).
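To make the loss-based progress idea concrete, here's a minimal sketch of what such an estimate could look like: fit a line to the most recent loss values and extrapolate where it crosses zero. The function name and the exact heuristic are my own illustration, not Prodigy's actual implementation.

```python
def estimate_progress(losses, window=5):
    """Rough 0.0-1.0 progress estimate: fit a line to the last `window`
    loss values and extrapolate how many more steps until it hits 0."""
    recent = losses[-window:]
    if len(recent) < 2:
        return 0.0
    n = len(recent)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(recent) / n
    # least-squares slope of loss over the recent window
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, recent))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope >= 0:  # loss isn't decreasing, so we can't extrapolate
        return 0.0
    steps_to_zero = -recent[-1] / slope  # where the fitted line crosses 0
    total = len(losses) + steps_to_zero
    return min(1.0, len(losses) / total)
```

With a steadily decreasing loss like `[1.0, 0.8, 0.6, 0.4, 0.2]`, the line crosses zero one step later, so the estimate reports the run as roughly 5/6 complete.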

But from what you describe in your post, it sounds like that's exactly the type of stuff you've been working on, right?

We publish extensive API docs, describing the built-in recipes, Python components, web app and more.

Prodigy ships with some components that are compiled Cython code, but we include the source of the database and server, as well as all recipe scripts that are shipped with Prodigy. There's even an open-source repo with various recipe scripts and examples: GitHub - explosion/prodigy-recipes: 🍳 Recipes for the Prodigy, our fully scriptable annotation tool.

If by performance you mean your model's performance, that's up to you. If it turns out that your model still makes certain mistakes, maybe you want to set up a new annotation workflow that focuses on those types of examples and asks the user to correct the model's predictions. Or maybe you want to write an error analysis workflow to really pinpoint what the biggest problems are. (I'm showing something similar in my custom recipes video towards the end, around 33:10.)

See this forum :smiley: You can browse the tags here: Prodigy Support

There's actually very little "secret magic" going on here and you typically have three components in your recipe:

  1. a function that takes a stream of examples and assigns scores to them using a model
  2. a function that takes a scored stream and decides what to send out for annotation (based on the score or some other metric) – see the built-in sorters for examples
  3. a function that takes batches of annotated examples and updates a model, if needed

All of these are things you can implement yourself, and mix and match in your recipe.
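The three components above can be sketched in plain Python like this. Note this is a framework-agnostic illustration with a dummy scorer standing in for your model – the function names here are my own, and a real recipe would wire these generators into Prodigy's stream instead:

```python
import random

def dummy_model(example):
    """Stand-in scorer: a real recipe would call your trained model here.
    Seeding on the text makes the fake score deterministic per example."""
    random.seed(example["text"])
    return random.random()

def score_stream(stream, model):
    """1. Assign a score to each incoming example."""
    for eg in stream:
        yield (model(eg), eg)

def prefer_uncertain(scored_stream, threshold=0.2):
    """2. Send out only examples the model is unsure about (score near 0.5),
    similar in spirit to Prodigy's built-in prefer_uncertain sorter."""
    for score, eg in scored_stream:
        if abs(score - 0.5) < threshold:
            yield eg

def update_model(annotated_batch):
    """3. Update the model from a batch of annotations (a no-op stub here).
    Prodigy annotations carry an "answer" key: accept/reject/ignore."""
    accepted = [eg for eg in annotated_batch if eg.get("answer") == "accept"]
    return len(accepted)

stream = ({"text": f"example {i}"} for i in range(100))
selected = list(prefer_uncertain(score_stream(stream, dummy_model)))
```

Because all three pieces are just generators and plain functions, you can swap any of them out – e.g. replace the uncertainty filter with a diversity-based sampler – without touching the rest.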

While Prodigy comes with built-in training recipes for running quick experiments with spaCy (for NLP), you can also implement your own training. In fact, you have to, if you're working with images and custom models. That's entirely separate and depends on the model you use. You have access to the annotated data from Python via the database API, and you can then use it to train your model however you like – in a separate process, integrated into your custom solution, triggered by any action.
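For instance, once you have the annotations, filtering them down to the accepted examples is straightforward. In a real project you'd fetch the dicts via Prodigy's database API (`prodigy.components.db.connect()` and `db.get_dataset(...)`); the sketch below parses a `db-out`-style JSONL export instead so it stays self-contained, and the dataset contents are made up for illustration:

```python
import json

def load_accepted(jsonl_text):
    """Yield only the accepted annotations from a JSONL export.
    Each Prodigy task dict has an "answer" key: accept/reject/ignore."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        eg = json.loads(line)
        if eg.get("answer") == "accept":
            yield eg

# Hypothetical export of an image dataset
export = "\n".join([
    json.dumps({"image": "img1.jpg", "answer": "accept"}),
    json.dumps({"image": "img2.jpg", "answer": "reject"}),
])
training_data = list(load_accepted(export))
# training_data now holds only the accepted task for img1.jpg
```

From there, feeding `training_data` into your model's own training loop is entirely up to you and your framework.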

Hope this answered your questions and good luck with your project :smiley: :raised_hands:
