We are building an intelligent automation solution powered by image detection (and image segmentation) for a company [B2B use case]. They presently don't have a lot of digital data, and we have manually annotated the data they already have.
The solution needs to improve over time, and we are recommending that the client buy a Prodigy license and use it as part of their solution [some review screens/pop-ups prompting active-learning annotation feedback, and admin-triggered retraining of the neural network]. Could you provide us with any material on use cases where this has been done before? This will help me sell the idea to the internal team as well as to the customer.
Before we go to the client with the solution, we're worried about what happens going forward:
- how much clarity we have into the internal mechanisms
- how to improve later, in case there are performance issues
- what kind of support will be available
- what parts are available as an SDK vs. open source
Could you direct me to this information if it already exists?
The project tag on the forum also has some examples, blog posts and papers showing things others have built with Prodigy: see Topics tagged project.
There are two aspects here that you want to distinguish between:
1. the annotation tool you use to script your annotation workflows
2. the machine learning library you use and the models you train
Prodigy is a tool for point 1: it lets you run and script annotation workflows, from data loading and preprocessing, to example selection, all the way to showing examples in the UI. What you do with that data, how you use it, and what results you achieve depend on your application and the model you train.
So Prodigy can give you the building blocks for writing workflows for image annotation with a model in the loop and for running experiments. But the modelling part is up to you: you'll have to experiment with different ways to make your model in the loop sensitive enough to small updates, with which method for selecting examples works best, and with how to best calculate progress and decide whether more data is needed (for NLP tasks, we typically use an estimate of when the loss will hit 0).
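To give you an idea of that last point, here's a rough sketch of what a loss-based progress estimate could look like. This isn't a built-in Prodigy API: it assumes your recipe's update callback returns the batch loss, and that the progress callback receives the controller plus that return value.

```python
class LossProgress:
    """Estimate annotation progress from a smoothed loss trajectory."""

    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.ema_loss = None      # exponential moving average of the loss
        self.initial_loss = None  # first loss we saw, used as the baseline

    def __call__(self, ctrl, update_return_value):
        # assumption: update_return_value is the loss returned by update()
        loss = update_return_value
        if loss is None:
            return 0.0
        if self.ema_loss is None:
            self.ema_loss = loss
            self.initial_loss = max(loss, 1e-8)
        else:
            self.ema_loss = (1 - self.smoothing) * self.ema_loss + self.smoothing * loss
        # progress = fraction of the initial loss burned off so far, clamped to [0, 1]
        return max(0.0, min(1.0, 1.0 - self.ema_loss / self.initial_loss))
```

You'd then pass an instance of this as the "progress" entry of the components dict your recipe returns.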
But from what you describe in your post, it sounds like that's exactly the type of stuff you've been working on, right?
If by performance you mean your model's performance, that's up to you. If it turns out that your model still makes certain mistakes, maybe you want to set up a new annotation workflow that focuses on those types of examples and asks the user to correct the model's predictions. Or maybe you want to write an error-analysis workflow to really pinpoint what the biggest problems are. (I'm showing something similar in my custom recipes video towards the end, around 33:10.)
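As a very rough sketch, a "focus on the mistakes" recipe for images could look something like this. Everything under my_project is a hypothetical stand-in for your own model code; the recipe decorator, the Images loader and the "classification" interface are Prodigy's:

```python
import prodigy
from prodigy.components.loaders import Images

from my_project.model import load_model  # hypothetical: your own image model


@prodigy.recipe(
    "image-correct",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Directory of images", "positional", None, str),
)
def image_correct(dataset, source):
    model = load_model()

    def hard_examples(stream):
        for eg in stream:
            # assumption: predict() returns the top label and a confidence
            label, score = model.predict(eg["image"])
            if score < 0.6:          # only queue up low-confidence predictions
                eg["label"] = label  # pre-fill so the annotator can accept/reject
                yield eg

    return {
        "dataset": dataset,                        # save annotations here
        "stream": hard_examples(Images(source)),   # only the hard examples
        "view_id": "classification",               # image plus suggested label
    }
```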
Also see this forum: you can browse the tags here: Prodigy Support.
There's actually very little "secret magic" going on here and you typically have three components in your recipe:
- a function that takes a stream of examples and assigns scores to them using a model
- a function that takes a scored stream and decides what to send out for annotation (based on the score or some other metric) – see the built-in sorters for examples
- a function that takes batches of annotated examples and updates a model, if needed
All of these are things you can implement yourself, and mix and match in your recipe.
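To make that concrete, here's a minimal sketch of how those three pieces could fit together in a custom image recipe. The model (my_project.model) and its predict/update methods are assumptions standing in for your own computer vision code; Images, prefer_uncertain and the recipe decorator are Prodigy's:

```python
import prodigy
from prodigy.components.loaders import Images
from prodigy.components.sorters import prefer_uncertain

from my_project.model import load_model  # hypothetical: your own image model


@prodigy.recipe(
    "image-teach",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Directory of images", "positional", None, str),
)
def image_teach(dataset, source):
    model = load_model()

    # 1. assign scores to the incoming examples using the model
    def score_stream(stream):
        for eg in stream:
            label, score = model.predict(eg["image"])  # assumption, as above
            eg["label"] = label
            yield score, eg

    # 2. decide what to send out: the built-in sorter that prefers
    #    examples the model is most uncertain about
    stream = prefer_uncertain(score_stream(Images(source)))

    # 3. update the model with batches of annotated examples
    def update(answers):
        accepted = [eg for eg in answers if eg["answer"] == "accept"]
        return model.update(accepted)  # assumed to return the batch loss

    return {
        "dataset": dataset,
        "stream": stream,
        "update": update,
        "view_id": "classification",
    }
```

You'd run this like any other custom recipe, e.g. `prodigy image-teach my_dataset ./images -F recipe.py`.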
While Prodigy comes with built-in training recipes for running quick experiments with spaCy (for NLP), you can also implement your own training. In fact, you have to if you're working with images and custom models. That part is entirely separate and depends on the model you use: you have access to the annotated data from Python via the database API, and you can then use it to train your model however you like – in a separate process, integrated into your custom solution, triggered by any action.
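For example (the dataset name and train_model are placeholders; connect and get_dataset are Prodigy's database API):

```python
from prodigy.components.db import connect

from my_project.training import train_model  # hypothetical: your training loop

db = connect()                               # uses your prodigy.json settings
examples = db.get_dataset("image_dataset")   # all annotated tasks in the set
accepted = [eg for eg in examples if eg["answer"] == "accept"]

# Hand the annotations to your own training code: run it in a separate
# process, on a schedule, or triggered by an admin action in your app.
train_model(accepted)
```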
Hope this answers your questions, and good luck with your project!