I have the following use case:
- we want to automatically extract some fields from OCR'd invoices
- we plan to use a combination of NER and rule-based matching
- we want to have such a model in production quickly, with the ability for a human to perform the task manually when the model is not able to extract the given fields
- the idea is that the model would automatically be fine-tuned on these annotations and redeployed
I know that Prodigy is great when used by data scientists to iterate on a model during the "research" phase of a project. What about using it in such an automated workflow in production? Do you anticipate any issue?
I was thinking about using it along with Airflow for the scheduling part, and MLflow for the serving and tracking features. Do you have a better stack to advise or any resources which would help me?
Thanks a lot in advance!
I think the automated feedback loop is actually pretty tricky in practice, and you should aim to get the system delivered with a periodic refresh process first. You can then look at automation to save the periodic labor associated with the refresh.
If you think about the working system, how many days of corrections will you need before you can make a meaningful update? You definitely won't need up-to-the-minute responsiveness. If you want features like "don't make the same mistake twice" to avoid frustrating the user, those are better implemented with a rules layer. If you update a model with one correction, you can't guarantee that the prediction will flip, as the learning rate will be small. This is not a great experience for a user providing feedback. Imagine you tell the model that "Snap" is an organization, and two minutes later you get an example with the same mistake. From the user's perspective, this doesn't feel like the system is learning.
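To make the rules-layer idea concrete, here is a minimal sketch in plain Python. It is not Prodigy or spaCy API: the `CorrectionRules` class, its methods, and the `(span_text, label)` prediction format are all hypothetical names chosen for illustration. The idea is simply that a user correction is stored as an exact-match override and applied on top of whatever the model predicts, so the same mistake is fixed immediately without retraining.

```python
from typing import Dict, List, Tuple

# Hypothetical prediction format: (span text, label).
Entity = Tuple[str, str]


class CorrectionRules:
    """Stores user corrections and applies them on top of model predictions."""

    def __init__(self) -> None:
        # Maps a lowercased phrase to the label the user said it should have.
        self._overrides: Dict[str, str] = {}

    def record(self, phrase: str, label: str) -> None:
        """Remember one user correction."""
        self._overrides[phrase.lower()] = label

    def apply(self, text: str, predictions: List[Entity]) -> List[Entity]:
        """Relabel predicted spans that have an override, then add
        overridden phrases that the model missed entirely."""
        corrected = [
            (span, self._overrides.get(span.lower(), label))
            for span, label in predictions
        ]
        seen = {span.lower() for span, _ in corrected}
        lowered = text.lower()
        for phrase, label in self._overrides.items():
            if phrase in seen:
                continue
            # Crude substring match; a real system would match on tokens.
            start = lowered.find(phrase)
            if start != -1:
                corrected.append((text[start:start + len(phrase)], label))
        return corrected


rules = CorrectionRules()
rules.record("Snap", "ORG")
print(rules.apply("Snap reported earnings", [("Snap", "PERSON")]))
```

In a spaCy pipeline the same role could probably be played by a pattern-based component that runs after the statistical NER, but the exact wiring depends on your setup.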
Live feedback, where the model is in the loop, is useful for bootstrapping the annotation dataset. You get the benefit of focussing on examples the model doesn't know the answer to, or from using the model to filter examples. But this doesn't align well with the needs of an actual user of the system -- it's a workflow for the data scientist.
The good news is that this does make your project requirements much simpler. All you need to do is train the initial system offline, and have some way of collecting the user's corrections. You then need an offline process that can convert those into updates to the training data, and then you can train a new version of the model, again offline. Once the offline tasks become routine, you can easily script them so they run automatically.
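As a rough sketch of the offline step that turns collected corrections into training-data updates, something like the following could work. Everything here is an assumption for illustration: the JSONL record shape (`text` plus `spans`), the function names, and the `min_changed` threshold of 200 are not taken from Prodigy, Airflow, or MLflow. The script merges corrections into the existing training set, with a correction replacing any older example for the same text, and only signals a retrain once enough examples have changed.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical record shape: {"text": str, "spans": [{"start", "end", "label"}]}
Example = Dict[str, Any]


def load_jsonl(lines: Iterable[str]) -> List[Example]:
    """Parse JSONL lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def merge_corrections(
    training: List[Example], corrections: List[Example]
) -> Tuple[List[Example], int]:
    """Return the updated training set and the number of examples that
    were added or changed. A correction wins over an older example with
    the same text."""
    merged = {ex["text"]: ex for ex in training}
    changed = 0
    for ex in corrections:
        if merged.get(ex["text"]) != ex:
            merged[ex["text"]] = ex
            changed += 1
    return list(merged.values()), changed


def should_retrain(changed: int, min_changed: int = 200) -> bool:
    """Only retrain once enough corrections have accumulated.
    The default threshold is an arbitrary placeholder."""
    return changed >= min_changed


training = [{"text": "Invoice from Snap", "spans": []}]
corrections = load_jsonl([
    '{"text": "Invoice from Snap", "spans": [{"start": 13, "end": 17, "label": "ORG"}]}',
    '{"text": "Total due: 40 EUR", "spans": []}',
])
updated, changed = merge_corrections(training, corrections)
print(len(updated), changed, should_retrain(changed))
```

Once a script like this is something you run by hand without surprises, wrapping it in a scheduled job is the easy part.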