It's been a while since I posted this question.
I have a follow-up if you have a moment.
The last challenge is to account for weight differences between ingredients in order to calculate calories correctly. For a quick example, I have the number of calories per gram for a given ingredient, but I do not have the number of calories per Imperial unit like an ounce or a cup.
Through the USDA, I have access to the raw data to develop a conversion factor that would make it easier to design a Python function to compute the total number of calories per ingredient, regardless of the unit of measurement. But it's a lot of work without help from an automation tool.
For example, if I am analyzing a recipe that calls for 2 cups of lentils, I know there are 3.52 calories per gram of lentils and 192 grams per cup. That gives me a total of about 1,352 calories in two cups of lentils (2 × 192 × 3.52). If I need a benchmark, there are 236 grams in a cup of water, which means lentils have a conversion factor of about 81% relative to a cup of water (192 / 236). I could use that to create conversion tables for every ingredient if I can figure out the number of calories per one metric unit and one imperial unit for each.
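To make the lentil arithmetic concrete, here is a minimal sketch of the kind of Python function I have in mind. The dictionary and function names are placeholders I made up for illustration, and the only values filled in are the lentil and water figures from my example.

```python
# Hypothetical lookup tables; only the figures from the lentil example are filled in.
CALORIES_PER_GRAM = {"lentils": 3.52}
GRAMS_PER_CUP = {"lentils": 192, "water": 236}


def total_calories(ingredient, cups):
    """Convert cups to grams, then grams to calories."""
    grams = cups * GRAMS_PER_CUP[ingredient]
    return grams * CALORIES_PER_GRAM[ingredient]


def factor_relative_to_water(ingredient):
    """Weight of one cup of the ingredient relative to one cup of water."""
    return GRAMS_PER_CUP[ingredient] / GRAMS_PER_CUP["water"]


print(round(total_calories("lentils", 2)))               # 1352
print(round(factor_relative_to_water("lentils"), 2))     # 0.81
```

The function itself is the easy part. What is missing is a clean grams-per-unit value for each of the other ingredients.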
That's a long way of getting to the point.
(I am still thinking through the details. I apologize).
The point is that the USDA provides the raw data needed to compute total calories per ingredient, whether in metric or imperial terms, for the 5,000 ingredients in my database. But it is a mess.
Numbers and text are combined in one column, making calculations impossible. For some ingredients, they use units of measurement like "packets" or "slice" that will be difficult to quantify.
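For the simpler rows, I imagine a regular expression could split the number from the unit. Here is a rough sketch under that assumption. The sample strings are ones I made up and are not actual USDA values, and `split_measure` is a hypothetical helper name.

```python
import re
from fractions import Fraction

# Assumed layout: an amount (fraction, integer, or decimal) followed by a unit label.
MEASURE_RE = re.compile(
    r"^\s*(?P<amount>\d+/\d+|\d+(?:\.\d+)?)\s*(?P<unit>[^\d\s].*?)\s*$"
)


def split_measure(text):
    """Return (amount, unit) for strings like '1 cup', or None if the text does not fit."""
    match = MEASURE_RE.match(text)
    if match is None:
        return None
    amount = float(Fraction(match.group("amount")))
    return amount, match.group("unit")


# Made-up sample strings, for illustration only.
for raw in ["1 cup", "0.5 packet", "1/2 slice", "slice"]:
    print(raw, "->", split_measure(raw))
```

Rows that return `None`, and units like "packet" or "slice" that have no fixed weight, are the ones I do not know how to handle.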
How would I approach cleaning up this mess with Prodigy?
Is it a case for text classification?
If you could respond with a starting point, a link to the right Prodigy recipe, or a training video to help me wrap my mind around this challenge, that's all I am looking for.
Though I am interested in all sincere feedback.
Thanks in advance.
Robert