How can Prodigy help me test my now-completed app?

I have a general question from a Prodigy beginner.

I am hoping you can point me to resources where I can learn how to use Prodigy for a specific purpose. I have completed my app after nine months in development.

Written in Python, it accepts any recipe as a block of text and returns a detailed nutritional profile, with help from spaCy, USDA databases, and Dash/Plotly.

Now, I estimate I need to test the app's algorithm against at least 500 recipes, and I want to use the results to strengthen the model in any way possible. How can Prodigy make the process easier and more thorough?

In particular, how can it help me - again if possible - locate missing ingredients, and add them to my knowledge base? In what ways can I use it to do error analysis for this kind of project?

I know this is a vague, rookie question. I am not expecting explicit instructions, but can you help me think of ways to best use Prodigy during this final phase in the development of my first real app?

Thank you.

Robert Pfaff

Hi @robertpfaff !

A good way to further improve your model is to increase its robustness to different types of recipes. You can use Prodigy to create a new corpus to retrain your model and help it generalize further. In this case, you can do the following:

  • Create a corpus of different recipes you can find. You might want to check different formats and styles. For example, some recipes tend to start out with an exposition before going through the actual process. It helps if your model can tell that exposition apart from the ingredients and steps.
  • Create a Prodigy recipe for correcting your model's outputs. You can use a custom recipe that takes in your model and displays the result in a UI.
  • Using that Prodigy recipe, correct the model's mistakes.
  • Train a new model from the corrected corpus.
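
The correction step above needs the model's predictions in a form Prodigy can display. Here is a minimal sketch, assuming the model's output can be reduced to character-offset spans. It builds tasks with `text` and `spans` keys, which is the task shape Prodigy's span-based interfaces read from JSONL. The `predict_ingredients` function and the tiny `KNOWN_INGREDIENTS` list are hypothetical stand-ins for the real model and knowledge base.

```python
import json
import re

# Hypothetical stand-in for the real knowledge base.
KNOWN_INGREDIENTS = ["lentils", "olive oil", "onion"]


def predict_ingredients(text):
    """Hypothetical model: return sorted (start, end, label) tuples for known ingredients."""
    spans = []
    for name in KNOWN_INGREDIENTS:
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), "INGREDIENT"))
    return sorted(spans)


def make_tasks(recipes):
    """Yield one annotation task per recipe, with predictions pre-filled as spans."""
    for text in recipes:
        spans = [
            {"start": start, "end": end, "label": label}
            for start, end, label in predict_ingredients(text)
        ]
        yield {"text": text, "spans": spans}


def write_jsonl(tasks, path):
    """Write tasks one JSON object per line, ready to be loaded as a source file."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


recipes = [
    "Simmer 2 cups of lentils with one chopped onion.",
    "Add 1 tbsp olive oil and 3 shallots.",
]
tasks = list(make_tasks(recipes))
```

In the second recipe, "shallots" gets no span because it is not in the hypothetical ingredient list. That is the kind of gap an annotator would highlight during correction, and the added spans can then be collected as candidates for the knowledge base.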

We also have a good resource on an image captioning workflow that includes an error analysis recipe. It may make more sense to come up with some custom categories that you want to analyze, and annotate those first to get a rough idea of where the main problems are (a la divide and conquer).
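
As a rough illustration of the "custom categories" idea, the sketch below tallies error categories from annotations shaped like the output of Prodigy's choice interface, where the selected option IDs are stored under `accept`. The category names and the example records are hypothetical.

```python
from collections import Counter

# Hypothetical annotations: each record lists the error categories the
# annotator selected for one ingredient line.
annotations = [
    {"text": "2 cups lentils", "accept": ["wrong_unit"], "answer": "accept"},
    {"text": "3 shallots", "accept": ["missing_ingredient"], "answer": "accept"},
    {
        "text": "a pinch of sumac",
        "accept": ["missing_ingredient", "wrong_quantity"],
        "answer": "accept",
    },
    {"text": "1 onion", "accept": [], "answer": "reject"},
]


def tally_errors(examples):
    """Count how often each error category was selected in accepted examples."""
    counts = Counter()
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        counts.update(eg.get("accept", []))
    return counts


error_counts = tally_errors(annotations)
for category, n in error_counts.most_common():
    print(category, n)
```

Sorting by frequency shows which category to tackle first. In this made-up sample, missing ingredients would come out on top.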

It's been a while since I posted this question.

I have a follow-up if you have a moment.

The last challenge is to account for weight differences between ingredients in order to correctly calculate calories. For a quick example, I have the number of calories per gram for a given ingredient, but I do not have the number of calories per Imperial unit like an ounce or a cup.

Through the USDA, I have access to the raw data to develop a conversion factor that makes it easier to design a Python function to compute the total number of calories per ingredient, regardless of the unit of measurement. But it's a lot of work without help from an automation tool.

For example, if I am analyzing a recipe that calls for 2 cups of lentils, I know there are 3.52 calories per gram of lentils and 192 grams per cup. That gives me a total of about 1,352 calories in two cups of lentils. If I need a benchmark, there are 236 grams in a cup of water, which means lentils have a conversion factor of roughly 81% relative to a cup of water. I could use that to create conversion tables for every ingredient if I can figure out the number of calories per one metric unit and one imperial unit for each.
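
The lentil arithmetic can be written as a small sketch, using only the figures quoted in this post. The dictionary and function names are illustrative, not part of the app.

```python
# Figures quoted in the post: kcal per gram and grams per cup.
KCAL_PER_GRAM = {"lentils": 3.52}
GRAMS_PER_CUP = {"lentils": 192.0, "water": 236.0}


def calories_from_cups(ingredient, cups):
    """Total calories for a volume given in cups: cups * g/cup * kcal/g."""
    return cups * GRAMS_PER_CUP[ingredient] * KCAL_PER_GRAM[ingredient]


def factor_relative_to_water(ingredient):
    """Weight of one cup of the ingredient relative to one cup of water."""
    return GRAMS_PER_CUP[ingredient] / GRAMS_PER_CUP["water"]


total = calories_from_cups("lentils", 2)
factor = factor_relative_to_water("lentils")
print(round(total, 2), round(factor, 3))
```

With these inputs the total comes to 2 × 192 × 3.52 = 1351.68 calories, and the ratio 192 / 236 is about 0.81.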

That's a long way of getting to the point.

(I am still thinking through the details. I apologize).

The point is that the USDA provides the raw data needed to compute total calories per ingredient, whether in metric or imperial terms, for the 5,000 ingredients in my database. But it is a mess.

Numbers and text are combined in one column, making calculations impossible. For some ingredients, they use units of measurement like "packets" or "slice" that will be difficult to quantify.

How would I approach cleaning up this mess with Prodigy?

Is it a case for text classification?

If you could respond with a starting point or link to the right Prodigy recipe, or training video to help me wrap my mind around this challenge, that's all I am looking for.

Though I am interested in all sincere feedback.

Thanks in advance.


Hi @robertpfaff !

Just to make sure I understood your problem correctly: you already have the raw data you need (the one USDA provides) so that you don't have to go through the effort of writing a conversion table for each ingredient. However, the database itself is a mess to deal with?

I'm curious what the dataset looks like. Is it in a digital format? An Excel sheet? It seems that some preprocessing is needed on your end?

Text classification might be overkill for this. Perhaps you can start things off by writing a number of rules? Like, if it's a digit, it's a number, or if it's a text from a list, it's a unit, etc. You can try that one first. I'm confident that you can go a long way with a rules-based approach because I don't think your input texts are clearly defined sentences.
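
To give a rough idea of what such rules could look like, here is a minimal sketch. It assumes the combined column holds strings like "2 cups" or "1/2 cup, chopped", which is a guess since the actual USDA layout has not been shown here. The unit list, the `parse_portion` name, and the `needs_review` flag are all hypothetical.

```python
import re
from fractions import Fraction

# Hypothetical list of units the app already knows how to convert.
KNOWN_UNITS = {"cup", "tbsp", "tsp", "oz", "slice", "packet"}

# Rule: a leading number (fraction, decimal, or integer) followed by a word.
PORTION_RE = re.compile(
    r"^\s*(?P<amount>\d+/\d+|\d+(?:\.\d+)?)\s+(?P<unit>[A-Za-z]+)"
)


def parse_portion(raw):
    """Split a combined 'amount + unit' string; return None if no rule matches."""
    match = PORTION_RE.match(raw)
    if match is None:
        return None
    amount = float(Fraction(match.group("amount")))
    unit = match.group("unit").lower()
    # Treat a simple plural as its singular form when that form is known.
    if unit not in KNOWN_UNITS and unit.endswith("s") and unit[:-1] in KNOWN_UNITS:
        unit = unit[:-1]
    return {"amount": amount, "unit": unit, "needs_review": unit not in KNOWN_UNITS}


for raw in ["2 cups", "1/2 cup, chopped", "1 packet", "1 handful", "to taste"]:
    print(repr(raw), "->", parse_portion(raw))
```

Rows that return `None` or come back with `needs_review` set are the ones the rules cannot handle. Those could be set aside for manual review, so annotation effort goes only to the hard cases.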

Hope it helps!