Algorithms used for programmed recipe classification

Question

I am interested in classifying recipes programmatically based on a statistical analysis of the various properties of a recipe. In other words, I want to classify a recipe as Breakfast, Lunch, Dinner or Dessert without user input.

The following properties are available:

Recipe name (e.g. chicken salad)
Description of the recipe (free text describing the recipe)
Cooking Method (steps related to cooking this recipe)
Preparation and preparation time
Each ingredient in the recipe and its quantity

The good news is that I have a set of approximately 10,000 recipes that are already classified, and I can use this data to train my algorithm. My idea is to look for patterns: for example, the word "syrup" may appear statistically more often in breakfast recipes, or any recipe that requires more than 1 cup of sugar may be 90% likely to be a dessert. I believe that if I analyze the recipe along several dimensions, and then adjust the weights as necessary, I can get something that is decently accurate.

What would be good algorithms to research when approaching this problem? Would something like k-NN help, or are there approaches better suited to this task?

+4

algorithm classification document-classification data-mining

Mike Christensen Feb 13 '12 at 18:02

3 answers

If I were doing this, I would approach it the way LiKao suggested. I would focus on the ingredients first. I would build a dictionary of the words that appear in the ingredient sections of the recipes and clean the list in a supervised way to remove irrelevant terms such as quantities and units.
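
A minimal sketch of that cleaning step, assuming each ingredient is available as a plain text line. The `UNITS` stop list and the function name `ingredient_terms` are made up for illustration; a real list would have to be curated by hand against the actual data.

```python
import re

# Hypothetical stop list of units and filler words; extend it for real data.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l", "oz", "lb",
         "of", "a", "an", "the", "large", "small", "pinch"}

def ingredient_terms(lines):
    """Return the set of ingredient words, dropping quantities and units."""
    terms = set()
    for line in lines:
        # Keep alphabetic tokens only, which discards "2", "1/2", "250", etc.
        for token in re.findall(r"[a-z]+", line.lower()):
            if token not in UNITS:
                terms.add(token)
    return terms

example_terms = ingredient_terms(["2 large eggs", "1/2 cup of sugar", "250 g flour"])
print(sorted(example_terms))
```

The result is a set of words per recipe, which is the input the later steps work with.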

Then I would resort to Bayes' theorem: your database lets you estimate the probability of eggs appearing in a breakfast recipe, in a dinner recipe, and so on. You precompute these conditional probabilities, together with the prior probability of each category. Then, given an unknown recipe containing both eggs and marmalade, you can calculate the posterior probability that the meal is a breakfast.

You can enrich it later with other terms and/or take quantities into account (number of eggs per person) ...
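
To make this concrete, here is a small naive Bayes sketch. It assumes every recipe has already been reduced to a set of ingredient words, treats the words as independent given the category, and uses add-one smoothing so unseen words do not zero out a category. The toy recipes and the names `train` and `posterior` are illustrative only, and for brevity it scores only the words that are present in the recipe.

```python
import math
from collections import Counter, defaultdict

def train(recipes):
    """recipes: list of (set_of_terms, label). Collect the counts Bayes needs."""
    label_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in recipes:
        label_counts[label] += 1
        for term in terms:
            term_counts[label][term] += 1
        vocab |= terms
    return label_counts, term_counts, vocab

def posterior(terms, model):
    """Return P(label | terms) for every label seen in training."""
    label_counts, term_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior P(label)
        for term in terms:
            if term in vocab:
                # Smoothed estimate of P(term present | label)
                score += math.log((term_counts[label][term] + 1) / (n + 2))
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}

# Toy training data, invented for the example.
toy = [({"eggs", "marmalade", "toast"}, "Breakfast"),
       ({"eggs", "bacon"}, "Breakfast"),
       ({"chicken", "rice"}, "Dinner"),
       ({"eggs", "noodles"}, "Dinner")]
result = posterior({"eggs", "marmalade"}, train(toy))
print(result)
```

Working in log space avoids underflow once a recipe has many ingredients.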

+2

Yves Daoust Feb 13 '12 at 10:37

I think a full neural network is probably overkill for this. I would try classifying with a single-layer perceptron for each meal type (breakfast, dinner, ...), letting each one run through the training input and adjust its weight vector. Each meaningful word found in the data set can be a network input. I would expect this to be enough for your needs. I have successfully used this method to classify text before.
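
A rough sketch of what I mean, with one perceptron per meal type trained one-vs-rest on binary word inputs. The function names, the fixed step size of 1 and the toy data are my own choices for the example, not a tuned setup.

```python
from collections import defaultdict

def train_perceptrons(recipes, labels, epochs=10):
    """Train one perceptron per label; inputs are the words present in a recipe."""
    weights = {label: defaultdict(float) for label in labels}
    bias = {label: 0.0 for label in labels}
    for _ in range(epochs):
        for terms, true_label in recipes:
            for label in labels:
                activation = bias[label] + sum(weights[label][t] for t in terms)
                predicted = activation > 0
                target = (label == true_label)
                if predicted != target:
                    # Classic perceptron update: move towards the target.
                    step = 1.0 if target else -1.0
                    bias[label] += step
                    for t in terms:
                        weights[label][t] += step
    return weights, bias

def classify(terms, weights, bias):
    """Pick the label whose perceptron gives the highest activation."""
    return max(weights,
               key=lambda l: bias[l] + sum(weights[l].get(t, 0.0) for t in terms))

# Toy data, invented for the example.
toy = [({"eggs", "toast"}, "Breakfast"),
       ({"pancakes", "syrup"}, "Breakfast"),
       ({"chicken", "rice"}, "Dinner"),
       ({"beef", "potatoes"}, "Dinner")]
weights, bias = train_perceptrons(toy, ["Breakfast", "Dinner"])
print(classify({"eggs", "syrup"}, weights, bias))
```

Taking the highest activation rather than a hard yes/no per perceptron means every recipe ends up with exactly one label.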

+1

Weaselfox Feb 14 '12 at 7:55

LiKao · Accepted Answer · 2012-02-13T18:48:06+0000

Try various well-known machine learning algorithms. I would suggest starting with a Bayesian classifier, as it is easy to implement and often works quite well. If that does not work, try something more complex, for example neural networks or an SVM.

The main problem will be selecting a set of features as input to your method. To do this, you should look at which information is unique. For example, if you have a recipe called "Chicken Salad", the "chicken" part will not be of much interest, because it is also present in the ingredients and easier to collect from there. Therefore, you should try to find a set of keywords that provide new information (i.e. the "salad" part). Try to find a good set of keywords for this. It can probably be automated, but most likely you will be better off doing it manually, as it only needs to be done once.

The same goes for the description. Finding the right set of features is always the hardest part of such a task.

Once you have a set of features, just train the algorithm on them and see how well it does. If you don't have much experience with machine learning, look into the various methods for properly testing an ML algorithm (for example, leave-N-out cross-validation, etc.).
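
If you do want to automate part of this, one possible starting point is to keep only the title words that do not already occur among the ingredients. This is just a sketch; `novel_title_words` is a made-up helper and it assumes the ingredient words are already available as a set of lowercase strings.

```python
import re

def novel_title_words(title, ingredient_terms):
    """Return title words that add information beyond the ingredient list."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if w not in ingredient_terms]

# "chicken" is already an ingredient, so only "salad" is kept.
keywords = novel_title_words("Chicken Salad", {"chicken", "lettuce", "mayonnaise"})
print(keywords)
```

The surviving words would still need a manual pass to decide which ones are worth keeping as features.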
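
As a sketch of such a test, the function below holds out N recipes at a time, trains on the rest and counts how often the held-out labels are predicted correctly. It uses consecutive blocks of a shuffled list rather than every possible subset of size N, and the majority-class baseline is only a placeholder for whatever classifier you actually train. All names here are illustrative.

```python
import random
from collections import Counter

def cross_validate(recipes, train_fn, predict_fn, n_out, seed=0):
    """Hold out n_out recipes at a time and return the overall accuracy."""
    data = recipes[:]
    random.Random(seed).shuffle(data)
    correct = 0
    for start in range(0, len(data), n_out):
        held_out = data[start:start + n_out]
        training = data[:start] + data[start + n_out:]
        model = train_fn(training)
        correct += sum(predict_fn(model, terms) == label
                       for terms, label in held_out)
    return correct / len(data)

# Placeholder baseline: always predict the most common training label.
def train_majority(training):
    return Counter(label for _, label in training).most_common(1)[0][0]

def predict_majority(model, terms):
    return model

# Toy data, invented for the example: 6 dinners and 2 breakfasts.
toy = [({"chicken"}, "Dinner")] * 6 + [({"eggs"}, "Breakfast")] * 2
accuracy = cross_validate(toy, train_majority, predict_majority, n_out=2)
print(accuracy)
```

Comparing your real classifier against a trivial baseline like this one shows whether the features are actually contributing anything.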
