UPDATE - short version
After reading some of the answers and comments, I think I can generalize my question better. See below for more details.
I am looking for a machine learning algorithm that can:
- generate combinations of several different variables,
- learn from ongoing human feedback that classifies each combination as “good” or “bad”, and improve the accuracy of its future suggestions,
- work reasonably well with small feedback (training) datasets,
- weight recently provided feedback (training) data more heavily - in other words, old feedback should become less influential over time compared to new feedback (a rough sketch of what I mean follows this list).
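To make the list above concrete, here is a very rough sketch of the loop I imagine. The variable names, the one-hot encoding, the exponential half-life, and the choice of scikit-learn's LogisticRegression are all placeholders I made up for illustration, not something I'm committed to:

```python
# Very rough sketch, not a settled design: every combination of a few
# categorical variables is scored by a classifier trained on past
# "good"/"bad" feedback, with older feedback downweighted exponentially.
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression

# Placeholder variables; in my real problem these would be schedule slots etc.
VARIABLES = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3"]]
FLAT = [v for values in VARIABLES for v in values]

def encode(combo):
    # Simple one-hot encoding of one combination (placeholder representation).
    x = np.zeros(len(FLAT))
    for value in combo:
        x[FLAT.index(value)] = 1.0
    return x

def recency_weights(ages, half_life=4.0):
    # Feedback loses half its influence every `half_life` time units (weeks, say).
    return 0.5 ** (np.asarray(ages, dtype=float) / half_life)

# Toy feedback: combinations someone rated, 1 = "good", 0 = "bad", and how
# many weeks ago each rating was given.
rated = [("a1", "b1", "c1"), ("a2", "b2", "c3"), ("a1", "b2", "c2")]
labels = np.array([0, 1, 1])
ages = [6, 2, 0]

model = LogisticRegression()
model.fit(np.array([encode(c) for c in rated]), labels,
          sample_weight=recency_weights(ages))

# Rank every possible combination by predicted probability of being "good".
combos = list(product(*VARIABLES))
scores = model.predict_proba(np.array([encode(c) for c in combos]))[:, 1]
print([combos[i] for i in np.argsort(-scores)[:5]])
```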
Example scenario
Let's say I'm trying to create an algorithm that generates an exercise schedule. I give it certain restrictions (I can only spend 45 minutes a day, I can't train on Thursdays, etc.), and then I want it to create a workout schedule for each day of the week.
Then I want to tell the algorithm that I don't like part of the schedule it created (maybe I don't like doing another exercise on the same day as an ab workout). The only input I give the algorithm is that one of the exercises for a given day is “bad” (I cancel either the ab section or the other one).
But I am not saying why it is bad, only that it doesn't work for me for some reason. That reason could be any one of a million things, and maybe I only mark it as “bad” after the workout without even understanding why it didn't go well - it just didn't feel right.
In addition, the algorithm can assume that any exercise schedule I do not mark as “bad” is at least “okay”.
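So a full week of proposals plus my cancellations already yields a (tiny) labelled dataset; something like this, with made-up values:

```python
# Purely illustrative: anything I cancelled becomes label 0 ("bad"),
# and everything I left alone is implicitly label 1 ("at least okay").
week_plan = [("abs", "monday"), ("legs", "tuesday"), ("cardio", "friday")]  # made-up schedule
cancelled = {("abs", "monday")}                                             # what I rejected

labelled = [(slot, 0 if slot in cancelled else 1) for slot in week_plan]
# -> [(('abs', 'monday'), 0), (('legs', 'tuesday'), 1), (('cardio', 'friday'), 1)]
```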
What I'm looking for ...
I'm looking for an algorithm (machine learning, I assume) that will take this feedback and train over time to guess which workouts I like. It will most likely have to work with relatively small data sets (I don't train thousands of times a week), and I can't draw on data from other people (so no Netflix-style collaborative recommendation mechanism).
I think this falls under binary classification (each proposed workout schedule is either “at least okay” or “bad”), but I'm not sure what the best approach would be from an algorithmic point of view.
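One direction I've looked at is an incremental (“online”) classifier that gets updated each time new feedback arrives, since each update naturally shifts the model toward the most recent data. A minimal sketch, assuming scikit-learn's SGDClassifier with logistic loss (the feature vectors and labels below are toy values):

```python
# Toy sketch of incremental updates: each week's feedback nudges the model,
# so recent data has the most influence on the latest weights.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss", random_state=0)

# Week 1: feature vectors for proposed workouts, labelled 1 ("at least okay",
# i.e. not cancelled) or 0 ("bad", i.e. cancelled). Values are made up.
X1 = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
y1 = np.array([1, 0, 1])
clf.partial_fit(X1, y1, classes=[0, 1])

# Week 2: fold in the newest feedback without refitting from scratch.
X2 = np.array([[1, 0, 1, 0], [0, 1, 1, 0]], dtype=float)
y2 = np.array([1, 1])
clf.partial_fit(X2, y2)

print(clf.predict_proba(X2)[:, 1])  # estimated probability each proposal is "okay"
```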
I can (hopefully) figure out the details of the coding and the algorithm myself, but I need guidance or recommendations on how to get started!