What machine learning algorithm would be best in this scenario?

UPDATE - short version

After reading some answers and comments, I think I can better generalize my question. See below for more details.

I am looking for a machine learning algorithm that can:

  • generates combinations of several different variables,
  • learn from continuous feedback from people who classifies each combination as “good” or “bad” and improves its future generation accuracy.
  • work relatively well on small datasets of feedback (training)
  • weight has recently provided feedback data (training) - in other words, old feedback data should become less influential over time compared to new feedback.

Scenario example

Lets say that I'm trying to create an algorithm that will generate an exercise schedule. I will give him certain restrictions (I can only spend 45 minutes a day, I can’t train on Thursdays, etc.). Then I want him to create a workout schedule for each day of the week.

Then I want to tell the algorithm that I don’t like part of the created schedule (maybe I don’t like working on the same day as ab workout). The only input that I give to the algorithm is that one of the exercises for the given day is “bad” (I cancel ab or the working section, one).

But I am not saying this , why it is bad, only that it does not work for any reason. This may be one of millions of different reasons, and maybe I noted it “badly” after training, and I don’t even understand why it didn’t go well, I just didn’t feel it.

In addition, the algorithm may suggest that I do not mark any exercise schedules as “bad,” at least “good.”

What am I looking for ...

I'm looking for an algorithm (machine learning, I suppose) that will take this feedback and train over time to try and guess which workouts I like. Most likely, it will work with relatively small data sets (I do not train thousands of times a week), and I can not extract data from other people (therefore, there is no some kind of a la Netflix recommendation mechanism).

I think this falls under the binary classification problem (either the proposed training schedule is “at least normal” or “bad”) , but I'm not sure what the best approach would be from an algorithmic point of view.

I can (hopefully) figure out the details of the coding and the algorithm myself, but I need guidance or recommendations on how to get started!

+6
source share
5 answers

Solution 1:
Netflix Priority addresses a similar issue. The algorithm knows a relatively small number of films that you like / dislike, and based on this [and other users] it offers you which movie you will also like.

Many articles have been written about this subject, in which the BellKor solution got the best result.

This, of course, is not the same problem, as it also affects other user aspects and uses it to match the selected movie for you, but it can help you nonetheless.

(*) In your problem, “film” is a competitive training [abs + aerobic for example], and based on the training that you like / dislike, and their similarity with other users, you can choose your preferred training.

Further information can be found on the Netflix Prize page.


Decision 2
Preliminary previews are simpler, but most likely a less accurate solution may be based on the k nearest neighbor algorithm. Each time the user indicates whether he likes / dislikes the training, you store it.
Your functions are presented here - parts of the training [binary values, for example, if you have a start + abs, these values ​​will be set to true and the rest to false].
When you need to choose a workout, you can match it with your nearest neighbors k (most of the similar functions) and predict whether the user will like or not like this workout based on his answers to similar “neighbors”.
Please note that this algorithm checks whether a workout will be liked or not, and will not create a workout. One of them can be chosen randomly and should be accepted only if the algorithm considers it good, otherwise: select a new randomly and so on ...

General Comment: Follow the discussion of comments:
Machine learning is mainly based on heuristics and experiments. In my opinion, you probably should have several algorithms and check which scores have the best accuracy based on experiments, or combine some algorithms to get the best results. One possibility of combining algorithms based on this answer is:

1. choose a workout based on the BellKor solution to the Netflix prize 2. check k nearest neighbor - if this workout is likely to be approved by the user. 2.1. If it is likely to be approved: pass it to the user and finish 2.2. else: return to 1, and choose a different workout. 
+2
source

This seems like a binary classification problem. I would start reading on the Naive Bayes classifier . The real job is to decide what functions each exercise has. Length? difficulty? did the muscle work? are calories approaching? Times of Day? the weather that day? Since this is what the algorithm learns, the disc makes your decision.

I would recommend Mahout in action (I wrote the part, but not the classifiers), which covers Apache Mahout , and has a pretty decent practical introduction to the classification.

+1
source

You might want to consider exploring multiple instances. The idea is that you send bags of items to be evaluated, and if any item in the bag is bad, the whole bag gets a bad rating. You get a positive rating if all products are good.

This is not a perfect match for your problem, because two good things can be bad in your problem, but it can at least point you in the right direction.

0
source

Here's something vaguely similar in another area: http://en.wikipedia.org/wiki/Evolutionary_music

Personally, I started with a simple scheduler (for example, I built actions into the schedule one at a time in priority order) and added a graphical interface that allows you to move actions on a schedule or mark a group of actions that should be deleted and transferred, perhaps in a different priority order or with which random outrage.

0
source

you cannot create this with just one algorithm. You must build a pipeline that will use different algorithms. And this is not a very simple thing if you want to create a good working application.

0
source

Source: https://habr.com/ru/post/897970/


All Articles