Convert Estimation to Probability

Question

People visit my site, and I have an algorithm that gives each visitor a score from 0 to 1. The higher the score, the more likely it is that the person will buy something, but the score is not a probability, and its relationship to the probability of purchase may not be linear.

I have a lot of data on what scores I gave people in the past and whether those people actually made a purchase.

Using this data about what happened with past scores, I want to be able to take a score and translate it into the corresponding probability.

Any ideas?

edit: Several people have suggested bucketing, and I should have mentioned that I had considered this approach, but I'm sure there must be a way to do this "smoothly". Some time ago I asked a question about a different but possibly related problem here, and I have a feeling that something similar might be applicable, but I'm not sure.

edit2: Let's say I told you that out of 100 customers with a score above 0.5, 12 made a purchase, and out of 25 customers with a score below 0.5, 2 made a purchase. What can I conclude, if anything, about the estimated probability of a purchase by someone with a score of 0.5?

+4

math algorithm probability

sanity Mar 24 '11 at 23:16

7 answers

Draw a chart: plot the ratio of buyers to non-buyers on the Y axis against the score on the X axis, fit a curve, and then for a given score you can read the probability off the height of the curve.

(You do not need to physically draw the chart, but the algorithm should be obvious from the exercise.)

Simples.

+4

symcbean Mar 24 '11 at 23:23

This is what logistic regression, probit regression, and company were invented for. Nowadays most people would use logistic regression, but fitting it involves iterative algorithms. There are, of course, many implementations, but you may not want to write one yourself. Probit regression has an approximate explicit solution, described at the link, which may be good enough for your purposes.

A possible way to assess whether logistic regression would work for your data is to plot each score against the logit of the purchase probability, log(p / (1 - p)), and see whether the points form a straight line.
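To make the logistic regression suggestion concrete, here is a minimal sketch in plain Python that fits p = 1 / (1 + exp(-(a + b * score))) by gradient ascent on the log-likelihood. The function names, the learning rate, the iteration count, and the sample data are all made up for illustration; in practice you would more likely call an existing statistics library than write this yourself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(scores, bought, lr=0.5, epochs=5000):
    """Fit p = sigmoid(a + b * score) by batch gradient ascent.

    lr and epochs are illustrative defaults, not tuned values.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, bought):
            p = sigmoid(a + b * s)
            grad_a += y - p          # derivative of log-likelihood w.r.t. a
            grad_b += (y - p) * s    # derivative of log-likelihood w.r.t. b
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def predict(a, b, score):
    """Estimated purchase probability for a given score."""
    return sigmoid(a + b * score)

# Made-up history: score given to each visitor, and 1 if they bought.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
bought = [0, 0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logistic(scores, bought)
print(predict(a, b, 0.5))
```

The fitted curve is smooth and monotonic in the score, which addresses the "no buckets" requirement, but it is only appropriate if the logit plot described above looks roughly linear.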

+1

Aniko Apr 6 '11 at 3:44

Well, an easy way to do this is to calculate what percentage of people in a given score range bought something, and do that for every interval (say, every 0.05 points).

Have you noticed an actual correlation between higher scores and an increased probability of purchase in your data?

I'm not a statistics expert, though, so there is probably a better answer.

0

Argote Mar 24 '11 at 23:22

You could divide the scores into a number of buckets, e.g. 0.0-0.1, 0.1-0.2, ..., and count the number of customers who bought and did not buy something in each bucket.
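The bucketing approach can be sketched as follows. The function name `bucket_rates`, the bucket width, and the sample data are assumptions made for illustration only.

```python
def bucket_rates(scores, bought, width=0.1):
    """Return the observed purchase rate per score bucket.

    Bucket i covers [i * width, (i + 1) * width); a score of exactly 1.0
    is placed in the last bucket. Empty buckets are reported as None.
    """
    n_buckets = round(1.0 / width)
    totals = [0] * n_buckets
    buyers = [0] * n_buckets
    for s, y in zip(scores, bought):
        i = min(int(s * n_buckets), n_buckets - 1)
        totals[i] += 1
        buyers[i] += y
    return [b / t if t else None for b, t in zip(buyers, totals)]

# Made-up history: score given to each visitor, and 1 if they bought.
scores = [0.05, 0.12, 0.18, 0.55, 0.57, 0.93]
bought = [0, 0, 1, 1, 0, 1]

rates = bucket_rates(scores, bought)
print(rates)
```

With real data each bucket needs enough customers for its rate to be meaningful, so narrower buckets require more history.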

Alternatively, you may want to plot each score against the amount spent (as a scatter plot) and see whether there is any obvious relationship.

0

Mick Mar 24 '11 at 23:24

You can use exponential decay to get a weighted average.

Take your users and order them by score (breaking ties randomly).

Working from left to right, start with an average of 0. Each user you encounter changes the average to average = (1-p) * average + p * (sale ? 1 : 0) . Do the same from right to left, except start with an average of 1.

The smaller you make p , the smoother your curve will become. Play around with your data until you find a value of p that gives you results you like.
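A minimal sketch of the two smoothing passes described above. The answer does not say how to combine the left-to-right and right-to-left averages, so taking their simple mean here is an assumption, as are the function name `smoothed_rates` and the sample data.

```python
import random

def smoothed_rates(scores, bought, p=0.1, seed=0):
    """Exponentially smoothed purchase rate along the score axis."""
    rng = random.Random(seed)
    # Sort by score; the random secondary key breaks ties randomly.
    users = sorted(zip(scores, bought), key=lambda u: (u[0], rng.random()))
    n = len(users)

    # Left-to-right pass, starting from an average of 0.
    left = [0.0] * n
    avg = 0.0
    for i in range(n):
        avg = (1 - p) * avg + p * (1 if users[i][1] else 0)
        left[i] = avg

    # Right-to-left pass, starting from an average of 1.
    right = [0.0] * n
    avg = 1.0
    for i in range(n - 1, -1, -1):
        avg = (1 - p) * avg + p * (1 if users[i][1] else 0)
        right[i] = avg

    # Assumption: combine the two passes by averaging them.
    return [(users[i][0], (left[i] + right[i]) / 2) for i in range(n)]

result = smoothed_rates([0.1, 0.9], [0, 1], p=0.5)
print(result)
```

Note that the result is smoother than bucketing but is not guaranteed to increase monotonically with the score.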

Incidentally, this is the key idea behind how load averages are calculated on Unix systems.

0

btilly Mar 24 '11 at 23:42

Based on your comment in edit2, you would not have enough data to make a statement. Your overall purchase rate is 11.2% (14 out of 125). That is not statistically different from your two purchase rates above and below 0.5 (12% and 8%). In addition, to validate your score, you would need to check that the purchase percentage increases monotonically as the score increases. You could bucket, but you would need to check your results against a probability calculator to make sure that they did not occur by chance.

http://stattrek.com/Tables/Binomial.aspx
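As a rough illustration of the kind of check a binomial calculator performs, the sketch below uses the numbers from edit2. Treating the pooled rate as the "true" rate for both groups is a simplifying assumption, and `binom_cdf` is a helper written here for the example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Numbers from edit2: 12 of 100 bought above 0.5, 2 of 25 bought below 0.5.
overall = (12 + 2) / (100 + 25)

# Chance of seeing 2 or fewer buyers among 25 if the true rate were `overall`.
p_low = binom_cdf(2, 25, overall)

# Chance of seeing 12 or more buyers among 100 at the same rate.
p_high = 1 - binom_cdf(11, 100, overall)

print(overall, p_low, p_high)
```

Neither probability comes out small (the first is a little under one half), which is consistent with the point above: with this little data, the two groups cannot be distinguished from a single purchase rate.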

0

Ralph winters Mar 25 '11 at 2:15

sanity · Accepted Answer · 2011-11-27T16:33:07+0000

In the end, I found exactly what I was looking for: an algorithm called "pair-adjacent violators". I first found it in this article; be warned, however, that there is a flaw in its description of the implementation.

I describe the algorithm, this flaw, and the solution to it on my blog.
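For readers without access to the linked article or blog post, here is a minimal sketch of the general textbook idea behind pair-adjacent violators: sort by score, then repeatedly pool neighbouring groups whose purchase rates are out of order until the rates are non-decreasing. This is not the author's implementation and does not reproduce the article's flaw or its fix; the function name `pav` and the sample data are assumptions.

```python
def pav(scores, bought):
    """Non-decreasing fit of purchase rate against score.

    Returns a list of (score, fitted_probability) pairs sorted by score.
    """
    pairs = sorted(zip(scores, bought))
    blocks = []  # each block is [sum_of_outcomes, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Pool the last two blocks while the earlier one has the higher rate.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [(score, f) for (score, _), f in zip(pairs, fitted)]

# Made-up history: score given to each visitor, and 1 if they bought.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
bought = [0, 1, 0, 0, 1, 1]

curve = pav(scores, bought)
print(curve)
```

The result is a step function over the observed scores. To map a new score to a probability you could interpolate between neighbouring points, which is one possible design choice rather than part of the algorithm itself.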
