Data mining / statistical analysis options for Heroku Rails app?

Question


I have a Rails application hosted on Heroku, for which I want to enable real-time data analysis. Ideally, I would like to find a way to run a generalized boosted regression model, which, as far as I know, is available in both R (http://cran.r-project.org/web/packages/gbm/index.html) and Stata (http://www.stata-journal.com/article.html?article=st0087). I want to save the resulting gbm tree and then use it in my application to predict new results based on user input.

If this is not possible, I am open to using other data mining algorithms. The most important thing for me is being able to integrate it into my Heroku application so that it can run without depending on my local machine.

Options I have looked at:

1) Heroku support suggested vendoring the R library into a Ruby gem. I'm relatively new to Ruby and Rails; is this something that would be feasible for me? I looked for instructions on vendoring libraries into gems, but could not find much.

2) Another thread here (http://stackoverflow.com/questions/6495232/statistic-engine-that-work-with-heroku) mentions CloudNumbers, but it is not possible to call the service from the Rails application.

3) In one of its case studies, Heroku mentions FlightCaster, which uses Clojure, Hadoop and EC2 for its machine learning (http://www.infoq.com/articles/flightcaster-clojure-rails). I saw that Heroku supports Clojure, but is there a way to integrate it (or, more specifically, Incanter) into my Rails application?

Please let me know if you have any ideas.

+6

ruby r ruby-on-rails-3 heroku stata

middkidd 25 sept. '11 at 16:06


1 answer

Noah · Answer 1 · 2011-09-26T10:40:56+0000

I will answer this from the R point of view. In general, you will face two problems:

1) Interacting with R, regardless of where it is running.

2) Doing this from Heroku, which brings its own particular set of problems.

There are several general approaches to the first of these: you can use a binding to R (rsruby, rinruby, etc.), you can shell out to R (for example, from Ruby, R -e "RCODEHERE"), you can access R as a web service (see the Rook package, and in particular something like https://github.com/jeffreyhorner/rRack/blob/master/Rook/inst/exampleApps/RJSONIO.R), or you can access R through something like Rserve.

Of these, shelling out to R is the easiest if you are just performing a single operation and are not very concerned about performance. You will need to parse the output that comes back, but it is the fastest way to get a single operation working.
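As a rough illustration of shelling out, here is a minimal sketch. It assumes an Rscript executable is on the PATH (used here instead of R -e so that no startup banner is mixed into the output) and that the R code prints a plain numeric vector. The helper names run_r and parse_r_numeric_vector are made up for this example.

```ruby
require "open3"

# Parse the default printed form of an R numeric vector, e.g.
#   "[1] 0.25 0.75\n[3] 1.5\n"
# into an array of Floats. The "[n]" index markers at the start of
# each line are dropped. Non-numeric tokens such as NA raise ArgumentError.
def parse_r_numeric_vector(output)
  output.each_line.flat_map do |line|
    line.sub(/\A\s*\[\d+\]/, "").split.map { |token| Float(token) }
  end
end

# Run a snippet of R code in a fresh R process and return its stdout.
# Assumption: `Rscript` is installed and reachable on the PATH.
def run_r(code)
  stdout, status = Open3.capture2("Rscript", "-e", code)
  raise "R exited with status #{status.exitstatus}" unless status.success?
  stdout
end
```

With R installed, something like parse_r_numeric_vector(run_r("print(c(1.5, 2.5))")) would give you Ruby floats to work with. Every call starts a new R process, which is why this only suits occasional, one-off operations.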

For heavier use, I would suggest either using one of the bindings or setting up R as a web service in a separate Heroku application and calling it over HTTP.
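The web service route could look roughly like this on the Rails side. This is only a sketch: the endpoint URL, the JSON request body, and the "prediction" key in the response are all assumptions about how you might design the R service, not something Rook gives you out of the box.

```ruby
require "json"
require "net/http"
require "uri"

# Build a JSON POST request carrying the user's input.
# `features` is a plain Hash; its keys are whatever your model expects.
def build_prediction_request(uri, features)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(features)
  request
end

# Extract the predicted value from the service's JSON reply.
# Assumption: the R app answers with {"prediction": <number>}.
def parse_prediction(body)
  Float(JSON.parse(body).fetch("prediction"))
end

# Send the features to the (hypothetical) R service and return the prediction.
def predict(endpoint, features)
  uri = URI(endpoint)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: 5, read_timeout: 10) do |http|
    http.request(build_prediction_request(uri, features))
  end
  raise "R service returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  parse_prediction(response.body)
end
```

You would call it as predict("https://your-r-app.example/predict", { "age" => 30 }), where the URL is a placeholder for wherever your R application ends up. The timeouts matter because the R app is a separate service and your Rails request should not hang if it is slow to respond.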

The next challenge is getting R to run on Heroku. It is not available as part of the standard environment, and Heroku has a read-only file system with no root access, so you cannot simply run sudo apt-get install.

It's possible to vendor R into a gem. Someone started doing this at https://github.com/deet-uc/rsruby-heroku, but I personally couldn't get it working. It is also possible to build R directly on Heroku by installing all the dependencies, etc. This is the approach I used at https://github.com/noahhl/rookonheroku (step 1 is all you need if you are not using Rook).

Note that Heroku may not allow you to spawn a second process alongside your Rails application, which is what most of the bindings do. This can make the bindings difficult to get working, which is why I tend to either shell out to R or expose it as a web service and access it via HTTP.
