Recommendations using R with SimpleDB or BigQuery or using PHP with SimpleDB

I am currently working on a system that generates product recommendations, like Amazon's "People who bought this also bought this."

Current scenario:

  • Extract the customer’s Google Analytics data and load it into the database.

  • On the customer’s website, when the product page is loaded, an API call is made to receive recommendations for the product being viewed.

  • When the API receives the product identifier as a request, it searches the database and extracts (using association rules) the recommended product identifiers and sends them as a response.

  • The client side then uses this list of product identifiers to fetch product information (image, price ...) and display it on the website.

  • I am currently using PHP and MySQL with the gapi package and the REST storage API on Amazon EC2.
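The lookup step in the scenario above (product identifier in, recommended identifiers out, via association rules) can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the asker's PHP/MySQL code: the function names `build_rules` and `recommend`, the thresholds, and the sample orders are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_rules(orders, min_support=2, min_confidence=0.5):
    """Derive pairwise rules A -> B from a list of orders (sets of product ids)."""
    item_count = defaultdict(int)   # number of orders containing each product
    pair_count = defaultdict(int)   # number of orders containing both products
    for order in orders:
        items = sorted(set(order))
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    rules = defaultdict(list)
    for (a, b), together in pair_count.items():
        if together < min_support:
            continue  # pair seen too rarely to trust
        # confidence(A -> B) = orders containing A and B / orders containing A
        for src, dst in ((a, b), (b, a)):
            confidence = together / item_count[src]
            if confidence >= min_confidence:
                rules[src].append((dst, confidence))
    for src in rules:
        # Highest confidence first; ties broken by product id for stable output.
        rules[src].sort(key=lambda r: (-r[1], r[0]))
    return dict(rules)


def recommend(rules, product_id, limit=3):
    """Return up to `limit` recommended product ids for the viewed product."""
    return [dst for dst, _ in rules.get(product_id, [])[:limit]]


# Hypothetical sample data standing in for the imported purchase history.
orders = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "bag"},
    {"sd_card", "tripod"},
]
rules = build_rules(orders)
print(recommend(rules, "camera"))
```

The rules would normally be built offline and stored, so that the API call on each page load is only a dictionary (or table) lookup.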

My question is: if I need to choose one of the following options, which would be the best choice for implementing the above concept?

  • PHP with SimpleDB or BigQuery.

  • R with BigQuery.

  • RHIPE (R and Hadoop) with SimpleDB.

  • Apache Mahout.

Please help!

+6
2 answers

This is not easy to answer, because the constraints are quite specialized.

The following considerations may be made:

  • BigQuery is not yet generally available. With such a small user base, even if you are in the preview program, it will be harder to get tips for improvement.
  • Each of your options pairs a modeling system with a storage system. Apache Mahout is not a storage engine, so it will not necessarily work on its own. I used to believe that its machine learning implementations were a hodgepodge of several Google Summer of Code projects, but I revised that opinion at a commenter's suggestion. It still seems to have rather uneven and spotty coverage of various algorithms, and it is not particularly clear how the components are supported or maintained. I would encourage a Mahout evangelist to address this.

As a result, this eliminates the 1st, 2nd, and 4th options.

What I do not quite understand is why the real-time server would need Hadoop and RHIPE. That work belongs in your batch processing, where you develop the recommendation models, not in real-time serving. I suppose you could use RHIPE as a simple, general-purpose interface for running queries.

I would recommend using RApache instead of RHIPE, because you can preload your packages and models. I don’t see the benefit of using Hadoop on the front end, but it would be a very natural back-end system for fitting the model.

(Update 1) Other interface options include Rserve (http://www.rforge.net/Rserve/) and possibly RStudio in server mode. There are R/PHP interfaces (see comments below), but I suspect it would be better to access R via HTTP or TCP/IP.

(Update 2) Looking at the whole process, the main idea I see is that you can query the data from PHP and pass it to R. If you want to query from R instead, look at the link in the comments (to the Omegahat tools) or ask a new question about R and SimpleDB; I'm sure someone else on SO will have a better understanding of that particular connection. RApache lets you spin up many R processes that already have the packages and data loaded in RAM, so you only need to pass in whatever data you need for the prediction. If your new data is a small vector, then RApache should be adequate, and that seems to be the case for real-time data.
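To make the "preload once, pass only the small request" idea concrete, here is a rough sketch of the same pattern. It uses Python's standard `http.server` rather than RApache or Rserve, purely as an illustration; the `MODEL` contents, the `/recommend?product=...` URL shape, and the names `lookup` and `make_server` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical precomputed model: product id -> recommended product ids.
# It is loaded once at startup (here hard-coded; in practice read from
# the batch job's output), so each request only carries a product id.
MODEL = {
    "camera": ["sd_card", "tripod"],
    "sd_card": ["camera"],
}


def lookup(path):
    """Return the JSON body for a request path like /recommend?product=camera."""
    query = parse_qs(urlparse(path).query)
    product = query.get("product", [""])[0]
    return json.dumps({"product": product, "recommendations": MODEL.get(product, [])})


class RecommendationHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # No model fitting happens here, only a lookup in the in-memory model.
        body = lookup(self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    # The model is already in memory when the server starts accepting requests.
    return HTTPServer(("127.0.0.1", port), RecommendationHandler)
```

Calling `make_server().serve_forever()` would start serving; the PHP side could then fetch recommendations over plain HTTP, which matches the suggestion above to reach the model process via HTTP or TCP/IP.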

+2

If you want a real-time API for recommendations based on data in a database, Apache Mahout does this directly. You want to use ReloadFromJDBCDataModel, put a GenericItemBasedRecommender on top, and use the servlet-based wrapper in the examples module. It probably takes a day or two to familiarize yourself with the code and customize it for your needs, but it's pretty simple.
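For readers unfamiliar with what an item-based recommender computes, here is a small conceptual sketch. It is plain Python, not Mahout's API, and it does not claim to reproduce Mahout's exact scoring: the Tanimoto similarity over buyer sets, the summed-similarity score, the function names, and the sample data are all assumptions chosen for illustration.

```python
from collections import defaultdict


def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) similarity between two sets of users."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0


def recommend_for_user(purchases, user, limit=2):
    """purchases: dict of user -> set of product ids (boolean preferences)."""
    # Invert to product -> set of users who bought it.
    buyers = defaultdict(set)
    for u, items in purchases.items():
        for item in items:
            buyers[item].add(u)

    owned = purchases.get(user, set())
    scores = {}
    for candidate in buyers:
        if candidate in owned:
            continue  # never recommend something the user already has
        # Score a candidate by its summed similarity to items the user owns.
        score = sum(tanimoto(buyers[candidate], buyers[item]) for item in owned)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by product id for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:limit]


# Hypothetical purchase history.
purchases = {
    "u1": {"camera", "sd_card"},
    "u2": {"camera", "sd_card", "tripod"},
    "u3": {"camera", "bag"},
    "u4": {"sd_card", "tripod"},
}
print(recommend_for_user(purchases, "u1"))
```

In the Mahout setup described above, the data model and the recommender would be provided by the library, with the purchase data read from the database instead of an in-memory dictionary.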

Once you get past roughly 100M data points, you will need to look at distributing the computation with Hadoop. This is a little trickier. Mahout has a distributed recommender that you can configure.
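As a rough picture of why this kind of computation distributes well, the co-occurrence counting behind "also bought" can be written as a map step and a reduce step. The sketch below only simulates those phases in a single Python process; it is not Hadoop or Mahout code, and the function names and sample partitions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_order(order):
    """Map phase: emit ((item_a, item_b), 1) for each product pair in one order."""
    for pair in combinations(sorted(set(order)), 2):
        yield pair, 1


def reduce_counts(emitted):
    """Reduce phase: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


def cooccurrence(partitions):
    # Each partition stands in for a data split handled by a separate worker.
    partials = []
    for orders in partitions:
        emitted = (kv for order in orders for kv in map_order(order))
        partials.append(reduce_counts(emitted))
    # Merge the per-partition results; partial counts are (key, value) pairs too.
    return reduce_counts(kv for partial in partials for kv in partial.items())


# Hypothetical order data split across two workers.
partitions = [
    [["camera", "sd_card", "tripod"], ["camera", "sd_card"]],
    [["camera", "bag"], ["sd_card", "tripod"]],
]
counts = cooccurrence(partitions)
print(counts[("camera", "sd_card")])
```

Because each order is processed independently and the counts are simply summed, the work can be split across machines without the workers needing to see each other's data.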

+1

Source: https://habr.com/ru/post/895415/

