This is not easy to answer, because the constraints are quite specialized.
A few considerations:
- BigQuery is not yet open to the public. With such a small user base, even if you are in the preview, it will be harder to get tips for improvement.
- Each of your options pairs a modeling system with a storage system. Apache Mahout is not a storage engine, so it will not necessarily work on its own. I used to believe that its machine learning implementation was a hodgepodge of several Google Summer of Code projects, but I have revised that opinion at a commenter's suggestion. It still appears to have rather uneven and spotty coverage of the various algorithms, and it is not particularly clear how well the components are maintained or supported. I encourage a Mahout evangelist to address this.
As a result, this eliminates the 1st, 2nd, and 4th options.
What I do not quite understand is why the real-time server needs Hadoop and RHIPE. That work belongs in your batch processing, where you develop the recommendation models, not in the real-time path. I suppose you could use RHIPE as a simple, universal interface for running queries.
I would recommend using RApache instead of RHIPE, because you can preload your packages and models. I don't see the benefit of using Hadoop on the front end, but it would be a very natural back-end system for fitting the model.
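To make the batch/real-time split concrete, here is a minimal sketch. It is in Python only as a stand-in for R, and the "model" (each item's mean rating) is a toy I made up for illustration, not a real recommender. The function names and the JSON file format are my own assumptions. The point is just the shape: fit and save offline, then load once in the serving process and score cheaply.

```python
import json
import os
import tempfile


def fit_model(ratings):
    """Batch step: compute each item's mean rating (toy stand-in for real model fitting)."""
    totals, counts = {}, {}
    for _user, item, rating in ratings:
        totals[item] = totals.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


def save_model(model, path):
    """Batch step: persist the fitted model so the serving side never refits it."""
    with open(path, "w") as fh:
        json.dump(model, fh)


def load_model(path):
    """Serving step: load the fitted model once, at process startup."""
    with open(path) as fh:
        return json.load(fh)


def recommend(model, seen, k=2):
    """Serving step: rank unseen items by precomputed score (ties broken by item name)."""
    candidates = [(score, item) for item, score in model.items() if item not in seen]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _score, item in candidates[:k]]


if __name__ == "__main__":
    # Hypothetical (user, item, rating) triples.
    ratings = [("u1", "a", 5), ("u2", "a", 3), ("u1", "b", 2), ("u2", "c", 4)]
    path = os.path.join(tempfile.mkdtemp(), "model.json")
    save_model(fit_model(ratings), path)   # offline, e.g. a nightly batch job
    model = load_model(path)               # once, when the server starts
    print(recommend(model, seen={"a"}))    # per request
```

In the setup I am suggesting, the `fit_model` role would be played by your Hadoop batch job and the `load_model` / `recommend` role by the preloaded R processes.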
(Update 1) Other interface options include RServe (http://www.rforge.net/Rserve/) and possibly RStudio in server mode. There are R/PHP interfaces (see comments below), but I suspect you would be better off accessing R over HTTP or TCP/IP.
(Update 2) Looking at the whole pipeline, the main point is that you can query the data from PHP and pass it to R. If you would rather query from R, see the link in the comments (to the OmegaHat tools) or ask a new question about R and SimpleDB; I'm sure someone else on SO can speak to that particular connection better than I can. RApache lets you spawn many R processes that already have the packages and data loaded in RAM, so you only need to send over the data you want predictions for. If your new data is a small vector, RApache should be fine, and that seems to be the case for real-time data.
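As a rough illustration of "keep the model in RAM and send only a small vector over HTTP", here is a sketch using only the Python standard library. Again, Python is standing in for R; this is not the RApache or Rserve API. The `/predict` path, the `{"features": [...]}` JSON shape, and the linear weights are all hypothetical choices of mine.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model, loaded once at startup and kept in memory.
MODEL = {"weights": [0.5, -1.0, 2.0], "intercept": 0.1}


def predict(features):
    """Score one feature vector with the in-memory linear model."""
    if len(features) != len(MODEL["weights"]):
        raise ValueError("wrong number of features")
    return MODEL["intercept"] + sum(w * x for w, x in zip(MODEL["weights"], features))


class PredictHandler(BaseHTTPRequestHandler):
    """Answers any POST with a score; the request path is not checked in this sketch."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))["features"]
            body, status = {"score": predict(features)}, 200
        except (ValueError, KeyError, TypeError) as exc:
            body, status = {"error": str(exc)}, 400
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_server():
    """Start the server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_score(port, features):
    """Client side (the PHP role): send only the small vector, get a score back."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())["score"]


if __name__ == "__main__":
    server = start_server()
    print(request_score(server.server_address[1], [1.0, 2.0, 3.0]))
    server.shutdown()
    server.server_close()
```

The request body is a few dozen bytes, and nothing is loaded or fitted per request, which is why this pattern suits real-time scoring.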