I know this is not strictly programming-related, but I hope some feedback will help me out of my misery.
We have a lot of different data from our web applications, going back years.
For example, we have:
- Apache log files
- Daily statistics files from our tracking software (CSV)
- Other daily statistics for nationwide ad ratings (CSV)
- ... and I can probably also generate new data from other sources.
Some data records start in 2005, some in 2006, and so on. From some point in time onward, though, we have data from all of them.
What I am searching for is an application that understands all of this data: one that lets me load it, compare individual data sets and timelines (graphically), compare different data sets over the same period of time, and filter (especially the Apache log files). And, of course, all of this should be interactive.
The Apache log files alone, compressed with bzip2, already take up 21 GB and grow weekly.
I have had no real success with things like AWStats, Nihuo Web Log Analyzer, or similar tools. They can only produce static reports, whereas I need to query the information interactively, apply filters, bring in other data, etc.
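To make concrete the kind of query I mean, here is a minimal sketch of one such question: "how many hits per day, optionally restricted to one status code". It assumes the logs are in Apache's Common or Combined Log Format; the function names and the file name in the usage note are hypothetical, and this is only an illustration of the filtering I want to do interactively, not a solution to the scale problem.

```python
import bz2
import re
from collections import Counter
from datetime import datetime

# Matches the start of a Common/Combined Log Format line:
# host ident user [timestamp] "request" status size
# (Assumption: the logs use this format; requests containing
# escaped quotes are not handled.)
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 10/Oct/2005:13:55:36 +0200
    rec["time"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def daily_hits(lines, status=None):
    """Count hits per calendar day, optionally filtered by status code."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that are not in the expected format
        if status is not None and rec["status"] != status:
            continue
        counts[rec["time"].date().isoformat()] += 1
    return counts


def daily_hits_from_bz2(path, status=None):
    """Stream a .bz2 log file line by line without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return daily_hits(fh, status)
```

For example, `daily_hits_from_bz2("access.log.bz2", status=404)` (file name hypothetical) would give the number of 404 responses per day, which could then be lined up against the daily CSV statistics for the same dates.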
, , (.. ), . RapidMiner.
, : . , .
- , -, , . .
Update:
, :
- bash PHP ,
- CSV Excel. Excel 2007, , ,
- Amazon EC2 script CSV . 200 , , . , , 45 . , Amazon EC2. , .