From the third article, by Eric Baldeschwieler of Hortonworks, September 2011:
We get asked a lot of questions about how to select Apache Hadoop node hardware. During my time at Yahoo!, we bought many nodes with 6 × 2 TB SATA drives, 24 GB of RAM, and 8 cores in a dual-socket configuration. That proved to be a pretty good configuration. This year I have seen systems with 12 × 2 TB SATA drives, 48 GB of RAM, and 8 cores in a dual-socket configuration. This year we will also see a move to 3 TB drives.
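To make the trade-off behind these configurations concrete, here is a minimal sketch (not from the article) of the arithmetic involved: raw disk per node, usable HDFS capacity, and a rough storage-per-core ratio. The node specs are the ones quoted above; the 3× replication factor is HDFS's default, and ignoring space reserved for intermediate MapReduce output is a simplifying assumption.

```python
REPLICATION = 3  # HDFS default replication factor (assumed, not stated above)

def summarize(name, drives, drive_tb, ram_gb, cores):
    raw_tb = drives * drive_tb
    # Effective capacity after replication; real clusters also reserve
    # space for intermediate map output, logs, etc.
    usable_tb = raw_tb / REPLICATION
    print(f"{name}: raw={raw_tb} TB, usable~{usable_tb:.1f} TB, "
          f"{raw_tb / cores:.1f} TB raw per core, {ram_gb} GB RAM")

summarize("2010-era node", drives=6,  drive_tb=2, ram_gb=24, cores=8)
summarize("2011-era node", drives=12, drive_tb=2, ram_gb=48, cores=8)
```

Doubling the drives while keeping the core count fixed, as the newer configuration does, doubles the storage-per-core ratio; which ratio is right depends on the workload, as the next paragraph notes.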
What configuration makes sense for a given organization is driven by considerations such as the ratio of storage to compute load, and cannot be answered in a generic way. The hardware industry also moves fast. In this article, I will try to describe the principles that have typically guided Hadoop hardware configuration choices over the past six years. All of these thoughts are aimed at building medium and large Apache Hadoop clusters. Scott Carey recently made a good case on the Apache mailing list for small machines in small clusters.