AWS announced their supercomputer configuration, with Amazon's James Hamilton posting the details.
The cc1.4xlarge instance specification:
· 23GB of 1333MHz DDR3 Registered ECC
· 64GB/s main memory bandwidth
· 2 x Intel Xeon X5570 (quad-core Nehalem)
· 2 x 845GB 7200RPM HDDs
· 10Gbps Ethernet Network Interface
The AWS supercomputer configuration totals 7040 cores. At 4 cores per processor and 2 processors per server, that works out to 880 servers (nodes) in the compute environment.
If you assume about 350 watts per server, you get roughly 300KW of power. Twenty 2U servers per rack makes for 44 racks at 7KW per rack. Sounds about right.
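The back-of-envelope math above can be sketched out directly. The per-server wattage and servers-per-rack figures are the post's own assumptions, not published AWS numbers:

```python
# Back-of-envelope check of the cluster sizing and power math.
# Assumed inputs (from the post): 7040 total cores, quad-core Nehalem,
# dual-socket servers, ~350 W per server, 20 2U servers per rack.

cores_total = 7040
cores_per_server = 4 * 2                      # 4 cores/CPU x 2 CPUs/server

servers = cores_total // cores_per_server     # 880 nodes
total_power_kw = servers * 350 / 1000         # ~308 kW, i.e. "roughly 300KW"
racks = servers / 20                          # 44 racks
power_per_rack_kw = total_power_kw / racks    # 7 kW per rack

print(servers, total_power_kw, racks, power_per_rack_kw)
```

Running it confirms the figures in the text: 880 servers, about 308 kW total, 44 racks at 7 kW each.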
Amazon's is one of only four self-made configurations on the list. 10GbE is rare among supercomputer clusters, and AWS's choice of 10G Ethernet may explain why they built the configuration themselves. AWS was likely after specific workloads, such as Hadoop.
It’s this last point that I’m particularly excited about. The difference between just a bunch of servers in the cloud and a high performance cluster is the network. Bringing 10GigE direct to the host isn’t that common in the cloud, but it’s not particularly remarkable. What is more noteworthy is that it is a full bisection bandwidth network within the cluster. It is common industry practice to statistically multiplex network traffic over an expensive network core with far less than full bisection bandwidth. Essentially, a gamble is made that not all servers in the cluster will transmit at full interface speed at the same time. For many workloads this actually is a good bet and one that can be safely made. For HPC workloads and other data-intensive applications like Hadoop, it’s a poor assumption and leads to vast compute resources wasted waiting on a poorly performing network.
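The oversubscription gamble described above can be illustrated with a simple ratio: host-facing bandwidth divided by core-facing bandwidth at a top-of-rack switch. The port counts below are hypothetical examples for illustration, not AWS's actual topology:

```python
# Oversubscription ratio at a top-of-rack switch:
# total downlink (host-facing) bandwidth vs. total uplink (core-facing)
# bandwidth. A ratio of 1.0 means full bisection bandwidth; higher
# ratios mean the network is gambling that not all hosts transmit at
# line rate simultaneously. Port counts here are illustrative only.

def oversubscription(downlink_gbps: float, uplink_gbps: float) -> float:
    """Ratio of host-facing to core-facing bandwidth."""
    return downlink_gbps / uplink_gbps

# Typical statistically multiplexed rack: 20 hosts at 10 Gbps each,
# but only 4 x 10 Gbps uplinks to the core.
typical = oversubscription(20 * 10, 4 * 10)        # 5.0 -> 5:1 oversubscribed

# Full bisection bandwidth: uplink capacity matches host capacity.
full_bisection = oversubscription(20 * 10, 20 * 10)  # 1.0 -> 1:1

print(typical, full_bisection)
```

Under the 5:1 example, if all 20 hosts burst at once, each effectively gets only 2 Gbps through the core: exactly the stall a full bisection network avoids for HPC and Hadoop-style traffic.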