Amazon Web Services Supercomputer configuration, 880 Servers

AWS announced its supercomputer configuration, and Amazon’s James Hamilton posted about the details.

The cc1.4xlarge instance specification:

· 23GB of 1333MHz DDR3 Registered ECC

· 64GB/s main memory bandwidth

· 2 x Intel Xeon X5570 (quad-core Nehalem)

· 2 x 845GB 7200RPM HDDs

· 10Gbps Ethernet Network Interface

The AWS supercomputer configuration has 7040 cores. At 4 cores per processor and 2 processors per server, that works out to 880 servers (nodes) in the compute environment.

If you assume about 350 watts/server, you get roughly 300KW of power (880 x 350W = 308KW). Twenty 2U servers per rack makes for 44 racks at 7KW per rack. Sounds about right.
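The back-of-the-envelope math above can be sketched in a few lines of Python. The 350 watts/server and 20 servers/rack figures are the guesses from this post, not published AWS numbers, and the function name `size_cluster` is made up for illustration.

```python
import math


def size_cluster(total_cores, cores_per_cpu=4, cpus_per_server=2,
                 watts_per_server=350, servers_per_rack=20):
    """Estimate server count, power, and rack count for a cluster.

    watts_per_server and servers_per_rack are assumptions from the post,
    not vendor-published values.
    """
    servers = total_cores // (cores_per_cpu * cpus_per_server)
    return {
        "servers": servers,
        "total_kw": servers * watts_per_server / 1000,
        # Round up so a partially filled rack still counts as a rack.
        "racks": math.ceil(servers / servers_per_rack),
        "kw_per_rack": servers_per_rack * watts_per_server / 1000,
    }


estimate = size_cluster(7040)
print(estimate)
```

Changing the per-server wattage guess scales the total and per-rack power linearly, so the estimate is only as good as that one assumption.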

Amazon’s system is one of 4 self-made configurations.

10GbE is rare in many supercomputer clusters, but AWS chose 10G Ethernet, which may explain their self-made configuration.

But AWS was after a specific scenario, such as Hadoop. As Hamilton puts it in his post:

It’s this last point that I’m particularly excited about. The difference between just a bunch of servers in the cloud and a high performance cluster is the network. Bringing 10GigE direct to the host isn’t that common in the cloud but it’s not particularly remarkable. What is more noteworthy is it is a full bisection bandwidth network within the cluster. It is common industry practice to statistically multiplex network traffic over an expensive network core with far less than full bisection bandwidth. Essentially, a gamble is made that not all servers in the cluster will transmit at full interface speed at the same time. For many workloads this actually is a good bet and one that can be safely made. For HPC workloads and other data intensive applications like Hadoop, it’s a poor assumption and leads to vast wasted compute resources waiting on a poor performing network.
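To put rough numbers on the bisection bandwidth point, here is a minimal sketch. It assumes all 880 estimated nodes sit in one cluster with 10Gbps links, and the 4:1 oversubscription ratio is a purely hypothetical comparison, not a figure from AWS or from Hamilton's post. The function name is also made up for illustration.

```python
def bisection_bandwidth_gbps(nodes, link_gbps, oversubscription=1.0):
    """Aggregate bandwidth across a cut that splits the cluster in half.

    oversubscription=1.0 models full bisection bandwidth; a larger value
    models a network core that is shared (statistically multiplexed).
    """
    return (nodes // 2) * link_gbps / oversubscription


# Full bisection: every node in one half can send at line rate to the other half.
full = bisection_bandwidth_gbps(880, 10)

# Hypothetical 4:1 oversubscribed core, for comparison only.
shared = bisection_bandwidth_gbps(880, 10, oversubscription=4.0)

print(full, shared)
```

Under these assumptions, the oversubscribed network leaves each node with a quarter of its interface speed when every server transmits across the cut at once, which is the pattern Hamilton describes for HPC and Hadoop workloads.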