Cloud computing’s Achilles Heel
I have touched upon this issue before; here are some illustrations of what I think is cloud computing's Achilles heel. It has to do with shared hardware and virtualization. In my case I have a Drupal site running in a Xen guest on top of a Xen host. For whatever reason, while being indexed by a Google bot, Apache went "crazy", allocating tons of memory and swapping heavily. At this point the Xen guest is nearly unusable, since the load is close to 100.

Now let's look at what is happening on the underlying Xen host, i.e. the one that runs the Xen guest:

Yikes. If you had another instance on this particular Xen host, you can bet that instance would be severely affected. The trouble is that you may not even be aware of it, since you do not have access to the underlying hardware. You may be scratching your head over why you are all of a sudden getting subpar performance. Also, if you are a cloud provider, how do you deal with situations like this? Do you simply shut down machines that exceed certain performance thresholds? What if this happens to be a production database server which is doing a database dump and should be "allowed" to thrash the disk? What if you shut it down and corrupt a customer's database? It gets real tricky real quick.
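One rough way to get a hint of contention from inside a guest, without access to the host, is to watch CPU "steal" time, the eighth counter on the aggregate cpu line of /proc/stat on Linux. The sketch below is only an illustration and rests on several assumptions: the guest runs a Linux kernel that actually reports steal time under its hypervisor, the helper names (parse_cpu_line, steal_percent, read_cpu_line) are made up for this example, and the 5 second sampling interval is arbitrary. It also says nothing about disk thrashing like the case above, since steal time covers only CPU that the hypervisor gave to someone else.

```python
import time


def parse_cpu_line(line):
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    steal = values[7] if len(values) > 7 else 0  # very old kernels lack the field
    # guest and guest_nice are already counted in user and nice, so skip them
    total = sum(values[:8])
    return steal, total


def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    steal0, total0 = parse_cpu_line(before)
    steal1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (steal1 - steal0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # arbitrary sampling interval, chosen for illustration
    second = read_cpu_line()
    print("steal: %.1f%%" % steal_percent(first, second))
```

A steal percentage that stays high while your own guest is mostly idle would suggest a busy neighbor, but treat it as a hint rather than proof.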
Also, forget about oversubscription. It takes only one poorly behaving guest to ruin it for everyone else, although the more you oversubscribe, the greater the risk of performance degradation.
September 1st, 2009 at 13:07
We could test that out in one of the EC2 instances