...
Case study 1
Brevity is not my strong point, but let me give it a go -
...
Let me know if you want any more details, or if you want to chat further in person about it or about what Cornell is brokering. I think it is an excellent idea for Cornell to broker a deal with Amazon.
Case study 2
I am running Bayesian models on quantitative genetic data. Typically, when dealing with real data sets, one workstation is enough. However, when I am analyzing multiple data sets (for example, currently 100 simulated data sets to be analyzed with a number of different models), I need to monopolize the resources of multiple multi-core servers for 12 to 36 hours. In these cases, I use AWS.

I keep a disk image with my data stored in AWS (around 30G, at a cost of ~$0.50/month) and request c3.8xlarge or cc2.8xlarge compute-optimized instances on the spot market, running the Amazon Linux AMI. This keeps costs down (I wait for the spot price to fall below 30 cents/hour, which it is most of the time), at the expense of having an instance shut off if the spot price rises above my bid. I bid the official price plus 1 cent, which seems to be good enough: I have only lost a couple of instances over the last year and a half.

To run a batch, I launch the instances through the web interface, make a data disk for each instance from my data disk image (an EBS volume cannot be attached to more than one instance), and start a shell script on each instance, which runs in the background. The script reads the data and then unmounts the data disk, so I can delete the data disks soon after the jobs start, again to save on cost. Several simulated data sets are processed at a time. Once a batch is done, the results are exported to a "lab" server by piping tar output to ssh, which seems to be the most stable solution; this way, if an instance is lost, not all results have to be regenerated. Data transfer costs are about a third of the compute costs (a quarter of the total). I do not use more than 10 instances at a time (20 is the AWS limit), because running more seems to put too much pressure on the "lab" servers. The amount of data transferred is on the order of 20G per instance (for some analyses only 2G), in something like 15 installments.
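The "pipe tar output to ssh" export step above can be sketched roughly as below. The host, user, and directory names are hypothetical; the pipe streams a finished batch straight to the lab server without writing an intermediate archive on the instance. To keep the sketch runnable without a remote host, the ssh leg is shown as a comment and the same pipe is exercised with a local extraction:

```shell
# On the instance, the real export would look something like (hypothetical names):
#
#   tar czf - results/batch_03 | ssh user@labserver 'tar xzf - -C /data/aws_results'
#
# The identical pipe, demonstrated locally:
mkdir -p results/batch_03 dest
echo "posterior samples" > results/batch_03/chain1.txt

# tar writes the compressed archive to stdout ("-"); the second tar reads it
# from stdin and unpacks under dest/
tar czf - results/batch_03 | tar xzf - -C dest

cat dest/results/batch_03/chain1.txt
```

Because nothing touches the instance's disk between the two ends of the pipe, a batch can be shipped out as soon as it finishes, which is what limits the damage when a spot instance is reclaimed.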
Once all the analyses are done, my script shuts down the instance, which then terminates. This way I only pay for the exact time I need.
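The tail of such a driver script might look like the sketch below (hypothetical names; `run_analysis` stands in for the actual model run). It assumes the instance was launched with its shutdown behavior set to terminate, so that halting the machine also stops billing:

```shell
# Hypothetical stand-in for one model run; in practice this would launch the
# Bayesian analysis for one simulated data set.
run_analysis() { echo "done" > "result_$1.txt"; }

# Process several data sets at a time, in the background.
for d in 1 2 3; do
    run_analysis "$d" &
done

# Block until every background job has exited.
wait

echo "all analyses done; shutting down"
# sudo shutdown -h now   # commented out so the sketch is runnable locally;
#                        # with instance-initiated shutdown behavior set to
#                        # "terminate", this ends the instance and the billing.
```

The `wait` builtin is what guarantees the shutdown only happens after the last background analysis (and its export) has finished.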