Amazon is a possibility for some research (and there are PIs who have gone that route). There are a couple of problems:
1) If you use EC2 24/7 and need a ton of data storage and fast data transfer, it's no longer that cheap.
2) Sending potentially sensitive data off to Amazon servers isn't a great idea. Even if you have data that is supposed to be de-identified, there are PIs who will intentionally or unintentionally screw up and put sensitive data on your cluster. It's one thing if this stays inside an academic lab. It's another thing entirely if it's beamed over the internet to machines you don't control.
3) The amount of data being collected these days is mind-bogglingly huge. Even a couple of years ago, when I was more directly involved in HPC, data sets for things like genomics were gigantic. They could collect several TB of data per day, and that was rapidly increasing. Transferring all of that off to Amazon takes a lot of bandwidth and time. Keeping the cluster closer to the data collectors can be a win.
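
To put rough numbers on point 3, here's a back-of-the-envelope sketch in Python. The 1, 3, and 10 TB/day figures and the 1 Gbit/s link are illustrative assumptions of mine (the only number I actually have is "several TB per day"), and the function names are made up for this example. It ignores protocol overhead unless you pass an efficiency factor.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_mbps(tb_per_day: float) -> float:
    """Sustained rate in Mbit/s needed to ship tb_per_day (decimal TB) every day."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


def transfer_hours(tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move tb (decimal TB) over a link_gbps link.

    efficiency is the assumed fraction of the nominal link rate you
    actually get (1.0 means a perfect, uncontended link).
    """
    bits = tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


if __name__ == "__main__":
    # Hypothetical daily volumes; "several TB" is all the comment above states.
    for tb in (1, 3, 10):
        print(
            f"{tb:>2} TB/day: {required_mbps(tb):7.1f} Mbit/s sustained, "
            f"{transfer_hours(tb, link_gbps=1.0):5.1f} h/day on a perfect 1 Gbit/s link"
        )
```

The point is that even at the low end you're tying up a good chunk of a gigabit link around the clock just to keep up with new data, before anything has been computed.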