Before we do so, I want to briefly outline the main points about the Amazon cloud. It is actually a pretty broad range of services under the name of Amazon Web Services (AWS):
- EC2 – The cloud itself. This is the computing capacity you use via virtual Linux images, which Amazon calls Amazon Machine Images (AMIs).
- S3 – The cloud storage system. No data is kept in EC2 itself between runs; S3 is where your data lives even when EC2 is not running.
- SimpleDB – A database service (exposed through a web services interface) with basic data retrieval and indexing functions.
A big reason for using Amazon WS is access to the Amazon A9 search index: billions of web pages that you don’t need to crawl on your own.
How exactly did we use it? We didn’t host any end-user software on Amazon. Instead, we deployed our data acquisition software there and used it for data retrieval and transformation. While the biggest part of the solution was hosted elsewhere, the data acquisition scripts ran on Linux images on EC2, using S3 as the storage mechanism. Afterwards, the pre-processed data was downloaded from the Amazon cloud to our data center. That’s it. Despite all the hassles and troubles that always come up when you’re trying to create something great, we achieved the major objective: retrieving pre-processed data. Again, the main point was not the data alone but data plus processing capabilities.
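To make the flow a bit more concrete, here is a minimal sketch of the two halves: a script on the EC2 side that pre-processes a batch and puts it into S3, and a job in the data center that pulls it back down. This is not our actual code. The function names, the bucket and key names, and the `preprocess` step are all hypothetical, and the `s3` argument is assumed to be any object with `put_object` and `get_object` methods shaped like those of a boto3 S3 client. A small in-memory stand-in is included so the sketch runs without an AWS account.

```python
import gzip
import io
import json


def preprocess(raw_pages):
    """Hypothetical transformation step: keep only the URL and a trimmed title."""
    return [
        {"url": page["url"], "title": page.get("title", "").strip()}
        for page in raw_pages
    ]


def store_batch(s3, bucket, key, records):
    """EC2 side: compress the pre-processed batch and put it into S3."""
    body = gzip.compress(json.dumps(records).encode("utf-8"))
    s3.put_object(Bucket=bucket, Key=key, Body=body)


def fetch_batch(s3, bucket, key):
    """Data center side: download a batch from S3 and decode it."""
    response = s3.get_object(Bucket=bucket, Key=key)
    payload = gzip.decompress(response["Body"].read())
    return json.loads(payload.decode("utf-8"))


class InMemoryS3:
    """Stand-in exposing the two client methods used above, for local runs only."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


# Example run with made-up names; pass a real S3 client instead of the stand-in
# to talk to an actual bucket.
s3 = InMemoryS3()
raw = [{"url": "http://example.com/", "title": "  Example  ", "html": "<html></html>"}]
store_batch(s3, "acquisition-output", "batch-0001.json.gz", preprocess(raw))
downloaded = fetch_batch(s3, "acquisition-output", "batch-0001.json.gz")
```

The point of the split is the same as in the text: the heavy lifting (retrieval and transformation) happens next to the data, and only the compact, pre-processed result crosses the wire to the data center.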
For more information on Amazon cloud: