Not so long ago we were actively using Amazon WS for one of our projects. Let’s go over some takeaways.

Before we do so, I want to briefly outline the main points about the Amazon cloud. Amazon Web Services is actually a pretty broad range of services, including cloud storage, hybrid cloud solutions, and a host of others. When we started using it back in 2006 it was called Alexa Internet, but then Amazon WS became a separate branch and, needless to say, it was a huge leap ahead compared to Alexa. Alexa now focuses mostly on Internet traffic statistics. The high-level anatomy of Amazon WS:

  • EC2 – The cloud itself. This is the computing capacity you use, delivered as virtual machine images running Linux. Amazon calls them Amazon Machine Images (AMIs).
  • S3 – Cloud storage system. No data is stored in EC2 itself between runs. S3 is where your data lives even when EC2 is not running.
  • SimpleDB – Database services (actually exposed through a web services interface) with basic data retrieval and indexing functions.
Besides these main infrastructure services there is an additional set of offerings, such as payment services, hybrid cloud services, and other cloud apps not usually provided by other cloud computing companies. One of the most interesting advantages, though, is that you can use Amazon’s web crawl. In other words, you can get access either to the raw data (billions of cached, frequently updated web pages) or to the A9 search index. So if web data is part of your business plan, Amazon WS is very convenient.

A big reason for using Amazon WS is access to the Amazon A9 search index – billions of web pages that you don’t need to crawl on your own.

How exactly did we use it? We didn’t host any end-user software on Amazon. Instead, we deployed our data acquisition software there and used it for data retrieval and transformation. While the biggest part of the solution was hosted elsewhere, the data acquisition scripts ran on Linux images on EC2, using S3 as the storage mechanism. Afterwards the pre-processed data was downloaded from the Amazon cloud to our data center. That’s it. Despite all the hassles and troubles that always come up when you’re trying to create something great, we did achieve the major objective: retrieving pre-processed data. Again, the main point was not the data alone but data + processing capabilities.
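
To make that flow a bit more concrete, here is a minimal Python sketch of the same three steps: transform on the cloud side, persist to storage, then pull the result down to the data center. It is not our actual code. The `InMemoryBucket` class, the function names, the record fields, and the `preprocessed/` key prefix are all made up for illustration, and the in-memory bucket is only a stand-in for S3 so the example runs without an Amazon account. In a real setup the `put` and `get` calls would go through an S3 client (for example, boto3’s `put_object` and `get_object`).

```python
import gzip
import json
from typing import Dict, Iterable, List


class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def preprocess(raw_pages: Iterable[dict]) -> List[dict]:
    """Transformation step (assumed shape): keep only the fields we need."""
    return [
        {"url": p["url"], "title": p["title"].strip(), "length": len(p["body"])}
        for p in raw_pages
    ]


def run_cloud_job(bucket: InMemoryBucket, raw_pages: Iterable[dict], key: str) -> None:
    """Runs on the compute instance: transform, compress, persist to storage."""
    records = preprocess(raw_pages)
    payload = gzip.compress(json.dumps(records).encode("utf-8"))
    bucket.put(key, payload)


def download_to_data_center(bucket: InMemoryBucket, prefix: str) -> List[dict]:
    """Runs in our own data center: pull and decode every stored batch."""
    results: List[dict] = []
    for key in bucket.keys(prefix):
        raw = gzip.decompress(bucket.get(key)).decode("utf-8")
        results.extend(json.loads(raw))
    return results


if __name__ == "__main__":
    bucket = InMemoryBucket()
    pages = [{"url": "http://example.com/", "title": " Example ", "body": "<html>...</html>"}]
    run_cloud_job(bucket, pages, "preprocessed/batch-0001.json.gz")
    print(download_to_data_center(bucket, "preprocessed/"))
```

The point of the split is that only the small, pre-processed batches ever leave the cloud, which is the "data + processing capabilities" idea in practice.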

For more information on Amazon cloud:
  • See this SlideShare presentation
  • Official Amazon WS website
Alex Yakima