Amazon Web Services (AWS) is a large collection of web-based services that let you run applications in the Amazon cloud. A huge range of services is provided, covering compute and networking, storage, databases and web applications. Some of the most popular are the Simple Storage Service (S3), the Elastic Compute Cloud (EC2) and Elastic Map Reduce (EMR). All of these services are flexible and available on demand.
For better performance and resiliency, services can be hosted in a wide range of regions: US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney) and South America (Sao Paulo). For example, you may wish to host your services in the same region as the intended users to reduce latency and improve performance. Only a handful of the largest companies can claim to have data centres spread over such a large area.
Simple Storage Service
Amazon Simple Storage Service (S3) is the cloud storage service offered by AWS. Its main benefits are scalability, high availability and almost unlimited capacity. Financial services firms are often constrained by storage and so may be sceptical of promises of “unlimited” storage. This exemplifies a common disbelief at the sheer scale of some cloud resources, which can lead to mistrust.
From a reliability perspective, Amazon states that S3 is designed for 99.999999999% durability of stored objects over a year, with a separate availability SLA of 99.9%. That level of durability is often higher than what on-premise data centres can offer.
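The 99.999999999% figure and an availability SLA measure different things, and a little arithmetic makes the difference concrete. The sketch below assumes the simplest reading of each number: durability as the independent probability that a stored object survives a year, and availability as the fraction of a 30-day month during which the service is reachable. The function names and the 30-day month are our own simplifications; Amazon's SLA terms define how availability is actually measured.

```python
# A back-of-the-envelope sketch: durability and availability measure different things.
# Assumptions: object losses are independent, and a "month" is 30 days.


def expected_objects_lost_per_year(num_objects: int, nines_of_durability: int) -> float:
    """Expected number of objects lost per year, if each object is lost
    with probability 10**-nines_of_durability per year."""
    return num_objects * 10.0 ** -nines_of_durability


def allowed_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of unavailability per period still consistent with an availability target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    # 11 nines of durability (99.999999999%) across ten million objects.
    lost = expected_objects_lost_per_year(10_000_000, 11)
    print(f"Expected objects lost per year: {lost:.6f}")
    print(f"Roughly one object lost every {1 / lost:,.0f} years")
    # Availability targets: 99.9% (S3 SLA) and 99.95% (EC2 SLA).
    for target in (99.9, 99.95):
        print(f"{target}% availability allows ~{allowed_downtime_minutes(target):.1f} min/month of downtime")
```

For ten million objects this gives about 0.0001 expected losses a year, i.e. roughly one object every 10,000 years, whereas a 99.9% availability target still permits about 43 minutes of downtime in a 30-day month (about 22 minutes at EC2's 99.95%).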
There are often fears that data is not secure when it is stored in the cloud. S3 supports Server-Side Encryption, so that your data is stored in an encrypted format at rest. This means that sensitive information can, in some cases, be stored in the cloud. In our case, it allowed us to submit jobs to Amazon over a plain HTTPS connection, while the data itself stayed encrypted in S3 in the background.
Whilst the interface to S3 is simple, it is clear that significant design work has gone into making it a viable competitor to on-premise data storage. Amazon S3 should certainly be considered as an option, except where specialised storage is required.
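As an illustration of requesting server-side encryption, the sketch below uploads an object with S3-managed keys (SSE-S3, AES-256) through the AWS SDK for Python (boto3), whose clients talk to AWS over HTTPS by default. It is a minimal sketch, not the setup we used: the bucket and key names are hypothetical, and the upload step assumes boto3 is installed and AWS credentials are configured. The request is built in a separate function so it can be inspected without contacting AWS.

```python
from typing import Any, Dict


def build_encrypted_put(bucket: str, key: str, body: bytes) -> Dict[str, Any]:
    """Keyword arguments for the boto3 S3 client's put_object call,
    asking S3 to encrypt the object at rest with S3-managed keys (SSE-S3)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",
    }


def upload_encrypted(bucket: str, key: str, body: bytes) -> str:
    """Upload the object and return the encryption S3 reports having applied.
    Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so build_encrypted_put works without boto3

    s3 = boto3.client("s3")  # uses HTTPS endpoints by default
    response = s3.put_object(**build_encrypted_put(bucket, key, body))
    return response["ServerSideEncryption"]


if __name__ == "__main__":
    # Hypothetical bucket and key names, for illustration only.
    request = build_encrypted_put("example-risk-data", "trades/2013-06-01.csv", b"id,notional\n1,100\n")
    print({k: v for k, v in request.items() if k != "Body"})
```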
Elastic Compute Cloud
Creating and managing a group of virtual machines can be a high-maintenance task, especially for very large infrastructures. The thought of having to install and configure an operating system on every virtual machine will make most people think twice before trying it. Amazon’s Elastic Compute Cloud (EC2) offers an easy way to create virtual machines.
From working with EC2, we found that creating VMs was not as painful a task as originally thought. A number of Amazon Machine Images (AMIs) come with pre-configured operating systems, which in certain cases take care of the configuration that is needed (though none of the AMIs come with Java configured as standard).
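Because Java has to be added on top of the stock AMIs, one option is to pass a user-data shell script that installs it at first boot. The sketch below builds the arguments for the boto3 EC2 client's run_instances call. The AMI ID, instance type and Java package name (which assumes a yum-based image such as Amazon Linux) are hypothetical placeholders, and actually launching instances requires boto3 and AWS credentials.

```python
from typing import Any, Dict, List

# Hypothetical first-boot script: assumes a yum-based image such as Amazon Linux;
# the exact Java package name depends on the AMI you choose.
INSTALL_JAVA = """#!/bin/bash
yum install -y java-1.8.0-openjdk
"""


def build_run_instances(ami_id: str, instance_type: str, count: int) -> Dict[str, Any]:
    """Keyword arguments for the boto3 EC2 client's run_instances call:
    launch exactly `count` VMs from one AMI and install Java at first boot."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # EC2 launches nothing if it cannot launch this many
        "MaxCount": count,
        "UserData": INSTALL_JAVA,  # boto3 base64-encodes this for run_instances
    }


def launch(ami_id: str, instance_type: str, count: int) -> List[str]:
    """Launch the instances and return their IDs. Requires boto3 and AWS credentials."""
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_run_instances(ami_id, instance_type, count))
    return [instance["InstanceId"] for instance in response["Instances"]]


if __name__ == "__main__":
    # Placeholder AMI ID and instance type, for illustration only.
    print(build_run_instances("ami-0123456789abcdef0", "m1.large", 4))
```

Setting MinCount equal to MaxCount makes the request all-or-nothing, which is usually what you want when a job depends on a fixed number of workers.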
Financial services firms can, on occasion, be constrained by the size of their available compute power and their ability to expand it. With cloud compute services, it is possible to increase compute power on demand. There may be some doubt about the ability to create VMs exactly when you require them; in our experience, over three months of continuous project work, we never had trouble building mid-size clusters on a daily basis. In terms of availability, Amazon specifies a Service Level Agreement of 99.95% for its EC2 service.
Elastic Map Reduce
Elastic Map Reduce (EMR) is an AWS service that makes it easy to process Big Data. Processing Big Data is often thought of as a task that requires a lot of compute power as well as a large amount of time to set up that compute power. This is partially true, but the setup required does not have to be complicated.
Creating a large Hadoop cluster by hand would require building a network and configuring each of the nodes to handle specific stages of a Hadoop job, which can take a large amount of time. EMR handles the majority of this configuration for you. In our experience, creating a Hadoop cluster with EMR takes only around 10 minutes, largely independent of the number of nodes, because provisioning happens in parallel.
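To give a sense of how little configuration EMR needs, the sketch below describes a Hadoop cluster for the boto3 EMR client's run_job_flow call: one master node plus a configurable number of core nodes, with EMR provisioning and configuring the nodes itself. The cluster name, release label and instance type are assumptions for illustration, not the configuration from our project; the role names are the defaults created by `aws emr create-default-roles`. Creating the cluster requires boto3 and AWS credentials.

```python
from typing import Any, Dict


def build_hadoop_cluster(name: str, core_nodes: int,
                         instance_type: str = "m5.xlarge",
                         release_label: str = "emr-6.15.0") -> Dict[str, Any]:
    """Keyword arguments for the boto3 EMR client's run_job_flow call:
    one master node plus `core_nodes` core nodes running Hadoop."""
    if core_nodes < 1:
        raise ValueError("a Hadoop cluster needs at least one core node")
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed EMR release; choose a current one
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            # Keep the cluster running after startup so jobs can be submitted later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(name: str, core_nodes: int) -> str:
    """Start the cluster and return its ID. Requires boto3 and AWS credentials."""
    import boto3

    emr = boto3.client("emr")
    response = emr.run_job_flow(**build_hadoop_cluster(name, core_nodes))
    return response["JobFlowId"]


if __name__ == "__main__":
    request = build_hadoop_cluster("risk-grid-demo", core_nodes=20)  # hypothetical name
    total = sum(g["InstanceCount"] for g in request["Instances"]["InstanceGroups"])
    print(f"Requesting {total} nodes")
```

Growing or shrinking the cluster changes only the core group's InstanceCount; the rest of the request is the same whatever the cluster size.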
This does come with a downside, though: the size and structure of the underlying network are unknown. A Hadoop cluster built around a single network switch would clearly perform better than one distributed across the whole data centre. Again, from our experience and the benchmarks Excelian have run, this has not caused noticeable performance issues in our case.
Combining and engineering all these technologies into one platform has allowed us to build a fully functional, completely dynamic compute grid environment in the cloud that requires very little maintenance; in this case, it is a feasible alternative to an on-premise installation.
Stay tuned for more details on this EMR implementation: a detailed white paper on our experience with this technology in a real-life context is in the pipeline.