High Performance Computing Grid Containers

Server virtualisation provides many IT infrastructure management benefits, since it can run and manage multiple applications and operating systems on the same physical server. In theory, this allows increased utilisation and improved business continuity by moving applications across physical servers easily and providing a faster server provisioning process via ‘images.’

However, the additional performance overheads mean that High Performance Computing (HPC) and virtualisation have not gone hand in hand, which restricts HPC's access to the benefits of virtualisation. Recent advancements have narrowed the gap in CPU performance compared with bare-metal [1], but large-scale grid workloads that require high I/O throughput and internode communication must still contend with performance degradation when running virtualised.

The recent surge in interest in Linux containers (LXCs), especially Docker, has brought about renewed interest [2,3,4] in ‘virtualising’ HPC applications. A Docker-based HPC platform allows near bare-metal performance while reaping the benefits of a virtualised application stack, including better application isolation and fine-tuning of resource allocation within a host.

In this article, we investigate the performance overheads of Docker and VirtualBox, which we use to run compute nodes in the Excelian HPC Lab cluster, and explore the potential benefits of a ‘Dockerised’ HPC compute environment.

Excelian HPC compute cluster

The Excelian HPC compute cluster consists of 5 servers with a total of 56 cores, and its main usage is grid software development and testing. We use VirtualBox to create virtual servers for a number of use cases:
  1. Simulating multiple compute hosts connecting to a grid.
  2. Creating and automating HPC build environments for our clients.
  3. Simulating hosts with different configuration and operating systems.
  4. Application isolation – preventing a running grid process from interfering with another process on the same host.

In our experience, running virtualised clusters meets our development and testing requirements. Yet there are a few drawbacks:
  1. Slow provisioning of virtual hosts – especially when provisioning several instances simultaneously on the same host.
  2. Overhead can be high in terms of memory and CPU usage on certain HPC workloads.
  3. The need to maintain a set of post-provision scripts (we use Ansible) for certain grid middleware that does not work well with ‘imaging’.

Most recently, we experimented with Docker to evaluate its performance and its suitability as a possible replacement for VirtualBox.

Benchmarks

We used the Java Grande Benchmark Suite as a basis for our benchmarks. The 3 main benchmarks are:
  • Euler: benchmarks the number of timesteps per second achieved while solving the Euler equations for flow in a channel.
  • Monte Carlo: uses the Monte Carlo technique to price products derived from the price of an underlying asset.
  • RayTracer: benchmarks the number of pixels per second rendered for a scene using a 3D raytracer.
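To give a flavour of the kind of kernel the Monte Carlo benchmark exercises, here is a minimal Python sketch that prices a European call option by simulating the terminal price of the underlying asset. It is an illustration only and is not taken from the Java Grande suite: the function name, the choice of a European call, the geometric Brownian motion model and all parameter values are assumptions.

```python
import math
import random


def price_european_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Estimate a European call price by Monte Carlo simulation.

    Hypothetical helper for illustration; not part of the Java Grande suite.
    Assumes the underlying follows geometric Brownian motion.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Simulate the asset price at maturity for one path.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


if __name__ == "__main__":
    # Illustrative inputs only: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
    print(price_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=50_000))
```

Each path is independent of the others, which is why this style of workload spreads so naturally across grid compute nodes.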

Figure 4 and Table 1 show Docker achieving bare-metal performance in most of the benchmarks, while VirtualBox lags behind, achieving only 25% of bare-metal performance in some cases. Also worth noting is a small performance increase observed on Docker when compared to bare-metal. One possible explanation is the effect of some small loads (e.g. slocate, metric collection, etc.) running on the bare-metal host during the benchmarks.
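A figure such as "25% of bare-metal performance" is simply a platform's benchmark score divided by the bare-metal score. The sketch below shows that normalisation, assuming that higher scores are better (as with timesteps or pixels per second). The function name is hypothetical and the numbers are placeholders chosen for easy arithmetic, not our measured results.

```python
def relative_to_baseline(scores, baseline="bare-metal"):
    """Return each platform's score as a percentage of the baseline score.

    Assumes higher scores are better (e.g. timesteps or pixels per second).
    """
    base = scores[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return {name: 100.0 * value / base for name, value in scores.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT measured results.
    example = {"bare-metal": 200.0, "docker": 202.0, "virtualbox": 50.0}
    for platform, pct in relative_to_baseline(example).items():
        print(f"{platform}: {pct:.1f}% of bare-metal")
```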

image-1.png

image-2.png

image-3.png

Evaluation

Setting up a Docker test environment for our cluster was an easy process, helped by the documentation and resources available online. During the benchmarks, we also found that containers were created almost instantly, while virtual instances took longer (several minutes) to instantiate. We were also impressed with Docker’s ability to enable fine-grained guarantees of resources and performance isolation for each of our compute containers.
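Start-up times like these can be compared by timing the provisioning command itself. The following is a minimal sketch of such a timer, not the procedure we used: the helper name is hypothetical, and the command in the example is a stand-in so that the code runs on a machine without Docker installed.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run a command several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On a Docker host one might
    # instead pass something like ["docker", "run", "--rm", "<image>", "true"],
    # where <image> is whichever image is being evaluated.
    timings = time_command([sys.executable, "-c", "pass"])
    print(f"fastest start: {min(timings):.3f}s")
```

Taking the fastest of several runs reduces the influence of one-off effects such as a cold image cache.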

With regard to HPC Grid computing, we foresee the following benefits of Docker:
  1. Provision of application isolation on the same host, i.e. Docker enforces limits on the resources (CPU, memory, etc.) available to each running application. In traditional Grid/HPC, a running task may potentially take up all the RAM/CPU on a single host.
  2. Ability to share hosts between Grid and non-grid environments, e.g. during the weekend, application servers may be used to host Grid containers and perform long-running weekend/month-end batch jobs.
  3. Allows applications to be easily packaged and deployed across different infrastructures (e.g. external or internal cloud platforms, developer desktops, etc.).
  4. Ability to simultaneously run varied workloads, both grid and non-grid, on the same server, thus increasing server utilisation while guaranteeing resource and performance to each application container.
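As an illustration of the per-container resource limits mentioned in the list above, the sketch below assembles a `docker run` command line with CPU and memory caps. It assumes a Docker release that supports the `--cpus`, `--memory` and `--cpuset-cpus` options of `docker run`. The helper, the image name and the task command are hypothetical, and the function only builds the argument list without running anything.

```python
def docker_run_args(image, command, cpus=None, memory=None, cpuset=None):
    """Build an argument list for `docker run` with optional resource limits.

    Hypothetical helper; it only assembles the command and does not run it.
    """
    args = ["docker", "run", "--rm"]
    if cpus is not None:
        args.append(f"--cpus={cpus}")            # cap CPU time, e.g. 2 CPUs' worth
    if memory is not None:
        args.append(f"--memory={memory}")        # hard memory limit, e.g. "4g"
    if cpuset is not None:
        args.append(f"--cpuset-cpus={cpuset}")   # pin to specific cores, e.g. "0-3"
    return args + [image] + list(command)


if __name__ == "__main__":
    # "grid-engine:latest" and "run-task" are made-up names for illustration.
    print(" ".join(docker_run_args("grid-engine:latest", ["run-task"],
                                   cpus=2, memory="4g")))
```

With limits like these in place, a grid task that misbehaves is constrained to its own allocation instead of exhausting the RAM or CPU of the whole host.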

Summary and further work

Our benchmarks have shown Docker HPC compute containers to have negligible performance overheads. We shall continue our efforts to advance Docker for HPC, exploring how higher-level orchestration tools such as Kubernetes and Mesos may be used to enable HPC and non-HPC workloads to share resources, increase server utilisation and simplify the management of containerised applications. We will also look at LXD, a Docker competitor, and how it stacks up in terms of features and performance. We are also looking forward to testing vendor-based Grid middleware products, which we will do once vendors start following IBM’s lead and provide container support.

Key points

  • Although server virtualisation provides many IT infrastructure management benefits, additional performance overheads have limited its success with high performance computing (HPC).
  • Excelian tests have shown Docker HPC compute containers to have negligible performance overheads, offering many benefits to HPC grid computing.
  • Excelian now intends to evaluate the impact of higher-level orchestration tools such as Kubernetes and Mesos and to test LXD, one of Docker’s competitors.
