Risk systems play a pivotal role in global finance, especially since the financial crisis of 2007/8 and the resulting failure of some high-profile global institutions. As a result, one particular area of risk, counterparty credit risk (CCR), has become especially important to both financial institutions and regulators. The greater importance placed upon CCR has meant that many institutions are upgrading old risk systems or implementing new ones specifically for this purpose.

[img]../../../images/img/windows-011.png[/img] The computational requirements for processing CCR can be high, especially once the longer time frames compared to market risk, and hence the increased number of simulation dates, are taken into account. The amount of data produced is also often large, so effective means to store and analyse this data are needed. This requirement for intensive computation and data handling means that comprehensive risk systems can rarely run on a single machine, and often require a grid or high-performance computing (HPC) solution in order to process the data and produce results in a timely manner. Excelian has recently been involved in two projects where Microsoft HPC Server 2008 R2 was the grid middleware chosen to underpin the CCR systems being implemented. CCR is also the basis for credit value adjustment (CVA), and our clients often choose to build upon their CCR systems in order to enable a CVA desk to operate.
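
To give a feel for why longer time frames push the workload beyond a single machine, the following is a minimal sizing sketch. It assumes a simple Monte Carlo set-up in which every trade is revalued on every scenario at every simulation date; the class, the function names and all of the figures are hypothetical and are not taken from either project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationSize:
    """Hypothetical sizing inputs; none of these figures come from the projects."""
    trades: int            # trades in the portfolio
    scenarios: int         # Monte Carlo paths per run
    simulation_dates: int  # future dates at which each trade is revalued


def valuation_count(size: SimulationSize) -> int:
    """Number of individual trade valuations in one run."""
    return size.trades * size.scenarios * size.simulation_dates


def raw_output_bytes(size: SimulationSize, bytes_per_value: int = 8) -> int:
    """Storage needed if every valuation is kept as one fixed-size value."""
    return valuation_count(size) * bytes_per_value


# Illustrative figures only: a short-horizon run against a long-horizon run.
short_horizon = SimulationSize(trades=10_000, scenarios=1_000, simulation_dates=10)
long_horizon = SimulationSize(trades=10_000, scenarios=1_000, simulation_dates=100)

if __name__ == "__main__":
    for label, size in (("short", short_horizon), ("long", long_horizon)):
        print(label, valuation_count(size), "valuations,",
              raw_output_bytes(size) / 1e9, "GB of raw output")
```

Under these assumed figures, a tenfold increase in simulation dates gives a tenfold increase in both valuations and raw output, which is the scaling that makes a grid attractive.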



The Projects

During the first of the two projects, Excelian worked with the bank's solution architects to design the proposed grid configuration for both the test and production environments. During this time, existing designs were improved to make more effective use of the available hardware, in turn increasing grid utilisation and bringing down costs. Excelian built the environments as the hardware was delivered, took over management of HPC Server during the development process, and additionally on-boarded a 3rd party application.

[img]images/img/project-training.jpg[/img]

Excelian was responsible for some of the integration of the 3rd party application with HPC Server and, in the role of grid administrator, assisted with diagnosing application and grid faults and creating a knowledge base. Excelian worked with the client to define the roles and responsibilities for a grid support team, in line with industry best practice. Once the boundaries had been defined, Excelian was responsible for creating supporting documentation for the HPC Server environments and training incoming support staff.

As the production environment was built and the system neared go-live, Excelian worked with the support team to transition management of the grid from Excelian to the in-house team. Excelian stayed on for an additional month as 2nd line support after the successful go-live.

The second project set out to create a similar system built upon the same technologies at a different institution. The remit was broader on this project, with Excelian also managing the 3rd party application and HPC Server as a single entity. Additionally, Excelian took on management of the daily batch process and credit feeds until internal support staff had been trained and were ready to take control.

The 3rd party application integrated with HPC Server also provides the ability to create a real-time CCR system, as opposed to just an overnight batch. Excelian worked on a proof of concept (POC) for this real-time pre-deal check part of the system, validating that it could be used in the way the client required and that some highly customised functionality specific to that client was possible.

Excelian has been tasked with transitioning the support of the HPC Server grid to the internal support teams, and training has been provided to team members spanning four continents. This will ensure that there is a breadth of knowledge at the bank and sufficient trained staff to provide 24/7 support to this mission-critical system.

Approaches and Lessons Learnt

[img]images/img/projects-directions.jpg[/img]The similarities between the two projects are striking, yet the way in which the technologies are being used by each institution is very different. The main similarity is that, with both clients, Excelian built and managed the grid. The major difference involved the layer on top of this, where applications are integrated with HPC Server. Client B wanted the grid and the 3rd party application to be presented to them as a single whole, while Client A wanted to keep the 3rd party application and grid distinct. This was largely because Client A had, from the very beginning, other applications in the pipeline which could be integrated with the HPC Server grid.

This desire to build a solution that would be capable of growing into an enterprise grid was a large driver for Client A, whereas Client B wanted a highly optimised and seamlessly integrated solution for a single application and purpose. The design and configuration required for these two purposes can be very different.

Ultimately, ensuring that an HPC Server environment is ready and able to support multiple applications is the more demanding task, as there are a number of potential difficulties involved in sharing and managing resources effectively between multiple applications. HPC Server has a variety of options available for sharing resources between applications, but these can have different effects on different applications and there is not necessarily a one-size-fits-all approach; the requirements of each institution and application must be considered.
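
As a rough illustration of the kind of trade-off involved, the sketch below splits a fixed pool of cores between applications in proportion to a weight. It is a generic, hypothetical example and not how HPC Server itself allocates resources; the function name `share_cores`, the application names and the weights are all assumptions made for illustration.

```python
def share_cores(total_cores: int, weights: dict[str, float]) -> dict[str, int]:
    """Split cores between applications in proportion to their weights.

    Uses the largest-remainder method so that the whole-core allocations
    add up to exactly total_cores. Generic sketch, not an HPC Server API.
    """
    if total_cores < 0:
        raise ValueError("total_cores must not be negative")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("every application needs a positive weight")

    total_weight = sum(weights.values())
    # Exact (fractional) entitlement of each application.
    exact = {app: total_cores * w / total_weight for app, w in weights.items()}
    # Start from the whole-core part of each entitlement.
    allocation = {app: int(share) for app, share in exact.items()}
    leftover = total_cores - sum(allocation.values())
    # Hand the remaining cores to the largest fractional parts first.
    by_remainder = sorted(
        exact, key=lambda app: exact[app] - allocation[app], reverse=True
    )
    for app in by_remainder[:leftover]:
        allocation[app] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical applications sharing a 100-core grid.
    print(share_cores(100, {"ccr_batch": 3, "pre_deal_check": 1}))
```

A fixed proportional split like this is simple to reason about, but it can leave cores idle when one application has no work queued, which is one reason the right sharing policy depends on the applications involved.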

Effective monitoring of a grid is also hugely important, especially when sharing resources between applications. Microsoft provides a GUI with basic monitoring as part of HPC Server; however, this is not always adequate, and other monitoring tools should be employed to fully understand the underlying health of the grid and to ensure that capacity planning can be managed effectively. Another element common to both projects was the use of Microsoft System Center Operations Manager (SCOM) to keep a closer eye on the HPC Server environment. Microsoft provides a pre-built management pack for monitoring HPC Server, allowing the grid to be plugged into an existing SCOM instance easily, with HPC-specific rules already configured.

[img]images/img/windows-azure-and-server.png[/img]

Overall, both projects have successfully integrated HPC Server with the primary 3rd party application, ensuring that their CCR systems can be brought online. With the imminent release of a new version of HPC Server, bringing it in line with Windows Server 2012, improvements in functionality, especially built-in SOA monitoring, will be a welcome addition for anyone ready to upgrade.
Mark Perkins