Case study: Building a globally distributed NoSQL Rates store

In 2015, Excelian was engaged to replace a critical end-of-day (EOD) and intraday rates marking system at a prominent Australian investment bank. Migration to a big data solution was not a foregone conclusion, and specifying the best technologies involved some complex considerations.

The need for speed and scale

Almost 20 years old, the existing marking system was built on Sybase, C++, Python and Unix – technologies typically used by investment banks in the late 1990s and early 2000s. Until the 2008/9 global financial crisis, the solution could handle the prevailing low trade volumes. However, as the business became more flow driven, the framework struggled to scale to the globalised, high-volume trading model that has since evolved.

The complex set-up made it hard for the business to maintain the rates configuration directly, so it had to rely on additional technology to make changes, which pushed up time and cost. Combined with the additional expense of using Sybase to replicate the required volumes of real-time data worldwide, the total cost of ownership became prohibitive. A scarcity of C++ skills, and the language’s limitations for rapid development, further compounded the problems. Faced with these challenges, the bank wanted to completely overhaul the system using technologies that could deliver the development speed and scalability its business needed.

Key requirements

The new marking system had to meet a range of key criteria. It needed to be:

  1. User-friendly – provide a flexible user interface so business users could easily define and persist rules about the source of the raw data that constitutes a mark (i.e. whether it is data snapped from market data vendor systems or retrieved from a file or database written to by traders) and perform automatic calculations on saved marks to produce new marks.
  2. Fast – persist official marks to a lightweight storage layer able to handle high volumes with frequent changes to already-saved data, then save to recognized ‘sources of truth’ for easy retrieval by traders or auditors (an illustrative sketch of one possible table layout follows this list). It was accepted that retrieval from the ultimate official system – i.e. the Sybase relational database (RDBMS) or even tape – would take significantly longer.
  3. Auditable – enable auditors to play back the trail to see exactly where any given mark came from. This required a global data distribution capability to replicate large volumes of data via the lightweight storage layer in near real-time.
  4. Easy to inspect – load ‘past-date’ rate data from the source of truth via the lightweight storage layer for inspection by users.
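
To make these requirements concrete, the sketch below shows one way such a marks table could be laid out in Cassandra and created from Java. It is purely illustrative – the keyspace, table and column names are hypothetical rather than the bank’s actual schema – and assumes the DataStax Java driver. Each save becomes a new row keyed by save time, which is what makes audit playback and ‘past-date’ inspection straightforward.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    // Illustrative schema bootstrap only; all names are hypothetical.
    public class MarksSchemaBootstrap {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Replicate the keyspace to each regional data centre (names are placeholders).
                session.execute("CREATE KEYSPACE IF NOT EXISTS rates WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'SYD': 3, 'LDN': 3}");
                // One partition per curve and business date; every save is a new clustered row,
                // so the full history of a mark stays available for audit playback.
                session.execute("CREATE TABLE IF NOT EXISTS rates.marks ("
                        + " curve_id text,"
                        + " business_date text,"
                        + " saved_at timestamp,"
                        + " source text,"     // vendor snap, trader file or derived calculation
                        + " value decimal,"
                        + " PRIMARY KEY ((curve_id, business_date), saved_at)"
                        + ") WITH CLUSTERING ORDER BY (saved_at DESC)");
            }
        }
    }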

Assessment and architecture principles

The decision to employ a NoSQL solution seemed obvious given the need for a data lake/playground where large quantities of unstructured data could be quickly housed for temporary storage, accessed, modified and replicated globally before finally being written to the RDBMS, where it would be accessed only infrequently.

Having considered the options, we decided to employ a NoSQL solution and chose Apache Cassandra for a variety of reasons.

  • The appeal of a cost-effective open source (fix-or-amend-it yourself) solution.
  • NoSQL solutions typically have a flat database structure with no joins and little reliance on indices, which is well suited to the high data volumes and performance level required to support the critical nature of EOD marking.
  • With keys partitioned around a ring and data distributed evenly across the cluster, the Cassandra model provides high resilience and maximum throughput. Although an existing cluster of high-end hardware was used – which doesn’t exactly fit Cassandra’s commodity hardware mold – it was fit for purpose.
  • Data filtering and manipulation were executed at the Java application level, using minimalist Cassandra queries to retrieve or update the data.
  • Creating a service layer above Cassandra to persist and retrieve all data guarantees data quality by ensuring there is no direct data access or modification by users – which had been a regular feature of the old set-up. A minimal sketch of such a service-layer read follows this list.
  • The availability of high-quality monitoring tools, such as DataStax OpsCenter, also increased confidence in the technology choice. As part of the DataStax Enterprise offering, OpsCenter has proven extremely powerful for rapidly understanding the state of the cluster and performing administrative tasks.
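
As an illustration of the last two points – minimal queries with filtering done in Java, behind a service layer that owns all data access – the sketch below reads the saved marks for one curve and business date and reduces them in application code. It assumes the hypothetical rates.marks table sketched earlier and the DataStax Java driver; it is not the bank’s actual service code.

    import java.math.BigDecimal;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    // Hypothetical service layer: all reads go through here, so users never query Cassandra directly.
    public class MarksReadService {
        private final Session session;
        private final PreparedStatement selectMarks;

        public MarksReadService(Session session) {
            this.session = session;
            // Minimalist query restricted to the partition key; richer filtering happens in Java.
            this.selectMarks = session.prepare(
                    "SELECT saved_at, source, value FROM rates.marks "
                    + "WHERE curve_id = ? AND business_date = ?");
        }

        /** Returns the most recently saved value per source for one curve and business date. */
        public Map<String, BigDecimal> latestMarksBySource(String curveId, String businessDate) {
            Map<String, BigDecimal> latest = new LinkedHashMap<>();
            for (Row row : session.execute(selectMarks.bind(curveId, businessDate)
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))) {
                // Rows arrive newest-first (clustering order), so keep only the first hit per source.
                latest.putIfAbsent(row.getString("source"), row.getDecimal("value"));
            }
            return latest;
        }
    }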

NoSQL challenges

Every system has its pros and cons, and our experience with Cassandra on this project was no exception. By highlighting some of the key issues to anticipate, we hope that our insights will prove useful for anyone looking to switch to a NoSQL solution. Given that most developers have an RDBMS background, a significant shift in thinking is required from the application programming perspective.

  • Data integrity – it was vital to ensure consistency of the data underpinning the cluster in different regions. Most modern NoSQL solutions deliver eventual consistency, focusing on data availability and partition tolerance. While this is how write performance is achieved, consistency within a region was equally important to us. Cassandra extends the concept of eventual consistency by offering tunable consistency: we allowed a write operation to succeed once a quorum (50% + 1) of the replicas in the local data center reported successful completion – Cassandra’s LOCAL_QUORUM level – with the other, remote data centers set to become eventually consistent (see the write sketch after this list).
  • Query performance – Cassandra’s query language, CQL, is very similar to SQL but proved somewhat restrictive by comparison. Cassandra avoids potentially time-consuming queries by forbidding anything at the CQL level that can’t be executed in constant time, which transfers responsibility for query performance to the application code. Although this makes any query performance problems explicit, it requires significantly more effort from the developer.
  • Latency – without indices, queries couldn’t be optimized to exploit the known structure of the stored data. Secondary indices have worked well in classic relational database systems for years, but only when the underlying data is confined to a single local server. In a distributed environment, secondary indices are not recommended as they may introduce latency; Cassandra’s secondary indexes should only be used on low-cardinality data.
  • Backwards compatibility – API changes between major Cassandra versions have broken builds and required some re-engineering of our internal code base, which was both a development challenge and a risk. However, Cassandra recently adopted a ‘tick-tock’ release model, which may improve things, and strong test-focused development practices, such as TDD, could have helped us absorb changes in the Cassandra API more quickly.
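
To illustrate the tunable-consistency point above, the sketch below (again with hypothetical names, and assuming the DataStax Java driver) performs a write at LOCAL_QUORUM: it returns once a quorum of replicas in the local data center has acknowledged the mark, while the remote data centers converge asynchronously.

    import java.math.BigDecimal;
    import java.util.Date;

    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    // Hypothetical write path for the rates.marks table sketched earlier.
    public class MarksWriteService {
        private final Session session;
        private final PreparedStatement insertMark;

        public MarksWriteService(Session session) {
            this.session = session;
            this.insertMark = session.prepare(
                    "INSERT INTO rates.marks (curve_id, business_date, saved_at, source, value) "
                    + "VALUES (?, ?, ?, ?, ?)");
        }

        public void saveMark(String curveId, String businessDate, String source, BigDecimal value) {
            // LOCAL_QUORUM: succeed once 50% + 1 of the replicas in the local data center have
            // acknowledged the write; remote data centers become consistent asynchronously.
            session.execute(insertMark.bind(curveId, businessDate, new Date(), source, value)
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM));
        }
    }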

Benefits of switching to NoSQL

  1. Global speed – while large volumes being moved worldwide still put tremendous pressure on the network, the speed of global data distribution has largely resolved the Sybase global replication issue.
  2. Open source – the benefits of Cassandra’s open source status are significant and enable easy pinpointing of bugs or potential enhancements, which has driven the bank to seek further open source solutions.
  3. Real-time data – Cassandra’s underlying data model of fast, efficient hash map storage has delivered the near real-time properties that users wanted.
  4. Ease of use – the ease of administering the system, especially for the initial set-up and the addition of new nodes to expand the cluster, has greatly enhanced the user experience.

Sybase versus Cassandra

In order to run a comparison of data store performance before (Sybase model) and after (Cassandra model), a test environment was set up with two databases containing the same data, as illustrated in Figure 1:

Figure 1. Sybase versus Cassandra test environment.

The data sets were loaded, different volumes of records from both databases were tested, and the times were measured. In Figure 2, a thousand records correspond to roughly 1 MB of data in the database and on disk. In all cases the results showed that Cassandra out-performed Sybase significantly, by a factor of three or more.

Figure 2. Timings (number of records vs. time).
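
For context, a comparison of the kind shown in Figure 2 can be driven by a very simple loop such as the one below. This is only an illustrative harness built on the hypothetical read service sketched earlier; it is not the benchmark code used to produce the figures.

    // Illustrative timing loop only; not the harness used to produce Figure 2.
    public class ReadTimingHarness {
        /** Returns the elapsed wall-clock time, in milliseconds, to read marks for each curve in turn. */
        public static long timeReads(MarksReadService service, String[] curveIds, String businessDate) {
            long start = System.nanoTime();
            for (String curveId : curveIds) {
                service.latestMarksBySource(curveId, businessDate);
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }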

Match your tech choice to migration needs

Migration from a relational database to a big data store is a major undertaking, and it’s not always obvious that NoSQL will offer the best approach. Too often it can seem as if migration to a big data solution will readily resolve all existing RDBMS woes. However, that is not a foregone conclusion; sometimes some performance tuning or de-normalisation of the existing database is all that is required.

For this project Cassandra was the best fit for the task; however, we highly recommend exploring all the big data technology options to find the best match for your next migration. Here are a few suggestions.

  • Seriously consider the underlying data types you need to store – for example, column store versus document store. Consider too the consistency, availability and partition tolerance (CAP) theorem and the de facto solutions in each camp. For instance, placing more weight on consistency and partition tolerance (CP) might point to the suitability of MongoDB or Redis, whereas availability and partition tolerance (AP) would suggest Cassandra or CouchDB.
  • Bear in mind the benefits of a NoSQL solution that supports a rich query language. It will lessen the burden on developers with an RDBMS background and accelerate time to delivery.
  • Finally, for mission-critical systems in an enterprise setting, you could consider a DataStax Enterprise solution. Its many advanced features can greatly improve production deployment, including the enhanced OpsCenter, caching and Spark master resilience, which are essential when running at scale. When combined with increased testing and integration with other key technologies, these powerful tools can greatly enhance a vanilla Cassandra deployment.

This case study is part of issue #3 of Excelian's Tech Spark magazine, which focuses on big data.
