Exposure and counterparty limit checking: NoSQL indexing and querying solution
Client Problem Our client is a financial institution that requires pre-deal visibility of their credit exposures in respect to the counterparties it deals with. Credit limits are set for each counterparty, and the exposure to this counterparty should not exceed the limit.
The incumbent solution recalculated exposures as part of a nightly batch, after which a check was done to see if limits were exceeded. An obvious flaw with this solution is that limits are only retrospectively checked, so even though warnings may be raised if a limit is approached, given a busy day of trading with counterparty, limits may be seriously breached.
Solution overview The best of breed solution implemented solves the same problem in real time, providing the business with the capability to calculate the client’s current credit exposure to a counterparty right after a set of deals is executed, and in some cases even before the deal is executed.
This solution needed to be fast and simple - fast from the perspective of storing and checking the calculated exposures against the credit limits and simple in terms of implementation and support.
From the computer science perspective, this was the classic problem of finding an element in the set that satisfies a certain condition in a time close to constant, and it has been thoroughly described in appropriate literature over the last 50-60 years. However, before attempting to implement it in-house, a search was conducted for libraries that already implement this approach, and the solution was found.
CQEngine, (Collection Query Engine), is a NoSQL indexing and query engine, for retrieving objects matching SQL-like queries from Java collections with ultra-low latency. It allows the programmers to create indexed collections of Java objects in memory and query them faster than the linear time that a conventional linear search provides. Fast querying is achieved by maintaining indexes on objects’ fields or a combination of objects’ fields. Multiple indexes can be created for a collection and multiple indexes of different types can be created for a particular field, thus allowing queries of different types to be supported for the same field.
Technical Deep Dive - Indexes There are many types of indexes available.
Hash One of the most widely used indexes is the Hash index that handles a field’s equality or inequality to a particular value. For example, if we want to get all the exposures for a particular counterparty, a Hash index is our choice. Internally, it maps a range of values for a field to a set of objects, so that lookup is performed in a constant time.
Navigable Another popular type of index is the Navigable index that supports queries containing “less than”, “greater than” or “between” relations to be answered in logarithmic time. Internally, it uses red-black tree data structures instead of hash tables. For example, a Navigable index can be used to look up the objects whose timestamp field is greater than a certain moment in time. Queries containing a “starts-with” relation can be optimised using radix trees and ones containing “contains-string” can be optimised with suffix trees. CQEngine has an index implementation for each of them.
Standing Query An interesting type of index is a Standing Query index. This index allows arbitrary complex queries to be answered in constant time. For every object added or removed into the collection, the engine updates internal sets of objects possibly matching a query. Standing Query indexes can also improve performance of fragments of queries that makes them extremely useful. If an index for a field cannot be found, the engine falls back to a so-called Fallback index. A Fallback index is an artificial type of index that is essentially a “no-index” – objects are analysed one by one, which is basically a linear search. This is appropriate for small sets but can lead to substantial delays when used with large collections.
Compound Compound indexes allow conjunctions (queries containing AND relations) to also be answered in constant time, which makes this type of index a very attractive option. However, the number of compound indexes grows exponentially as the number of fields that must be queried at the same time grows. What is more, the amount of memory required to maintain a single compound index also grows exponentially as the collection grows.
Solution Challenges: Time vs Space Space requirement of such queries can be dramatically improved by statistical knowledge of an underlying data set. If we know that the number of objects per a particular realisation of certain field is limited, then, instead of maintaining a set of compound indexes, we can build a Hash index for this field only. In this case, CQEngine will combine a hash table search for this field and then iterate through the search results to match the remaining fields. This provides a slightly slower performance than the one from compound index but its time complexity is still constant and also requires much less space.
Solution Implementation and Business result In practise, the Credit Risk Engine is a long-running Linux process that calculates Potential Future Exposures, which are persisted to the NoSQL in-memory database instance we built. Similarly, credit limits are received via a messaging interface and are persisted.
In this way, if a user wishes to check pre-deal exposure limits for an individual counterparty, they are able to use the trader client interface to execute the relevant near real-time query. This increased ability to pre-deal limit check provides the business with the opportunity to manage their risk against a given counterparty.
In summary, the choice of CQEngine allowed us to keep all the calculated exposures in memory while maintaining the advantages of indexed lookup of conventional embedded SQL databases, thus providing the user with the extremely fast access to pre-calculated risk measures. In absolute numbers, for a collection of around 1 million exposures, constant lookup using hash indexes’ performance was of the order of 10 μs, whereas linear lookup would be of the order of 100 ms, thus achieving 10,000 times performance boost. Of course, it comes at a cost of slower insertion performance and higher memory footprint, but the cost is comparably low when compared to the lookup performance improvement.