Financial services firms, investment banks in particular, have always relied heavily on large-scale compute grids, with a typical grid ranging from 2,000 up to 50,000 compute cores. Traditionally these applications rely on In-Memory Data Grids (IMDGs) to provide a scale-out data layer; however, more and more NoSQL products like Cassandra and MongoDB are finding their way into the technology stack. IMDGs (Oracle Coherence, GemFire etc.) provide near caching, but they do not allow the storage layer to scale out: data can only be read from a single node in the cluster. NoSQL platforms like Cassandra offer a `replication factor`, which provides a more scalable solution by allowing read requests to be spread across replicas.
However, if we start to think about what happens at scale (for instance up to 50k concurrent read requests), the problem then becomes: what happens when the grid reads hundreds of Gigabytes of data across thousands of nodes?
If each node can send 1GB of data per second, then even a high `replication factor` cannot stop performance degrading quickly and bringing the grid to its knees: the network simply cannot transfer 1TB of data in a split second. The logical next question is: what is the nature of the data access pattern? Luckily for us, most compute grid applications rely on snapshotted data. They need data consistency, and data in a format that is read-mostly and read-frequently. This sounds like the perfect scenario for a 'near cache', where we keep frequently accessed data local to where it is being processed (in memory or on the local host). Combining a scale-out NoSQL layer with a near cache gives a synergistic, scalable data platform on which we can service the most intensive applications.
…And so Mache was born.
So what is Mache?
Mache is a NoSQL near cache with eventing, built from a 'mash-up' of open-source technologies, with pluggable support for multiple NoSQL stores and multiple messaging platforms. We have used well-known open-source technologies throughout. The heart of the system is akin to a HashMap, built using Google's Guava cache and Spring Data.
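The core idea can be sketched in a few lines. This is illustrative only, not Mache's actual API: a plain `ConcurrentHashMap` stands in for the Guava cache so the sketch stays self-contained, and the loader function stands in for the pluggable NoSQL layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative read-through near cache: reads are served from local memory,
// misses fall through to a pluggable store loader (the NoSQL layer in Mache's
// case), and an invalidation event simply evicts the entry so the next read
// re-queries the backing store.
public class NearCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> storeLoader;

    public NearCache(Function<K, V> storeLoader) {
        this.storeLoader = storeLoader;
    }

    public V get(K key) {
        // Miss: load from the store; hit: serve from local memory.
        return cache.computeIfAbsent(key, storeLoader);
    }

    public void invalidate(K key) {
        // Driven by the messaging layer when another process changes the data.
        cache.remove(key);
    }
}
```

Because invalidation just evicts, receiving the same event twice is harmless: the entry is simply reloaded from the store on the next read.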
The principal idea behind Mache was to abstract the cache away from the database layer. As a result, the two remain somewhat independent of each other, letting teams concentrate on development and performance rather than on scaling out the data layer. It has been engineered to be as lightweight as possible, and because of its vendor independence, Mache can add a free caching layer to an already deployed NoSQL database. It does not need to know about the volume of data, which means it can be as performant as the system allows.
It was also designed to integrate well with other software. It supports NoSQL platforms such as Cassandra, MongoDB and Couchbase, and it also integrates with pre-existing messaging systems such as Kafka, RabbitMQ and ActiveMQ.
What Use Cases do you see it being used for?
Generally speaking, most grid applications rely heavily on read-mostly data. Most relevant to today's challenges are MiFID II and CVA.
Since the introduction of MiFID II, market makers must warehouse the audit data around their prices and spread calculations. This inevitably leads to the storage of large volumes of data, which also presents a Big Data opportunity. Applications now need to read and write this data, and we believe Mache can help turn a 'data lake' into a localised, manageable view of this data deluge.
Take, for instance, today's trades or their price points: Mache fits naturally into the speed layer of a lambda architecture.
Credit Value Adjustment (CVA) requires large volumes of Monte Carlo data, in some cases 300-400GB for a single day. When stress testing or back testing is being performed, you could need 30+ days' worth of data on each compute node. 400GB across 10k nodes is 4PB - a lot of data!
Is Mache relevant to cloud?
Definitely. Cloud is being adopted in the investment banking space, and grid computing is the first candidate for moving there. Grids are distributed by nature and consume the majority of datacentre space, so they are a good fit. Products like Cassandra are well matched to the increased demands on networking infrastructure, and NoSQL platforms are pushing IMDGs aside because they lend themselves better to horizontal scaling and larger-scale architectures.
How does Mache compare with other existing technologies?
The following is a list of technologies that can fulfil similar use cases to Mache. While they are all leading technologies in their field, it is important to note how they are distinctly different to Mache.
- Ehcache - is a distributed JSR-107-compliant cache that has pluggable storage mechanisms (on-heap, off-heap and file-based). However, it doesn't have out-of-the-box support for any NoSQL platform.
- Gridgain - is a purely in-memory data and processing grid platform. It has no current support for NoSQL platforms.
- Hazelcast - is another in-memory grid implementation, often marketed as a replacement for Oracle Coherence.
- Infinispan - is a single-node or clustered JSR-107-compliant in-memory cache implementation. Its persistence is handled by pluggable adapters, which application teams may have to write themselves. It provides no NoSQL support as standard.
- Memcached - is a multi-node object cache with no persistence mechanism as standard.
- Redis - is an all-purpose in-memory data store with no persistence mechanism as standard.
In our view, no pre-existing solution provided for the simplest use case of near caching that abstracts away storage whilst still supporting the big three platforms. We proposed a solution that would adhere to the following guiding principles:
- Lightweight - the Mache-core jar depends on 3 other libraries (in addition to any required vendor drivers for messaging or persistence)
- Pluggable with existing NoSQL Platforms, and thereby creating no vendor lock-in
- Provide a small set of valuable features, adhering to the Unix principle of 'do one thing, and do it well'
How well does Mache scale?
We have tested Mache using Apache JMeter, which has some great plugins for measuring throughput. The primary test compared read performance for varying volumes of data with caching turned on against the same read workload with caching off. As can be seen from the charts below, Mache performed very well.
We built the test environment in AWS using m3.xlarge instances and, as expected, a three-node DataStax Cassandra AMI performed well in a random-read test.
Introduce Mache, however, and we see higher performance and better scaling. This is expected: a local memory lookup will always beat a network hop.
Finally, we tested Mache with 'inter-cache' notification. This is the case where data changes frequently and caches have to keep up with the churn. We deployed Kafka/ZooKeeper on a single node (m4.2xlarge), but even in this mode Mache was able to sustain fast responses at under 100 changes per second.
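The inter-cache notification loop can be sketched as follows. This is a hedged illustration, not Mache's implementation: a `BlockingQueue` stands in for the Kafka topic, and all names are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative inter-cache invalidation: a writer publishes the changed key
// to a topic, and every subscribed cache evicts that key so its next read
// refreshes from the NoSQL store. A BlockingQueue stands in for Kafka here.
public class InvalidationListener {
    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    public void put(String key, String value) { localCache.put(key, value); }
    public String get(String key) { return localCache.get(key); }

    // Called by a writer elsewhere in the grid after it updates the store.
    public void publishInvalidation(String key) { topic.add(key); }

    // One drain step of the consumer loop; in a real deployment this would
    // run on a background thread consuming from the broker.
    public void drainOnce() {
        String key;
        while ((key = topic.poll()) != null) {
            // At-least-once delivery is fine: evicting twice is harmless.
            localCache.remove(key);
        }
    }
}
```

Because the events carry no payload beyond the key, a duplicate or redelivered event costs at most one extra re-query of the database.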
What platforms does it run on?
Mache currently supports the following NoSQL platforms: Cassandra, MongoDB and Couchbase.
Additionally, Mache supports the following messaging platforms: Kafka, RabbitMQ and ActiveMQ.
So what's next for Mache?
We have quite a few enhancements on our list. In no particular order:
- JMX to provide configuration and management
- Overflow cache - so we can overflow from memory to disk using something like Berkeley DB
- JSON/REST API - to provide a platform-agnostic way of accessing locally cached data, run as a service
- Eventing straight from the database
- Filtered materialized views/Continuous query caches
- Shared cache instance between processes
- Distributed file system
Where can I go for more information?
The source and the binaries for 0.6.1 are available online.
Alternatively, email us at firstname.lastname@example.org
1. What's the desired behaviour when a Mache client cannot consume an invalidation event, for example due to network partitioning? Kafka is a great choice here as it journals messages. It is important that the messaging broker supports 'at least once' delivery: if we receive a message more than once (due to a server or network failure), the cache will simply evict the data a second time and refresh from the storage layer.
2. For AP data stores like Cassandra, can there be conflicts with the eventual consistency approach? In this context, would pushing updates via a queue help? Unfortunately, Mache won't solve the problem of eventual consistency with simultaneous updates: the last update will always win. The problem will always remain wherever you have multiple processes (Mache or otherwise) updating your data. However, we have planned a roadmap feature to use the NoSQL stores' own eventing mechanisms, bringing the order of events much closer to the order in which the database executes the transactions.
3. Is there a need to ensure the ordering of events? Thankfully not. Since events are only ever invalidation events, the underlying NoSQL platform takes care of ordering for us: the recipient of an invalidation event will just re-query the database.
4. A large number of read/write clients could cause an event storm. Can Mache protect itself from being starved by incoming events? If in-memory queues are used to buffer events, is there a danger of an OutOfMemoryError?
Mache is intended for the read-heavy workloads typically seen in a grid environment. Although we have built-in eventing (via queues) to support evicting cached data on change, it is intended for edge cases with occasional data eviction. Event filtering/conflation is also likely to appear in an imminent release, which should give some level of protection.
5. Can Mache instances subscribe to a filtered event stream? Can events be batched or, even better, conflated? The only filtering is based on the entity being cached (each entity gets its own topic); there is currently nothing within Mache for filtering out individual events. However, we have filtering/materialised-view support on the roadmap.
6. Does the front cache support TTL/max size? Yes and no. At the moment there is no way to pass eviction or max-size options down into the cache; instead we have set a default max size of 10,000 entries and a default expiry of one day after the entry was last read or written. It is our intention to expose these options to the client.
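The default policy described above (10,000 entries max, expiry one day after last read or write) can be sketched as follows. This is not Mache's code: Mache gets this behaviour from Guava's cache, whereas the sketch below uses an access-ordered `LinkedHashMap` and an injectable clock purely to stay self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Illustrative front-cache policy: at most 10,000 entries, and entries
// expire one day after they were last read or written.
public class BoundedTtlCache<K, V> {
    private static final int MAX_SIZE = 10_000;
    private static final long TTL_MILLIS = 24L * 60 * 60 * 1000; // one day

    private static final class Entry<V> {
        V value;
        long touched;
        Entry(V value, long touched) { this.value = value; this.touched = touched; }
    }

    private final LongSupplier clock; // injectable so expiry can be tested
    private final LinkedHashMap<K, Entry<V>> entries =
            new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) { // access order
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                    return size() > MAX_SIZE; // size-based eviction
                }
            };

    public BoundedTtlCache(LongSupplier clock) { this.clock = clock; }

    public void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        long now = clock.getAsLong();
        if (now - e.touched > TTL_MILLIS) { // expired: evict and miss
            entries.remove(key);
            return null;
        }
        e.touched = now; // reading refreshes the expiry, as Mache's default does
        return e.value;
    }
}
```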