On the 1st March, Excelian attended an all-day event organised by UK Oracle User Group called Oracle Coherence SIG (Special Interest Group), the bi-annual get-together event for people with interests in Coherence. The event had several sessions talking about different aspects of Coherence, how to use it in a correct way based on scenarios and how to perform some optimisations.
These sessions were held by experienced engineers (some Oracle engineers and software architects from finance) working with the product for years.
The web page of the event with the list of all the sessions and slides in PDF can be found here.
A different approach to distributed joins in Coherence
This was the 1st presentation of the day, held by Andrew Wilson, an experienced Coherence architect from RBS.
The key discussion was on the execution of distributed queries on very large clusters (an example RBS built a cluster with 400 members, 8091 partitions and 7 services deployed, with data in the region of the Terabyte) without compromising speed and availability. It is very easy to see Coherence as an in-memory data store that manages and distributes data for HA and low-latency access, but when it comes to querying data it is fairly easy to make simple mistakes that can affect the performance of a cluster.
Let’s consider a cluster with several partitioned caches with services attached to them and the need to run a simple query like select * from ‘cache’ where id = ‘?’ which, because of the cluster partitioning, can actually become quite a complex query. By executing the query on a Coherence cluster, each member will execute it in order to find the data object that needs to be retrieved and only one member (the one which holds the primary copy) will return the result (the data object) while all the other members will not return anything.
Using this technique is not performing well on large clusters when specific data has to be retrieved. Another approach is to leverage the DistributedCacheService to retrieve information like the partition containing a specific key (getOwnedPartitions()) and the partition on which the member is hosted (getKeyOwner()). This is possible because a Coherence cluster knows which data each member owns by the key of data objects.
Once we know on which cluster nodes the targeted data is, we can then use the InvocationService to run specific tasks (synchronously and asynchronously) on those members to perform some logic on the data and return a result.
Another point discussed was the difference between the “Entry Processors” and “Aggregators” and how their misuse can affect the performances. The aggregators are definitely faster as they perform the reduce phase and this implies that a client receive only one result that is sent over the network.
In general the rule is to use aggregators to aggregate large data set on the Coherence cluster to avoid sending large amount of data to the client and use processors to update large data set on the cluster directly instead of doing that on the client.
Please refer to the slide to see more about the differences between the two (see link below).
All in all the message is “think of Coherence in terms of cache, partitions and services” and leverage the Coherence services to make fast and directed queries when necessary and reduce the network data transfer as much as possible.
This presentation was held by Credit Suisse architects (David Whitmarsh and Phil Wheeler) and the purpose was to give tips and tricks on correctly setting up a Coherence cluster in Production, based on their experience and experimentations.
They started with simple settings like name your production cluster to avoid any undesired member join from another Coherence instance (btw, you can do the same by changing the multicast port of your cluster as well); also they advised to set up the memory for Coherence upfront instead of having the JVM allocating more and more memory as the data grow.
Several garbage collection settings were discussed as the stop-the-world pauses can seriously affect the performance (after all a Coherence instance is a Java process) and the key message is that it is important to profile the application in order to tune the garbage collector accordingly. Also they explained how to size the Java heap memory for Coherence by showing a mathematical formula and based on their experience. (Please refer to the slide for more details, at the link below)
Another key point is to avoid swapping as it is known to slow down any system. Credit Suisse used a technique called “RAM pinning” to pin the memory for java processes so that their memory was not subject to any swapping. They used a C library called “mlockall” which locks a process’s virtual address space into physical memory and they integrated in their Java code through JNA.
The presentation was very enjoyable as there were some intriguing quiz questions and it was interesting watching other people answering based on their experience.
“LittleGrid” – helping you test your Coherence code
LittleGrid is a simple open-source test-support framework that allows to start, stop and shutdown a Coherence cluster in a single JVM. It can be very useful for quick testing during the development phase without having to set up a full cluster.
Having the possibility to run a little cluster in a single JVM, allows debugging the code more easily and this framework can be used in any IDE and in any continuous integration tool. Obviously it does not substitute any load testing in a properly configured cluster and it is not a replacement test framework as it is typically used with any test framework.
The presentation was held by Jon Hall, an Oracle consultant and it was very interesting to see his demos. It only requires adding a JAR in addition to the Coherence one and starting it through the IDE; it requires using ClusterMemberGroupUtils which makes use of the builder pattern to set the number of members in the cluster with the storage enabled, the number of extend proxy clients and the number of JMX monitors.
This framework is still under development by Jon Hall and the source code can be downloaded at www.littlegrid.org. The documentation is still light at the moment as it is work in progress but the APIs are very simple and intuitive.