Excelian recently attended a partner briefing for ParAccel, a company whose products aim to create an “Analytics Driven Enterprise”. An analytics driven enterprise is one that acknowledges the existence of the analytics gap, the difference between the amount of data collected and the amount of data that is analysed and transformed into information, and is working towards a solution to bridge that gap.
Their main product, ParAccel Analytic Database (PADB), is a schema neutral, distributed, column oriented relational database management system. PADB can work as a wholly in memory solution, or in the more traditional physical storage model. The system can be configured with hot-hot redundancy for the compute nodes, can effectively implement a distributed RAID for data redundancy, and can also use a SAN as the database of record to cope with disaster recovery situations and leverage other SAN benefits such as cloning and snapshotting. The platform can perform calculations directly through C++ user defined functions (UDFs), and a customisable, user definable set of on demand integration (ODI) adaptors allows seamless integration with other big data technologies. The platform also runs on commodity hardware.
ODIs allow PADB to sit in the middle of multiple data sources, including a traditional enterprise data warehouse and other relational databases, and also provide interfaces to other big data platforms such as Hadoop or Teradata. PADB can then be used to aggregate data from all sources, providing a one stop shop for querying disparate data stores as a coherent whole.
ODIs themselves are built upon the UDF framework, and hence new adaptors for any data source can be implemented, allowing PADB to integrate and interoperate seamlessly within any organisation. Scalability is also of huge importance, with PADB able to scale to large clusters and capable of running many ODIs, UDFs or complex queries simultaneously.
The UDF framework also allows complex analytics to be performed directly by the PADB engine. ParAccel provide over 500 off the shelf analytics functions, including financial analytic libraries that allow easy implementation of value at risk (VaR), market risk and credit risk calculations, as well as easy aggregation of results to any level. The UDF capabilities of PADB could be used to leverage existing C++ analytics, wrapping them so that the PADB engine can execute them directly, very close to the data. This could significantly reduce the data transfer overheads involved, cutting the overall time taken to complete complex analytics on large data sets.
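To make the idea of wrapping existing C++ analytics more concrete, here is a minimal sketch of the kind of routine that might be a candidate for running next to the data: a one period historical simulation VaR, plus an aggregation step that sums position level P&L per scenario. The function names `historical_var` and `aggregate_pnl` are hypothetical and chosen purely for illustration. The sketch does not use or show ParAccel's UDF registration interface, which was not covered in detail at the briefing, and the simple floor based quantile convention is an assumption rather than ParAccel's method.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: sums position-level P&L vectors scenario by scenario,
// giving a single P&L vector at the aggregated (e.g. portfolio) level.
std::vector<double> aggregate_pnl(const std::vector<std::vector<double>>& positions) {
    if (positions.empty()) return {};
    std::vector<double> total(positions.front().size(), 0.0);
    for (const auto& position : positions) {
        if (position.size() != total.size()) {
            throw std::invalid_argument("all positions need the same scenario count");
        }
        for (std::size_t i = 0; i < position.size(); ++i) {
            total[i] += position[i];
        }
    }
    return total;
}

// Hypothetical helper: one-period historical-simulation VaR.
// pnl        - historical profit and loss observations (losses are negative)
// confidence - confidence level strictly between 0 and 1, e.g. 0.95
// Returns the loss at the (1 - confidence) quantile as a positive number.
// Assumption: a simple floor-based quantile index with no interpolation.
double historical_var(std::vector<double> pnl, double confidence) {
    if (pnl.empty()) {
        throw std::invalid_argument("pnl must not be empty");
    }
    if (!(confidence > 0.0 && confidence < 1.0)) {
        throw std::invalid_argument("confidence must be in (0, 1)");
    }
    const double raw = std::floor((1.0 - confidence) * static_cast<double>(pnl.size()));
    // Clamp to the last valid index to guard against floating point rounding.
    const std::size_t idx = std::min(static_cast<std::size_t>(raw), pnl.size() - 1);
    // Partial sort: after this call pnl[idx] holds the idx-th smallest value.
    std::nth_element(pnl.begin(), pnl.begin() + static_cast<std::ptrdiff_t>(idx), pnl.end());
    return -pnl[idx];
}
```

In an in-database setting, the appeal is that only the inputs already held in the cluster and the single resulting figure per aggregation level need to move, rather than the full set of scenario P&L rows being shipped to an external compute grid.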
So ParAccel have an offering which on paper ticks many boxes and has some impressive features. Performance is one of their main strong points, and ParAccel are confident that they can beat the performance of any competing vendor solution, often with a simpler implementation process. Amazon has certainly taken the plunge and is leading the way by both investing in and utilising ParAccel’s offering for their cloud based big data platform, RedShift. However, with RedShift, Amazon is attempting to bring big data to the cloud with a solution that looks more geared to warehousing big data sets than to ParAccel’s dream of big analytics.
RedShift sounds like it will make full use of PADB’s ability to run exceptionally fast queries upon large datasets, but leave out the ability to run custom UDFs in the cloud alongside the data. It would seem then, for those wanting big analytics in the cloud, RedShift may not be the first choice, despite its PADB origins. Amazon also offers Hadoop in the cloud through Amazon Elastic MapReduce (EMR), as do Microsoft in Azure. Both of these Hadoop solutions can be used for either analytics or data warehousing.
Microsoft have taken the approach of offering a Windows based version of Hadoop, and providing an Apache Hive ODBC connector to allow other Microsoft products and services such as Excel, SQL Server, PowerPivot and Power View to readily communicate with and access data within Hadoop. Another advantage of the Apache Hive ODBC connector is that 3rd party applications built using the Microsoft software stack can instantly benefit from, and access, data stored using Hive. The ability to leverage existing 3rd party applications and well known user tools, whilst exposing the unstructured data that can be stored within Hadoop as a data warehouse, will likely appeal to many organisations.
With EMR, Amazon offer Hadoop running on their Elastic Compute Cloud (EC2) and Simple Storage Service (S3) platforms. This is a more direct competitor to Microsoft’s Azure Hadoop product, but there may also be overlap between the capabilities of EMR and RedShift. Which will ultimately be most suitable for any particular big data problem depends on the problem at hand. With RedShift, Amazon is bringing another capable cloud platform to the table to tackle the currently unanswered, and possibly unasked, questions of the future.
The financial services sector constantly faces new challenges within the risk space, especially in meeting new regulatory requirements. This is a huge driver for additional compute against ever higher volumes of data. Traditional solutions can often reach their limits, and new and innovative solutions are being considered to solve these bigger problems. Excelian and ParAccel have partnered in this domain, bringing Excelian’s wealth of capital markets and high performance computing expertise to help architect and implement PADB based solutions that fit the requirements of the industry.