It may not be news that the use of Hadoop technology has significantly increased, thanks to Hadoop’s inherent ability to integrate and process large volumes of heterogeneous data and identify behavioral patterns. But until recently, Hadoop did not work well with mainstream analytics platforms and was largely a subsidiary or niche solution that complemented more traditional RDBMS and analytical solutions.

Financial Services

For the financial services sector, Hadoop technology has proven particularly attractive for driving activities such as fraud detection, as well as more traditional customer experience improvements.

During the process of refining and distilling wine and spirits, it is accepted that some of the product will be lost along the way. This loss is called the “angels’ share”.

Traditionally, Hadoop refined data through successive map and reduce activities. This approach was not known for retaining a long-term record or comprehensive visibility into all of the transformations and changes that occurred; like the angels’ share, some of that detail simply evaporated along the way.
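To make the lineage problem concrete, here is a minimal, hypothetical sketch (not from the article) of a map/reduce-style aggregation over invented transaction records. Once the reduce step runs, the individual source records, and any trace of how each output value was derived, are gone unless lineage is captured separately:

```python
from collections import defaultdict

# Hypothetical raw records for illustration: (account_id, amount)
transactions = [
    ("acct-1", 120.00),
    ("acct-2", 75.50),
    ("acct-1", 30.25),
    ("acct-2", 10.00),
]

def map_phase(records):
    # Emit (key, value) pairs, as a Hadoop mapper would.
    for account, amount in records:
        yield account, amount

def reduce_phase(pairs):
    # Sum values per key, as a Hadoop reducer would.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

totals = reduce_phase(map_phase(transactions))
print(totals)  # {'acct-1': 150.25, 'acct-2': 85.5}
```

The output tells you *what* each account now totals, but not *which* records produced it or what was filtered out in between: that provenance is the “angels’ share” that governance and lineage tooling aims to recover.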

However, in the last 12 months we have seen two significant changes in the market.
  • Hadoop is becoming a central part of the corporate infrastructure and no longer just a niche element, in large part due to the combination of data volumes with the need to rapidly process an increasing variety of data sources in order to maintain a competitive advantage and meet the increasing demands of customers for interactive experiences.
  • There has been a significant increase in financial services regulatory requirements (such as the Basel accords) that impact data retention, auditing, and lineage capabilities. The impact of new legislation is extensive and includes new corporate positions such as “Data Protection Officers” as well as requirements to provide customers with details about who has accessed their data.
The combination of these two factors has led to some very interesting solutions entering the market. These solutions fall into three broad categories:

  • Solutions from the major Hadoop distributors, such as Cloudera Navigator and Hortonworks DataFlow.
  • Solutions from traditional relational data processing providers, such as Informatica and Teradata.
  • Solutions targeted specifically at the Hadoop platform, such as Waterline Data and Big Data Revealed.

While many of these solutions share common capabilities, each also has distinct features that may make it more compelling for certain scenarios.
Data governance and lineage are not new subjects, but their application in the “Big Data” world is certainly not well defined. What is evident at the moment is that, with mainstream adoption of Hadoop now well underway, this is a critical piece of the puzzle, and the emerging tools that address it are maturing rapidly.

It is also evident that no single solution covers all legislative and governance needs, and there is certainly no single “out-of-the-box” product that can guarantee the “angels” aren’t getting their share of the data.

The good news is that Luxoft is working on real-world applications with many of the leading players in this space, and over the coming months I will be going deeper into the specific merits and features of some of these key tools. Keep watching this space!
Alex Tilcock
Alex Tilcock is the lead solution architect for Luxoft’s Big Data Practice. Alex has a passion for data-driven technology, technical innovation, information architecture and strategic solutions that work in today’s real-time data environments. Alex has 30+ years’ experience in large-scale systems integration and commercial software development, with a major emphasis on data. He has been a key contributor on a diverse set of solutions, from simple desktop applications for a few users to global government real-time intelligence systems. Alex also has extensive experience and certifications in open source and Hadoop stacks.