There were many delegates who are (or could potentially be) interested in this subject, as more and more enterprises want better insight into the data they own in order to improve their business and serve their clients better.
Undoubtedly “Big Data” has become a buzzword, and there are plenty of commercial IT solutions that help customers gather, process and aggregate vast amounts of data, potentially in different formats, to get an overall view of it and bring value to the business.
Information management is about using information to make decisions, and that decision-making requires effective information delivered to the right place, at the right time and to the right person(s); precision is therefore needed to get the correct information.
It is fundamental to understand the data in order to make such decisions, and Big Data brings many challenges that are not easy to tackle; the three main ones are size, scope and speed.
The first challenge, size, is the most obvious one, as the amount of data can be very daunting, especially when it grows continuously. Data can also come in different formats, which makes aggregation even more challenging.
The second challenge, scope, concerns what you want to get from the analysis of the data and what kind of information you want to extract for your business.
The third challenge, speed, depends on one's expectations for processing large amounts of data and on any SLAs in place. Processing data quickly, so that valuable information arrives on time, is important for keeping the business flexible and agile.
Nowadays various companies build their business on Big Data, notably Google, Amazon, Netflix and Facebook, to name a few. Interestingly, more enterprises in different industries (mobile, gaming, telcos, marketing, retail, insurance, banking) are looking at the potential of Big Data, so there will be more opportunities to build the tools and develop the skills required for the challenge.
After all, the value of Big Data lies in exploring new ways of processing data and finding the bits we are interested in. Big Data is definitely going to be disruptive for information management systems, so enterprises that intend to embark on this challenge need to mature in their understanding of the value and role of information.
The Hadoop ecosystem
Hadoop is based on the map-reduce paradigm, which was designed by Google and is explained in detail in Google's original MapReduce paper.
Hadoop is part of the Apache ecosystem, is entirely written in Java and is free to use. There are also commercial software solutions based on Hadoop with customisations and enhancements: major software companies like Oracle and Microsoft provide Big Data solutions built on Hadoop.
Alongside Hadoop there are other related open source projects; the most important ones are Hive (a SQL-like query layer over Hadoop) and HBase (a distributed, column-oriented data store).
Hadoop is not an out-of-the-box solution that you can simply deploy and start using; it is a framework on which you can build your Big Data processing logic and extract information.
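To make the map-reduce paradigm concrete, here is a toy word-count sketch in plain Python that simulates Hadoop's map, shuffle and reduce phases in a single process. This is an illustration of the programming model only, not Hadoop's actual Java API, and the function names are my own:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data needs big tools", "data is the new oil"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In a real Hadoop job, the mapper and reducer run in parallel across a cluster and the framework handles the shuffle, fault tolerance and data locality; the logic above is what the developer actually writes.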
The Oracle approach
Oracle offers a broad portfolio of products to help enterprises acquire, manage and integrate Big Data with existing information, with the goal of achieving a complete view of the business in the fastest, most reliable and most cost-effective way.
Oracle offers an engineered system of hardware and software called
Another product they offer is
It is worth mentioning that Oracle also provides
Oracle is undoubtedly investing heavily in the Big Data revolution and is well positioned to be a major player in this arena. Oracle's experience comes from databases and structured data, so this innovation is seen as a natural progression.
Hadoop has been in the open source community for a few years and is now gaining strong momentum as more IT providers use it in their software or cloud solutions.
At the moment it is a fundamental piece of the Big Data puzzle for analysing large amounts of raw data at a low level.
Big Data management can be very complicated, so when selecting tools for data analysis there are a few considerations to take into account:
- Where will the data be processed? Locally-hosted software, dedicated appliance or in the cloud?
- Where does the data originate? How will it be transported? It is often easier to move the application close to the data (data affinity) to avoid network latency.
- How clean is the data? Variety means it needs a lot of cleaning, and that costs time and money.
- What is your organisational culture? Do you have teams with the necessary skills to analyse the data? Analysing data needs the creative ability to look at problems in different ways.
- What do you want to do with the data? Having some idea about the outcomes of the analysis may help to identify patterns in the data.