Today, big data applications are everywhere. Among them are recommendation systems, which have already been around for over a decade. These systems make educated guesses – sometimes unexpected but in most cases fairly accurate – about the movies, music or products a user would like. So here’s the lowdown.

How can big data help us find films we haven’t even heard of, but are likely to love?

Some time ago, researchers noticed that people who independently buy the same product often end up, when they later come back for something completely unrelated to the first purchase, buying the same second product as well.

Such observations – combined with the growing need to guide user navigation in stores with tens of thousands of items – led to the active development of systems that predict users’ buying behavior and make relevant purchase suggestions. Today, we call them recommender systems.

Here is exactly where big data comes into play – it finds patterns and connections that can’t always be explained on their own. And as datasets become larger and more multidimensional, companies can discover interesting correlations and offer more precise recommendations.

Coming back to films: movie theaters, internet providers, TV channels (such as YouTube) and movie rental services are driving the spread of recommendation systems in the film industry. And these systems tend to focus not on user groups (market segments), but on individual users.

How do recommendation systems work?

One of the main methods is collaborative filtering – making automated predictions (filtering) about what interests a user by collecting the preferences of many users (collaborating). The underlying assumption is that if person A shares person B’s opinion on one thing, A is more likely to share B’s opinion on another thing than a randomly selected person would be.

Collaborative filtering applications typically involve very large data sets. As elsewhere in big data analytics, information is collected from many (anonymous) users, while the resulting predictions are specific to a single user.

These systems do their job in two steps: first, they find users who share the same rating patterns as the targeted user; second, they use the ratings from those like-minded users to calculate a prediction for the targeted user.
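
The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any production system works: the users, movies and grades are made up, cosine similarity over co-rated movies is just one possible way to measure "the same rating patterns", and the names `RATINGS`, `cosine_similarity` and `predict` are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: grade on a 1-5 scale}.
RATINGS = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob":   {"Alien": 5, "Heat": 5, "Up": 1, "Ran": 4},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Ran": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, target, movie, k=2):
    """Predict target's grade for movie from the k most similar raters."""
    # Step 1: rank the other users who rated the movie by similarity.
    neighbours = sorted(
        ((cosine_similarity(ratings[target], r), r[movie])
         for user, r in ratings.items()
         if user != target and movie in r),
        reverse=True,
    )[:k]
    # Step 2: similarity-weighted average of the neighbours' grades.
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        return None  # nobody comparable has rated this movie
    return sum(sim * grade for sim, grade in neighbours) / total_weight


print(predict(RATINGS, "alice", "Ran"))
```

In this toy data alice's grades track bob's far more closely than carol's, so the predicted grade for "Ran" lands nearer to bob's 4 than to carol's 2. Real systems usually also correct for each user's average grade before comparing users, which this sketch leaves out.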

Another common method for recommendation systems is content-based filtering, in which algorithms recommend items similar to those the user liked in the past. This has a weakness, however: the algorithm can’t show a user new (and unknown) product areas that the user might also like. As a result, hybrid filtering systems, which combine both approaches, are often used instead.
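
As a rough sketch of the content-based idea, the snippet below builds a taste profile from the movies a user liked and ranks the remaining movies by how closely their features match it. The genre scores and the names `FEATURES`, `cosine` and `recommend` are invented for illustration; real systems derive much richer item features.

```python
from math import sqrt

# Hypothetical genre features per movie: (action, comedy, drama).
FEATURES = {
    "Alien": (1.0, 0.0, 0.2),
    "Heat":  (0.9, 0.0, 0.6),
    "Up":    (0.1, 0.9, 0.4),
    "Ran":   (0.7, 0.0, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def recommend(features, liked):
    """Rank unseen movies by similarity to the average of the liked ones."""
    dims = len(next(iter(features.values())))
    # The user's profile is the mean feature vector of the liked movies.
    profile = [sum(features[m][i] for m in liked) / len(liked)
               for i in range(dims)]
    unseen = [m for m in features if m not in liked]
    return sorted(unseen, key=lambda m: cosine(profile, features[m]),
                  reverse=True)


print(recommend(FEATURES, ["Alien", "Heat"]))  # ['Ran', 'Up']
```

The output also illustrates the weakness mentioned above: a user who has only liked action-heavy titles keeps getting action-heavy suggestions, and the comedy ends up at the bottom of the list.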

The real-life impact

Netflix, a television and film streaming service, employs more than 300 staff and spends over $120 million yearly solely to support and develop its recommendation systems. The company even held a famous open competition from 2006 to 2009, referred to as the “Netflix Prize,” to find the best collaborative filtering algorithm for predicting user ratings of movies based only on previous ratings by anonymous users.


The goal was to improve Netflix's own algorithm, Cinematch. If a new algorithm improved on Cinematch's results by more than 10%, the winner would receive a grand prize of $1 million! To start, Netflix provided participants with a training dataset of 100,480,507 ratings from 480,189 users on 17,770 movies. Each training rating contained four elements: user, movie, grade and grade date, with grades given on a scale of 1 to 5 stars.
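
To make the shape of this data concrete, here is a small sketch of one training record and of how sparse the full set is. The class name `Rating`, the field names and the integer identifiers are assumptions for illustration; only the three counts come from the figures quoted above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    user: int          # anonymised user identifier (assumed to be an integer)
    movie: int         # movie identifier (assumed to be an integer)
    grade: int         # 1 to 5 stars
    grade_date: date   # when the grade was given


# Figures quoted in the article.
N_RATINGS = 100_480_507
N_USERS = 480_189
N_MOVIES = 17_770

# Fraction of all possible (user, movie) pairs that actually carry a grade.
density = N_RATINGS / (N_USERS * N_MOVIES)
print(f"{density:.2%} of the user-movie matrix is filled")
```

Only a little over 1% of all possible user-movie pairs carry a grade, which is why predicting the missing ones is a hard problem.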

Netflix also provided a qualifying dataset of 2,817,131 ratings, with grades known only to the judges. Every submitted algorithm had to predict the grades for this same qualifying set, and the predictions were then compared with the real values by calculating the RMSE (root mean square error). The submitting teams were only informed about their results on half of this set (1,408,342 ratings) in order to prevent them from tuning their algorithms to the test data itself – repeatedly adjusting submissions to chase a better score on that particular set rather than genuinely better predictions (a technique known as hill climbing).
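
RMSE is simple to compute: take the difference between each predicted and real grade, square it, average the squares, and take the square root. A minimal sketch follows, with made-up grades and a hypothetical function name `rmse`:

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between two equal-length grade sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical grades: every prediction is off by exactly one star.
print(rmse([4, 2, 5], [3, 3, 4]))  # 1.0
```

Because the errors are squared before averaging, one prediction that is off by two stars costs as much as four predictions that are each off by one, so large misses are penalized heavily.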

In the end, the team BellKor's Pragmatic Chaos – a merger of several participating groups, as no single team had succeeded on its own – came in with the winning submission just 24 minutes before the conclusion of the nearly three-year-long contest. The new system improved on Cinematch's results by 10.06%, with a test-set RMSE of 0.8567 (the lower the RMSE, the smaller the prediction error).
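
An "improvement" here means the relative reduction in RMSE compared with the baseline algorithm. The sketch below shows the arithmetic with deliberately made-up scores; the function name `relative_improvement` is hypothetical and the numbers are not the contest's actual figures.

```python
def relative_improvement(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical scores, not the contest's actual figures.
print(round(relative_improvement(1.00, 0.90), 2))  # 10.0
```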

The realization

The Netflix story offers a realization: better algorithms measurably boost performance, and that translates into real business value.

Today, recommender systems represent a very practical and proven way to work with large catalogues of data. Users can find what they want – even when they don’t know exactly what that is – and, in some cases, discover products they like without ever having known those products existed.

As recommender systems develop further, we’ll see new algorithms offer even more powerful and personalized recommendations. We’ll also have higher quality data and new emerging technologies available, such as deep learning via neural nets.

Interested in learning more? Luxoft has numerous data experts who can answer your questions.
Andrey Povarov
An IT manager with a wealth of experience developing and implementing business expansion strategies, launching new products on global markets and running country operations for international companies. He is passionate about adopting emerging technologies as well as designing and running premium training programs. He has both a technical and managerial background, a PhD and an MBA.