Journeys in Data Science Part I: Starting Down a Path to Creating Your Own Amazon Go!

Don’t you just hate paying for things? Well, Artificial Intelligence can help you avoid doing just that! Recently Amazon introduced its new "Just Walk Out technology" with Amazon Go. The AI-facilitated system combines an app, which holds your virtual basket, with a physical shop where you go to collect your items. A combination of computer vision and sensor fusion means that the system updates dynamically, determining which items you actually picked up and placed in your basket. You can then walk out of the store without queuing, whilst your money is automatically deducted from your Amazon Go account. Okay, so you do have to pay for your items. But in this first in a series of blog posts on Big Data, we’ll explore how Big Data, in combination with advancements in AI and Machine Learning, is poised to transform not only how we shop but how companies do everything from designing and manufacturing products to distributing them to the customer.

Even with AI and Machine Learning being incorporated into everything from our shopping experiences to our connected cars and mobile phones, our team here at Luxoft continually encounters some fundamental questions and challenges when approached by clients eager to embark on their own foray into AI, Machine Learning and Data Science to advance their businesses. These challenges seem to be common to enterprises across all sectors. They include misunderstanding fundamental principles, acquiring appropriate stakeholder support, appreciating the need for a data-driven culture, managing privacy and compliance, and funding and finding the appropriate expertise to achieve the required business objectives.

In this first of a series of posts discussing how to develop your Data Science initiatives, I will explore some of the common questions we often hear in the early stages of the Data Science journey:

  1. What is Machine Learning and Data Science?
  2. What can you do with it?
  3. How can my company leverage it?
  4. How much is “_____” going to cost me? The blank is either training, infrastructure, expertise or all three.
  5. How long will it take to realise benefits?
  6. How will this impact my existing business with respect to workforce, operations and infrastructure?
  7. What will this ‘magical’ solution to all my problems look like?

1. What is Machine Learning (ML) and Data Science?

A description of Data Science, ML and Big Data in terms of business trends, via the Gartner Hype Cycle, can be found here. However, to equip you with some pragmatic information going forward, I offer the following definitions.

  • Artificial Intelligence: Field of study concerned with the intelligence of machines, usually understood as replicating human intelligence.
  • Machine Learning: A process by which a computer learns without being explicitly programmed. The process is data-driven. Algorithms are used to find patterns in data, which are then used to infer future patterns. It is this particular field, which encompasses the mathematical techniques, that equips data scientists with their predictive 'super-powers'.
  • Big Data: Defined by its properties: ‘volume’ (amount of data), ‘velocity’ (rate at which analytics is implemented) and ‘variety’ (structured and unstructured data). The term also includes the technologies used to aggregate, process and store data.
  • Data Science: An amalgamation of disciplines including statistics, data mining, computer science, machine learning, modelling, Big Data technologies, data visualisation and many more… A primary objective of Data Science is to extract actionable insights from data.

2. What can you do with it?

The list of possibilities with Data Science is extensive. Are you doing Data Science? As long as your analytics are about ‘predicting the future’, then you are. If you are being retrospective in your analysis, this is Business Intelligence.
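To make the distinction between retrospective and predictive analysis a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the sales figures are invented, the function names are hypothetical, and a straight-line trend fitted by ordinary least squares stands in for whatever model a real project would choose.

```python
# Hypothetical monthly unit sales for one product (invented numbers).
monthly_sales = [100, 110, 120, 130, 140, 150]


def retrospective_summary(history):
    """Business Intelligence view: describe what has already happened."""
    return {
        "total": sum(history),
        "average": sum(history) / len(history),
        "best_month": history.index(max(history)) + 1,  # months count from 1
    }


def fit_linear_trend(history):
    """Learn a slope and intercept from the data by ordinary least squares."""
    n = len(history)
    months = range(1, n + 1)
    mean_x = sum(months) / n
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(history, months_ahead=1):
    """Data Science view: use the learned pattern to infer a future value."""
    slope, intercept = fit_linear_trend(history)
    return intercept + slope * (len(history) + months_ahead)


summary = retrospective_summary(monthly_sales)
next_month = forecast(monthly_sales)
print(summary)
print(next_month)
```

With these invented figures the summary reports a total of 750 units, while the fitted trend suggests roughly 160 units for month seven. The first number tells you what happened; the second is a prediction, and it is only as good as the data and the assumption that the trend continues.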

Below is a list showing some of the key applications of Data Science and ML by sector. This is not a complete and exhaustive list, and some applications, such as ‘customer segmentation’, ‘cross-selling’ and ‘demand forecasting’, are found across multiple sectors but have not been repeated under each one.

  • Propensity to Buy
  • Demand Forecasting
  • Process Optimisation
  • Predictive Maintenance or condition monitoring
  • Warranty reserve estimation
  • Telematics
  • Predictive inventory planning
  • Upsell and cross-channel marketing
  • Market Segmentation and targeting
  • Customer ROI and lifetime value
  • Recommendation engines

Healthcare and Pharmaceutical:
  • Alerts and diagnostics from real-time patient data
  • Proactive healthcare management
  • Disease identification and risk stratification
  • Patient triage optimisation
  • Healthcare provider sentiment analysis

Travel and Hospitality:
  • Social media – consumer feedback and interaction analysis
  • Aircraft scheduling
  • Dynamic pricing
  • Traffic patterns and congestion management

Financial Services (Retail Banking):
  • Cross-selling and upselling
  • Customer segmentation
  • Sales and marketing campaign management
  • Risk analytics and regulation
  • Credit worthiness evaluation

Energy and Utilities:
  • Carbon emissions and trading
  • Power usage analytics
  • Seismic data processing
  • Energy demand and supply optimisation
  • Customer specific pricing
  • Smart grid management

Agriculture and Farming:
  • Prediction of weather patterns
  • Optimisation of soil composition and health
  • Elimination of pests and weeds
  • Disease trends
  • Maximisation of crop yields
  • Disease-resistant breeds

Financial Services (Investment):
  • Cyber threat analysis and defence
  • Fraud detection and risk analysis
  • Real-time analysis and alerting
  • Regulatory compliance
  • Data governance

Preventive Maintenance. A major airline tasked Luxoft with helping them reduce maintenance costs, downtime and negative customer service impacts due to unexpected maintenance issues with their fleet of planes. Luxoft helped the airline gain insight from both historical and real-time performance data to develop a model for predicting maintenance needs of planes by type, usage and other variables.

3. How can my company leverage it?

"A problem well put is a problem half solved." – John Dewey (1859-1952: American Author, Philosopher and Psychologist)

The most critical thing to do before embarking on a potentially expensive foray into Data Science is to ensure you understand what it is you are trying to achieve. What you are about to embark upon is a journey that is an interplay between analysis of data and business objectives. Consider the following:

  • What business problem(s) do we want to solve?
  • Will the data we have help us solve this problem, or do we need to collect more, and from where?
  • If we cannot acquire more data, what is potentially attainable with the data sets we have today?

By answering these questions with your team, you’ll be better equipped to break down your business problem into the right Data Science problem and begin work on defining the best solution. The process is iterative. The business question will help guide exploration of the data, and the data will then help you refine and focus the business objective you’re solving. This in turn will help you manage expectations with regard to the insights you expect from the data. Finally, having implemented the necessary processes, you will have a baseline against which you can validate your results.

For most organisations, this initial exploratory stage can benefit from external guidance. Whether you choose to work with an external consultancy like Luxoft or not, always ensure your collaborative partner includes some kind of ‘Proof of Concept’ in their ‘scope of work and services package’. This usually involves defining and delivering a solution to the highest-impact Use Case. After all, you don’t want to hire people simply to give you greater clarification of the problem you have; you want to actually solve your problem. The service should include workshops, and processes to evaluate and validate the work achieved in a fixed time frame. This will help you take on minimal risk and drive good ROI.

4. How much is “_____” going to cost me? The blank is either training, infrastructure, expertise or all three.

A project estimate can be calculated once the team has a clear understanding of what exactly it is setting out to achieve. At this point, you’re all thinking “she’s teaching us to suck eggs”, but it’s amazing how many organisations forget that this is a research and development project management process, which can be managed under Agile methodologies or similar. Note I have not suggested that you immediately go and hire a data scientist and build your Hadoop cluster! Do you want to use open source technologies? Are you able to? And do you really need a Data Analyst or Software Development Engineer? If so, when?

It must be understood that Data Science is not software development. It is research and development, although the processes are complementary. This is useful to appreciate since many Data Scientists help build data products, which often manifest as software. Consider seeking enlightenment from the hallowed principles of CRISP-DM (Cross Industry Standard Process for Data Mining), which give guidance on managing this process.

In addition to this, organisations must always remember that research and development, by its very nature, will lead to dead ends that may not directly benefit the overall outcome of the project. This will obviously have an impact on scope, cost and time.

5. How long will it take to realise benefits?

In short, this depends on the difference between where you are and where you want to be. A gap analysis must be carried out. At Luxoft, we focus on defining what those benefits are before starting a new initiative: we clarify the pros and cons, discuss potential outcomes, ask how an organisation is seeking to validate its investment and, of course, over what period of time stakeholders want to realise the ROI.

6. How will this impact my existing business with respect to workforce, operations and infrastructure?

Some extra time spent on considering the following questions will help your organisation avoid some unnecessary headaches:

  • With regard to infrastructure, when considering our Data Science initiative, can we solve our problems by utilising open source technologies or will we require proprietary tools?

Open source is usually cheaper to license than proprietary software, but not all open source resources and tools come with support beyond community forums. Do you need this?

  • Do we want to keep our data onsite or do we want to utilise cloud resources?

This is both a cost and a security issue. At Luxoft we have noticed that this is a non-trivial topic for multinational organisations, which often deliver multiple products and/or services.

  • Does our organisation understand the value of its data, and respect it accordingly?

This has a lot to do with the notion of ‘data-driven culture’. Think about it this way. Your future business decisions are going to be based on the data you have gathered and the information contained therein. It is that which will help you generate revenue, or cut costs, or both. It is as valuable to you as your bank balance.

  • Are we compliant? Consider privacy, security, protection – within the organisation, between the organisation and its clients, as well as relationships with 3rd party actors.

The reality is that government policy and legislation around data management are having serious problems keeping up with the ever-evolving data landscape. The impact of security breaches stands to become ever greater as the amount of data aggregated, processed and managed continues to grow. Without guidance, organisations need to take time to seriously consider what their primary concerns are for their business, then future-proof themselves and their consumers against fraud, theft and sheer carelessness. Multinationals should consider the impact of GDPR (General Data Protection Regulation), which will come into effect in 2018. Don’t be deluded into thinking this is a long way off: ensuring your organisation is compliant is often a time-consuming process… and the fines are steep.

7. What will this ‘magical’ solution to all my problems look like?

I hear about ‘magic’ in reference to Data Science and ML more and more. In fact, Data Science follows the same rigorous process as any scientific discipline. Data Scientists have systematic and methodical approaches to implementing creative solutions to the problems businesses face. It really is a combination of art and science, informed by domain expertise and scientific experience. There will be no single solution to your problem. Instead, decision makers will be offered advice and information about future outcomes, firmly rooted in facts. These outcomes are governed by the quality of the data, the approach taken to the problem, a number of variables unique to the organisation, and the capabilities and resources of the analytical team.


Launching into a new Data Science initiative isn’t a decision that organisations take lightly, but don’t expect to boil the ocean. It is a long game with numerous iterations, one that should evolve with the needs of the business and help drive its objectives. Businesses are no longer in completely uncharted territory. If you are ready to embark on a new adventure, then there are guides who will help you on your way. In our next post we shall look at questions that will take you beyond the fundamentals and help you consider some more of the “nitty gritty” concepts that can help you plan your Data Science and AI initiative.
Maya Dillon
A Data Scientist with a passion for AI and Space. She is focused on developing strategic initiatives for businesses, implementing both new technologies and scientific methodologies. She is an experienced and highly acclaimed public speaker, not to mention a science and technology evangelist. She is passionate about creating and developing new initiatives incorporating cutting-edge technologies in AI, Machine Learning and Data Science.