INSEAD Course:
Data Analytics for Business

T. Evgeniou

Professor of Decision Sciences and Technology Management,
INSEAD

Some quotes from past participants

"I work for an alternative asset management firm. The project/code I did at INSEAD on systematic investment strategies as a follow up to the Data Analytics class was the most challenging, but also the most rewarding experience during my MBA. The course is pivotal for everyone who wants to improve their analytical thinking and skills."

Arne Uekotter, INSEAD MBA 15J

"I am working in BCG, and R and statistical techniques that we developed in class are extremely useful. My message to all consultants is: learn R as quickly as possible. You will be given the most interesting modules on the projects. You will be able to do stuff which no colleague in your office is able to do!"

Pawel Godula, INSEAD MBA 15D, Ford Prize Winner

"This course helped me to define the services that I am currently offering at our company, it's a must for anyone who wants to do business in the next 30 years. Learn R!"

Jesús Martín, Data and Analytics Director at The Cocktail, INSEAD MBA 15J

"I'll get to launch the analytics practice for the fourth-largest company in the industry in the world -- all thanks to the seeds you planted in the course in 2014, and during the follow up ISP."

INSEAD MBA 14D

"The course is excellent cause it shows what is the current state of the art in data science. I work in a hedge fund and one by one I applied most of the techniques and tools taught in the course in my job."

Maciej Gorgol, INSEAD MBA 15D

"A must-do course for any person involved in decision making based on data. This course gives a hands-on introduction into big data analytics, what's possible, what are limitations. It's a pain but you actually learn a lot, much more than in any high level fluffy big data talk..."

INSEAD MBA 16J

"I was a lawyer before so I had no background in statistics/programming/data analytics or any of that kind. Yet, I really walk away from your class smiling."

INSEAD MBA 16J

"This course is a must have for MBAs"

INSEAD MBAs

Course Description

"Another thing I must point out is that you cannot prove a vague theory wrong. [...] Also, if the process of computing the consequences is indefinite, then with a little skill any experimental result can be made to look like the expected consequences."

Richard Feynman

"I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk."

Enrico Fermi

The main goal of this class is to equip you with better skills for analysing data and interpreting the results of such analyses, in order to make sounder business and investment decisions.

We will discuss applications of data analytics in a wide range of business cases, including Finance, Marketing, and Operations. We will also cover statistical and machine learning techniques such as factor analysis, cluster analysis, and discriminant analysis. A prerequisite for the course is the material covered in the INSEAD core course Uncertainty, Data & Judgment or an introductory course on Statistics for Business. The course has a heavy "hands-on" flavour: we will analyse datasets using open source data analytics software tools, which you will be able to use in your jobs. The tools are based on the R language. As we will be using these tools during class, participants are required to come to class with a laptop that has all tools installed (see information below).

What you will take away from this course:

Upon completing the course, you will, among others:

Course Tools

The course uses two main tools: RStudio and GitHub. A week before the first session, all participants are required to install RStudio and to set up a GitHub account where they will post their work. These are the Getting Started Instructions to follow before the beginning of the course.
A few more technical resources (some may be outdated) and videos can also be helpful. Participants are strongly encouraged to watch the videos before the start of the course.
If you experience any issues using the class technologies, please post them under "Issues" on the course's main GitHub page, after checking any related past issues there.

Book

The following book is recommended as optional background reading:
Data Science for Business: Fundamental Principles of Data Mining and Data-Analytic Thinking (DSB)
by F. Provost and T. Fawcett (2013)

Grading

Group project (1 Project): 60%

Individual Exercises (2 Exercise Sets): 20%

Class Participation: 20%


Class Group Project

This is a hands-on, group-project-based course, run largely in the format of a workshop. A central part of the course is the class group project: every group (2-4 people per group) is required to develop a data analytics project for which the data is also shared (this requirement may be waived if there are issues with sharing the data). The project should start from a business problem and include these three parts:

Part 1: A clear process for how to solve the business problem with steps codified using R code and an interactive toolkit.

Part 2: An application of the process using a specific dataset.

Part 3: Specification for others to use the process, e.g. with different data.
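
To make the three parts concrete, here is a minimal sketch of how they can map onto code. It is written in Python purely for illustration (the course tools themselves are based on R), and every name in it (`PARAMS`, `run_process`, the column names, the survey data) is hypothetical and invented for this example. The analysis step is deliberately trivial; the point is the separation between the codified process, its application to one dataset, and the parameters others would edit to reuse it.

```python
from statistics import mean, pstdev

# Part 3: the "specification" that others edit to reuse the process
# with different data (all names here are hypothetical).
PARAMS = {
    "columns": ["price_sensitivity", "brand_loyalty"],
    "top_n": 2,
}


def standardize(rows, columns):
    """Scale each selected column to mean 0 and standard deviation 1."""
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in scaled:
            row[col] = (row[col] - mu) / sigma if sigma > 0 else 0.0
    return scaled


def run_process(rows, params):
    """Part 1: the codified process.

    Ranks respondents by their average standardized score over the
    chosen columns and returns the top `top_n` as (id, score) pairs.
    """
    cols = params["columns"]
    scaled = standardize(rows, cols)
    scored = [(row["id"], mean([row[c] for c in cols])) for row in scaled]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[: params["top_n"]]


# Part 2: application of the process to a specific (made-up) dataset.
survey = [
    {"id": "A", "price_sensitivity": 1, "brand_loyalty": 2},
    {"id": "B", "price_sensitivity": 5, "brand_loyalty": 6},
    {"id": "C", "price_sensitivity": 3, "brand_loyalty": 4},
    {"id": "D", "price_sensitivity": 7, "brand_loyalty": 8},
]
result = run_process(survey, PARAMS)
print(result)
```

Reusing the process on a different dataset would then only require changing `PARAMS` and the input rows, not the body of `run_process`.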

A key lesson of the course is that an important success factor for (big) data analytics projects is to have a good balance between creative customization and codified, reproducible and reusable end-to-end analytics processes ("solutions"). An example of a codified end-to-end process ("solution") is the market segmentation process template we will develop during the course (see also course readings below). Other examples, based on an older format of the course, can be found in this older sample projects page. More examples of reproducible and reusable analytics solutions can also be found in the Azure ML cloud based analytics platform of Microsoft. Of course this space is growing and changing fast, with various platforms being developed such as the Google Cloud Machine Learning and Amazon Machine Learning ones, among others.


Course Sessions

Sessions 1-2: Data Analytics Processes and Tools

Optional Readings:

Skim through Chapters 1 and 2 of DSB. Focus on Sections 1.4, 1.7, 1.8, 2.1 and 2.4

What is Github?

What is R? R Reference Card.

Prepare before class:

You must set up a GitHub account, follow the "Getting Started instructions" to install RStudio, and fork the course GitHub repository before session 1.

Explore these applications developed using some of the tools we use in class.

Browse the "Course Tools" resources above. It is strongly recommended to watch the course tools introductory videos before class.

Individual Assignment: Be ready to present and share on your github account your solutions to Exercise Set 1. For this exercise set you can (exceptionally) work with your colleagues.

Make sure your github repository has the latest version of the course github.

In class material:

We will work on this document (which also provides some information on tools and business issues for market segmentation).

Some slides for session 1 (to run on your Rstudio)


Sessions 3-4: Dimensionality Reduction

Read:

Dimensionality Reductions and Derived Attributes (pdf version available here, and a shiny-based interactive version can be run using this code)

Skim through Chapter 6 and read Section 6.4 of DSB

Boats: Segmentation Case Boats A (Part I&II) (Note: official case is INSEAD case 09/2012-5849)

Technical slides (to run on your Rstudio) on today's class, as well as some more slides on the Boats case.

Prepare before class:

(Group) Set up your group project repository on your GitHub account, with a description of the business problem you are considering and a first draft of your business solution process.

(Individual) Be ready to present and share on your github account your solutions to one (or both) of the following exercises:

Exercise Set 2 (see also the source code) on a $300 billion trading strategy that is a "classic". An interactive and updated version will be discussed in class.

Fork the Google Analytics case study to your github and post your answers in a report in your github - you can use the data provided or data from any other website.

Make sure your github repository has the latest version of the course github.

In class material:

We will work on the first part (corresponding to Boats Case A, Part I) of this segmentation process document (which also provides some information on some visualization tools) using this source code.


Sessions 5-6: Clustering for Segmentation

Read:

Clustering and Segmentation (pdf version available here, and a shiny-based interactive version can be run using this code)

Classification (Extra Material) (pdf version available here, and a shiny-based interactive version can be run using this code)

Boats: Segmentation Case Boats B (Note: official case is INSEAD case 09/2012-5849)

Skim through Chapter 4 (focus more on Section 4.3), and Sections 7.1 and 8.2-8.5 of DSB (Extra Material)

Example of Scalable Reusability: Explore this example of how to use the tools to generate reusable (long) reports efficiently. How many lines do you need to edit in order to generate this long report? Here is the source code that generated this report.

Technical slides (to run on your Rstudio) of today's class, as well as some slides on the Boats case.

Prepare before class:

(Group) Prepare a brief in-class presentation of your group project and a short tour of your project's github repository. Make sure at least one other group is also familiar with your project's github repository before class.

(Group) Be ready to present and share on your github account your answers to the questions of the first part of the segmentation process document (corresponding to Boats Case A, Part I) using this source code. Note: there is no new exercise for this session, but by now you need to have completed and posted on your github your solutions to exercise sets 1 and 2, and to the first part of the in-class segmentation process document.

(Individual) Explore some Shiny Dashboard visualization tools

Make sure your github repository has the latest version of the course github.

In class material:

We will work on the second part (corresponding to Boats Case A, Part II) of this segmentation process document using this source code. This also provides some information on machine learning and artificial intelligence tools.

At the end of this session we will have completed the market segmentation template process. The same final code (e.g. based on our parameter choices for the segmentation process) will also automatically generate a segmentation report (e.g. like this example segmentation report).
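
As a rough illustration of what "parameter choices in, report out" can look like, here is a small self-contained sketch. It is not the course's segmentation template (which is built with the R-based tools): it is a simplified Python stand-in that uses a plain k-means with a naive initialisation, and the names `kmeans`, `segmentation_report`, `params`, the attribute names, and the ratings data are all assumptions made up for this example.

```python
from math import dist
from statistics import mean


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centers (a simplification)."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(coord) for coord in zip(*members))
    return labels, centers


def segmentation_report(points, params):
    """Cluster the data and return a short text report of segment sizes and profiles."""
    labels, centers = kmeans(points, params["n_segments"])
    lines = [f"Segmentation report: {params['n_segments']} segments"]
    for j, center in enumerate(centers):
        profile = ", ".join(
            f"{name}={value:.2f}"
            for name, value in zip(params["attributes"], center)
        )
        lines.append(f"Segment {j + 1}: size={labels.count(j)}, {profile}")
    return "\n".join(lines)


# Hypothetical parameter choices and made-up survey ratings.
params = {"n_segments": 2, "attributes": ["comfort", "speed"]}
ratings = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 4.0), (2.0, 1.0), (4.0, 5.0)]
report = segmentation_report(ratings, params)
print(report)
```

Changing `n_segments` or the attribute list and re-running regenerates the whole report, which is the kind of reusability the template process aims for.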


Sessions 7-8: Group Presentations and Wrap up

Skim through:

Information Management Issues

How to Tell If You Should Trust Your Statistical Models

Does bigger data lead to better decisions?

Run Field Experiments to Make Sense of Your Big Data

Prepare before class:

Group project is due before class: please post your group project on your github and prepare to showcase your project in class.

Group project mix: each group should be able to generate a version of the group report of another group. It is therefore important to also become familiar with another group's github repository and project.



Group Projects, January-February 2016 (INSEAD 2016J) (note: only projects whose info/data is not under confidentiality/NDA constraints):

Energy Consumption (github source here)

Twitter and Elections (github source here)

Travel Website Analytics (github source here)

Google Analytics Dashboard (tool screenshot) (github source here)

Sports Club Analytics (github source here)

Airbnb (github source here)



Group Projects, May-June 2016 (INSEAD 2016D) (note: only projects whose info/data is not under confidentiality/NDA constraints):

Wine Analytics (github source here)

Firm Fundamentals Analysis (github source here)

Airline Customer Segmentation (github source here)

Speed Dating Intelligence (github source here)

Airbnb Pricing in Amsterdam (github source here)



Group Projects, Jan-Feb 2017 (INSEAD 2017J) (note: only projects whose info/data is not under confidentiality/NDA constraints):

HR Analytics Project 1 (github source here), Project 2 (github source here), Project 3 (github source here), Project 4 (github source here), Project 5 (github source here), Project 6 (github source here)

Wine Analytics (github source here)

Risk in NYC (github source here)

Healthcare (github source here)

Movie Sales (github source here)

Mobile Telco Segmentation (github source here)

Lending Club Defaults Project 1 (github source here) and Project 2 (github source here)

Speed Dating, Project 1, Parts 1, 2, 3, 4 (github source here), and Project 2

Travel Website Analytics (github source here)

Mashable News (github source here)

Flight Delays (github source here)

Credit Card Default (github source here)

e-commerce Analytics (github source here)

Airline Fleet Segmentation (github source here)

Restaurant Ratings (github source here)



Group Projects, Jan-Feb 2018 (INSEAD 2018J) (note: only projects whose info/data is not under confidentiality/NDA constraints):

Airbnb Pricing

Predicting Automotive Equity Performance

Classification for Employee Attrition

Project Alcohol

Formula 1 Prediction

Rehabilitation App

IBM HR Analytics