Insead_Analytics

INSEAD Course:
Data Analytics with Programming Workshop

T. Evgeniou

Professor of Decision Sciences and Technology Management,
INSEAD


Course Description

This is a (workshop) version of the Data Analytics for Business course. The main goal of this course is to introduce participants to:

This course is not meant to turn participants into "data scientists" or even "experts in analytics". The goal is to familiarize participants with what is available (and possible) in analytics. It is meant to be a starting point.

The course is built around a specific business case that we approach step by step while being introduced to the topics above. The course has a strong hands-on flavor: we will analyse data using open-source data analytics software tools, which you will be able to use in your jobs. The tools are based on the R language. As we will be using these tools during class, participants are required to bring a laptop with all tools installed (see information below).

Course Tools

The course uses two main tools: RStudio and GitHub. The Getting Started Instructions must be followed before the beginning of the course. More resources are available on the website of the Data Analytics for Business course. At least a week before the first session, all participants are required to install RStudio and to set up a GitHub account, where they will post their work.
Important note: If you experience any issues using the class technologies, first look through related past issues in the "Issues" section of the course's main GitHub page, then post your issue there.

A key lesson of the course is that an important success factor for data analytics projects is a good balance between creative customization and codified, reproducible, and reusable end-to-end analytics processes ("solutions"). An example of a codified end-to-end process ("solution") is the market segmentation process template we will develop during the course (see also the course readings below). More examples of reproducible and reusable analytics solutions can be found in Microsoft's cloud-based Azure ML analytics platform. This space is growing and changing fast, with various platforms being developed, such as Google Cloud Machine Learning and Amazon Machine Learning, among others.

Grading

1 Group Exercise: 40%

1 Individual Exercise: 40%

Class Participation: 20%

Book

The following book is recommended as optional background reading:
Data Science for Business: Fundamental Principles of Data Mining and Data-Analytic Thinking (DSB)
by F. Provost and T. Fawcett (2013)



Course Sessions

Day 1 (Sessions 1-2): Introduction to Analytics Tools and Coding

In these sessions we will work on an example of setting up a new analytics project and also become familiar with basic coding.

Optional Readings:

Skim through Chapters 1 and 2 of DSB. Focus on Sections 1.4, 1.7, 1.8, 2.1 and 2.4

Skim through the CRISP-DM documentation

What is Github?

What is R? R Reference Card.

Prepare before class:

You must set up a GitHub account, follow the "Getting Started instructions" to install RStudio, and fork the course GitHub repository before session 1.

Explore these applications developed using some of the tools we use in class.

In class material:

We will work on the session 1 in-class document (we will be editing the corresponding raw file)

Some slides for session 1


Day 2 (Sessions 3-6): Developing an analytics solution

In these sessions we will work step by step on a complete case study (on market segmentation) and in the process we will learn about two important types of analytics and machine learning tools: dimensionality reduction and clustering. Participants will work in groups to develop a solution for the case study. At the end of this module we will have completed a market segmentation template process and generated a segmentation report.
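The two steps of this module can be illustrated on a toy survey: first compress correlated attitude questions into a few derived attributes (here, principal components found by power iteration), then group respondents on those attributes with k-means. The course carries out these steps in R with its own tools (base R offers prcomp() and kmeans() for them); the pure-Python sketch below is only an illustration, and the survey data, the function names, the deterministic farthest-point initialisation, and the choice of two components and two segments are all assumptions.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(z):
    """Covariance of already-centered data (for standardized data: the correlation matrix)."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]

def top_components(cov, k, iters=200):
    """Leading k eigenvectors of a symmetric matrix via power iteration plus deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        # Arbitrary start vector, chosen so it is unlikely to be orthogonal
        # to the eigenvector we are looking for.
        v = [1.0 / (i + 1) for i in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

def project(z, comps):
    """Score of each respondent on each derived attribute."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in z]

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid, then move centroids to cluster means.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels

# Hypothetical survey: 6 respondents rating 4 attitude statements on a 1-5 scale.
survey = [[5, 5, 1, 1], [5, 4, 1, 2], [4, 5, 2, 1],
          [1, 1, 5, 5], [2, 1, 4, 5], [1, 2, 5, 4]]
z = standardize(survey)
factors = project(z, top_components(covariance(z), k=2))
segments = kmeans(factors, k=2)
print(segments)
```

With this made-up data the script prints [0, 0, 0, 1, 1, 1]: the first three respondents favour statements 1 and 2, the last three favour statements 3 and 4, and clustering on the two derived attributes recovers that split.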

Read before class:

Dimensionality Reductions and Derived Attributes (pdf version available here)

Clustering and Segmentation (pdf version available here)

Skim through Chapter 6, read Section 6.4, and skim through Chapter 4 (focusing more on Section 4.3) and Sections 7.1 and 8.2-8.5 of DSB

Boats: Segmentation Case Boats A (Parts I & II) (Note: the official case is INSEAD case 09/2012-5849) and Boats: Segmentation Case Boats B (Note: the official case is INSEAD case 09/2012-5849)

Technical slides (to run in your RStudio): slides on dimensionality reduction, slides on clustering, as well as some more slides on Boats case A and slides on Boats case B.

Prepare before class:

Finish your work on the session 1 in-class document and push it to your GitHub repository.

Make sure you can "knit" the market segmentation process document and generate an html report (check out this issue if needed).

Explore some Shiny Dashboard visualization tools

In class material:

We will work on the segmentation template document (we will be editing the corresponding raw file)

These interactive tools (running in RStudio) will also be used during class: the Interactive Factors Analysis Tool and the Interactive Cluster Analysis Tool.


Day 3 (Sessions 7-8): Classification Methods and Process Wrap up

In these sessions we will discuss a third class of analytics tools, namely classification tools. These tools are used, for example, in marketing campaigns, credit scoring, and preventive maintenance, among other applications. We will then wrap up the case study and discuss key principles and lessons learned.
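Classification tools learn, from labelled historical cases, a rule that assigns a new case to a class, for instance whether a loan applicant is likely to default. As an illustration only, the sketch below fits one common classifier, logistic regression, by plain batch gradient descent in Python. The course readings may cover other methods, and in R the same model is usually fitted with glm(..., family = binomial); the data, feature meanings, learning rate, and number of epochs here are made-up assumptions.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Estimated probability that case x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical credit-scoring data: [income score, share of missed payments],
# label 1 = customer defaulted. All values are invented for illustration.
X = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
     [0.2, 0.9], [0.1, 0.8], [0.3, 1.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
predicted = [int(predict_proba(w, b, x) >= 0.5) for x in X]
print(predicted)
print(round(predict_proba(w, b, [0.5, 0.5]), 2))  # a new, borderline applicant
```

Because this toy data is cleanly separable, `predicted` reproduces the training labels; the probability printed for the borderline applicant shows how such a model expresses a degree of confidence rather than only a hard yes/no, which is what lets a campaign or credit decision use a chosen cut-off.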

Read:

Classification (pdf version available here; a Shiny-based interactive version can be run using this code)

Example segmentation solution.

Example of Scalable Reusability: Explore this example of how to use the tools to generate reusable (long) reports efficiently. How many lines do you need to edit in order to generate this long report? Here is the source code that generated this report.
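The idea behind this example is that a long report can be driven by a short list of parameters: the text and analysis for each section are written once as a template and then repeated automatically. The course achieves this with R documents that are knitted into reports; the Python sketch below only mimics the idea with the standard library's `string.Template`, and the template text, function name, segment names, and scores are all hypothetical.

```python
from string import Template

# Hypothetical template for one report section, written once and reused.
SECTION = Template(
    "## Segment report: $name\n"
    "Respondents: $n\n"
    "Average satisfaction: $avg\n"
)

def build_report(title, segments):
    """Render one section per segment; only the `segments` list changes between reports."""
    parts = [f"# {title}\n"]
    for seg in segments:
        scores = seg["scores"]
        parts.append(SECTION.substitute(
            name=seg["name"],
            n=len(scores),
            avg=f"{sum(scores) / len(scores):.2f}",
        ))
    return "\n".join(parts)

# Hypothetical input: the only lines that need editing to produce a different report.
segments = [
    {"name": "Price seekers", "scores": [3, 4, 2]},
    {"name": "Performance seekers", "scores": [5, 4, 5, 4]},
]
report = build_report("Example segmentation report (hypothetical data)", segments)
print(report)
```

Adding another entry to `segments` lengthens the report by one section without touching the template or `build_report`, which is the kind of reuse the question about how many lines need editing is probing.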

Prepare before class:

(Group Exercise) Prepare a first draft of your answers to the questions of Parts 1 and 2 of the segmentation process case study and share it on your GitHub account. We will finalise it in class.

(Individual Exercise) Complete this Exercise and push it to your individual GitHub repository.