Objective

When salary discussions approach, most companies face the same challenge: how to distribute a limited cash pie among employees. Companies have developed different methods and criteria to address this challenge, but most share a common goal: reward good employees who are likely to stay with the company for the long term. The first part of this equation, identifying the good employees, is quite simple: most companies maintain an evaluation process that allows them to identify high-performing employees. The second part, however, is trickier. How can companies predict which employees are likely to leave? What are the characteristics of those high-risk employees?

The goal of our analysis is to use big data analytics (~15,000 employee records) to identify groups of employees with a high likelihood of leaving the company. Once these groups are identified, we want to understand the underlying drivers behind their attrition. With this kind of data in hand, companies can better allocate their resources and invest in employees at risk. Unfair as it may sound, investing resources in good employees with a low likelihood of leaving is a low-ROI investment.

Analysis Process

  1. Data check and Visualization. First, we will analyze and visualize the data to get a basic understanding of the data in hand (Human Resources Analytics by Ludovic Benistant from kaggle.com). We will then check the correlations between the given attributes and interpret the data.

  2. Cluster analysis and Segmentation. Second, we will segment and profile the employee pool using cluster analysis, and observe whether any segment of employees has a higher attrition rate than the others.

  3. Key drivers analysis. We will also analyze which factors are most influential in driving employees to leave their company, using a classification model (tree induction).

  4. Finally, we will recommend several business decisions based on our data analysis from above to help the company target and invest in their human resources effectively and reduce the risk and negative impact of losing high performing employees.


1. Data check and Visualization

1.1 Load and Explore the data

First, let’s load the data to use.

ProjectData <- read.csv("./data/HR_data.csv")
ProjectData = data.matrix(ProjectData)

Description of the data

  1. Employee satisfaction level
  2. Last evaluation
  3. Number of projects
  4. Average monthly hours
  5. Time spent at the company
  6. Whether they have had a work accident
  7. Whether they have had a promotion in the last 5 years
  8. Department
  9. Salary (1=low, 2=medium, 3=high)
  10. Whether employee has left

This is what the first 10 observations (employees) look like.

Obs.01 Obs.02 Obs.03 Obs.04 Obs.05 Obs.06 Obs.07 Obs.08 Obs.09 Obs.10
satisfaction_level 0.38 0.80 0.11 0.72 0.37 0.41 0.10 0.92 0.89 0.42
last_evaluation 0.53 0.86 0.88 0.87 0.52 0.50 0.77 0.85 1.00 0.53
number_project 2.00 5.00 7.00 5.00 2.00 2.00 6.00 5.00 5.00 2.00
average_montly_hours 157.00 262.00 272.00 223.00 159.00 153.00 247.00 259.00 224.00 142.00
time_spend_company 3.00 6.00 4.00 5.00 3.00 3.00 4.00 5.00 5.00 3.00
Work_accident 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
left 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
promotion_last_5years 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
salary_level 1.00 2.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
sales 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
accounting 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
hr 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
technical 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
support 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
management 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
IT 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
product_mng 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
marketing 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RandD 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

The data we use here have the following descriptive statistics.

min 25 percent median mean 75 percent max std
satisfaction_level 0.09 0.44 0.64 0.61 0.82 1 0.25
last_evaluation 0.36 0.56 0.72 0.72 0.87 1 0.17
number_project 2.00 3.00 4.00 3.80 5.00 7 1.23
average_montly_hours 96.00 156.00 200.00 201.05 245.00 310 49.94
time_spend_company 2.00 3.00 3.00 3.50 4.00 10 1.46
Work_accident 0.00 0.00 0.00 0.14 0.00 1 0.35
left 0.00 0.00 0.00 0.24 0.00 1 0.43
promotion_last_5years 0.00 0.00 0.00 0.02 0.00 1 0.14
salary_level 1.00 1.00 2.00 1.59 2.00 3 0.64
sales 0.00 0.00 0.00 0.28 1.00 1 0.45
accounting 0.00 0.00 0.00 0.05 0.00 1 0.22
hr 0.00 0.00 0.00 0.05 0.00 1 0.22
technical 0.00 0.00 0.00 0.18 0.00 1 0.39
support 0.00 0.00 0.00 0.15 0.00 1 0.36
management 0.00 0.00 0.00 0.04 0.00 1 0.20
IT 0.00 0.00 0.00 0.08 0.00 1 0.27
product_mng 0.00 0.00 0.00 0.06 0.00 1 0.24
marketing 0.00 0.00 0.00 0.06 0.00 1 0.23
RandD 0.00 0.00 0.00 0.05 0.00 1 0.22
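A table like the one above can be produced with a few lines of base R. The following is a minimal sketch, not the code used for this report: the `toy` matrix and the `describe_columns` helper are hypothetical stand-ins, and only the column labels follow the table above.

```r
# Hypothetical stand-in for the HR data: three numeric columns.
set.seed(1)
toy <- cbind(
  satisfaction_level = round(runif(50, 0.09, 1), 2),
  number_project     = sample(2:7, 50, replace = TRUE),
  left               = rbinom(50, 1, 0.24)
)

# One row per variable: min, quartiles, mean, max and standard deviation.
describe_columns <- function(m) {
  stats <- t(apply(m, 2, function(x) {
    c(min = min(x),
      `25 percent` = unname(quantile(x, 0.25)),
      median = median(x),
      mean = mean(x),
      `75 percent` = unname(quantile(x, 0.75)),
      max = max(x),
      std = sd(x))
  }))
  round(stats, 2)
}

summary_table <- describe_columns(toy)
print(summary_table)
```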

1.2 Scale the data

Here, we rescale the data so that the results are not driven by a few variables with relatively large values. We use min-max scaling, which maps each variable to the range 0 to 1.

ProjectDataFactor_scaled = apply(ProjectDataFactor, 2, function(r) {
    res = (r - min(r))/(max(r) - min(r))
    res
})

Below are the summary statistics of the scaled dataset.

min 25 percent median mean 75 percent max std
satisfaction_level 0 0.38 0.60 0.57 0.80 1 0.27
last_evaluation 0 0.31 0.56 0.56 0.80 1 0.27
number_project 0 0.20 0.40 0.36 0.60 1 0.25
average_montly_hours 0 0.28 0.49 0.49 0.70 1 0.23
time_spend_company 0 0.12 0.12 0.19 0.25 1 0.18
Work_accident 0 0.00 0.00 0.14 0.00 1 0.35
left 0 0.00 0.00 0.24 0.00 1 0.43
promotion_last_5years 0 0.00 0.00 0.02 0.00 1 0.14
salary_level 0 0.00 0.50 0.30 0.50 1 0.32
sales 0 0.00 0.00 0.28 1.00 1 0.45
accounting 0 0.00 0.00 0.05 0.00 1 0.22
hr 0 0.00 0.00 0.05 0.00 1 0.22
technical 0 0.00 0.00 0.18 0.00 1 0.39
support 0 0.00 0.00 0.15 0.00 1 0.36
management 0 0.00 0.00 0.04 0.00 1 0.20
IT 0 0.00 0.00 0.08 0.00 1 0.27
product_mng 0 0.00 0.00 0.06 0.00 1 0.24
marketing 0 0.00 0.00 0.06 0.00 1 0.23
RandD 0 0.00 0.00 0.05 0.00 1 0.22
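One caveat of scaling by (max - min) is that a column with a single constant value would be divided by zero. The sketch below shows a guarded variant; the `min_max_scale` helper, the toy values, and the choice to map a constant column to 0 are all assumptions made for illustration.

```r
# Min-max scaling of each column to [0, 1].
# Assumption: a constant column (max == min) is mapped to 0 rather than NaN.
min_max_scale <- function(m) {
  apply(m, 2, function(x) {
    rng <- max(x) - min(x)
    if (rng == 0) rep(0, length(x)) else (x - min(x)) / rng
  })
}

# Toy values within the raw ranges reported in section 1.1
# (hours 96..310, projects 2..7).
toy <- cbind(
  average_montly_hours = c(96, 157, 203, 310),
  number_project       = c(2, 3, 7, 5),
  constant_column      = c(1, 1, 1, 1)   # hypothetical, exercises the guard
)
scaled <- min_max_scale(toy)
print(round(scaled, 2))
```

With the dataset used here every column varies, so the guard would never trigger; it only matters if the function is reused on other data.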

1.3 Check Correlations

The simplest way to take a first look at a dataset is to check the correlations. By doing this, we can easily see which factors have a high positive or negative correlation with employees leaving. Correlation is not causation, so we cannot conclude that a highly correlated factor (independent variable) leads an employee to leave (dependent variable). Also, if some of the factors (independent variables) are highly correlated with each other, we could consider grouping these attributes together.

satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident left promotion_last_5years salary_level sales accounting hr technical support management IT product_mng marketing RandD
satisfaction_level 1.00 0.11 -0.14 -0.02 -0.10 0.06 -0.39 0.03 0.05 0.00 -0.03 -0.01 -0.01 0.01 0.01 0.01 0.01 0.01 0.01
last_evaluation 0.11 1.00 0.35 0.34 0.13 -0.01 0.01 -0.01 -0.01 -0.02 0.00 -0.01 0.01 0.02 0.01 0.00 0.00 0.00 -0.01
number_project -0.14 0.35 1.00 0.42 0.20 0.00 0.02 -0.01 0.00 -0.01 0.00 -0.03 0.03 0.00 0.01 0.00 0.00 -0.02 0.01
average_montly_hours -0.02 0.34 0.42 1.00 0.13 -0.01 0.07 0.00 0.00 0.00 0.00 -0.01 0.01 0.00 0.00 0.01 -0.01 -0.01 0.00
time_spend_company -0.10 0.13 0.20 0.13 1.00 0.00 0.14 0.07 0.05 0.02 0.00 -0.02 -0.03 -0.03 0.12 -0.01 0.00 0.01 -0.02
Work_accident 0.06 -0.01 0.00 -0.01 0.00 1.00 -0.15 0.04 0.01 0.00 -0.01 -0.02 -0.01 0.01 0.01 -0.01 0.00 0.01 0.02
left -0.39 0.01 0.02 0.07 0.14 -0.15 1.00 -0.06 -0.16 0.01 0.02 0.03 0.02 0.01 -0.05 -0.01 -0.01 0.00 -0.05
promotion_last_5years 0.03 -0.01 -0.01 0.00 0.07 0.04 -0.06 1.00 0.10 0.01 0.00 0.00 -0.04 -0.04 0.13 -0.04 -0.04 0.05 0.02
salary_level 0.05 -0.01 0.00 0.00 0.05 0.01 -0.16 0.10 1.00 -0.04 0.01 0.00 -0.02 -0.03 0.16 -0.01 -0.01 0.01 0.00
sales 0.00 -0.02 -0.01 0.00 0.02 0.00 0.01 0.01 -0.04 1.00 -0.14 -0.14 -0.29 -0.26 -0.13 -0.18 -0.16 -0.15 -0.15
accounting -0.03 0.00 0.00 0.00 0.00 -0.01 0.02 0.00 0.01 -0.14 1.00 -0.05 -0.11 -0.10 -0.05 -0.07 -0.06 -0.06 -0.05
hr -0.01 -0.01 -0.03 -0.01 -0.02 -0.02 0.03 0.00 0.00 -0.14 -0.05 1.00 -0.11 -0.10 -0.05 -0.07 -0.06 -0.06 -0.05
technical -0.01 0.01 0.03 0.01 -0.03 -0.01 0.02 -0.04 -0.02 -0.29 -0.11 -0.11 1.00 -0.20 -0.10 -0.14 -0.12 -0.12 -0.11
support 0.01 0.02 0.00 0.00 -0.03 0.01 0.01 -0.04 -0.03 -0.26 -0.10 -0.10 -0.20 1.00 -0.09 -0.12 -0.11 -0.10 -0.10
management 0.01 0.01 0.01 0.00 0.12 0.01 -0.05 0.13 0.16 -0.13 -0.05 -0.05 -0.10 -0.09 1.00 -0.06 -0.05 -0.05 -0.05
IT 0.01 0.00 0.00 0.01 -0.01 -0.01 -0.01 -0.04 -0.01 -0.18 -0.07 -0.07 -0.14 -0.12 -0.06 1.00 -0.08 -0.07 -0.07
product_mng 0.01 0.00 0.00 -0.01 0.00 0.00 -0.01 -0.04 -0.01 -0.16 -0.06 -0.06 -0.12 -0.11 -0.05 -0.08 1.00 -0.06 -0.06
marketing 0.01 0.00 -0.02 -0.01 0.01 0.01 0.00 0.05 0.01 -0.15 -0.06 -0.06 -0.12 -0.10 -0.05 -0.07 -0.06 1.00 -0.06
RandD 0.01 -0.01 0.01 0.00 -0.02 0.02 -0.05 0.02 0.00 -0.15 -0.05 -0.05 -0.11 -0.10 -0.05 -0.07 -0.06 -0.06 1.00

The most notable variable is ‘Satisfaction level’, which has the strongest correlation with employees leaving (-0.39, a moderate negative correlation). What influences the satisfaction level is not clearly indicated in the data description, but we can at least look at the correlation between Satisfaction level and the other variables to see which of them could be related to it. Satisfaction level is weakly negatively correlated with number of projects (-0.14) and with time spent at the company (-0.10). The latter can be read as ‘the longer the employee has stayed at the company, the lower the level of satisfaction’, which may indicate that the company is not providing long-term goals or vision. Number of projects itself shows almost no linear correlation with leaving (0.02), and average monthly hours are almost uncorrelated with satisfaction (-0.02). One possible reading is that being involved in too many tasks, i.e. being disorganized and distracted, is associated with lower satisfaction more than simply working longer hours. These are correlations only, so the interpretations remain hypotheses.
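Reading one column out of a large correlation matrix is easier when it is sorted. The sketch below ranks variables by the absolute size of their correlation with `left`; the toy data is hypothetical and is generated so that low satisfaction raises the chance of leaving, so its numbers are not those of the HR dataset.

```r
# Hypothetical toy data in which low satisfaction raises the chance of leaving.
set.seed(42)
n <- 500
satisfaction_level <- runif(n)
number_project     <- sample(2:7, n, replace = TRUE)
left               <- rbinom(n, 1, prob = 0.6 - 0.5 * satisfaction_level)
toy <- cbind(satisfaction_level, number_project, left)

# Full correlation matrix, then the column for `left`, sorted by absolute size.
corr_matrix <- round(cor(toy), 2)
corr_with_left <- corr_matrix[, "left"]
corr_with_left <- corr_with_left[names(corr_with_left) != "left"]
ranked <- corr_with_left[order(abs(corr_with_left), decreasing = TRUE)]
print(ranked)
```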


2. Cluster Analysis and Segmentation

Test #1

2.1: Select segmentation variables and methods

We will segment the employees using all the variables except “Whether employee has left.” We will use Euclidean distance.

segmentation_attributes_used = c(1:6, 8:19)
profile_attributes_used = c(1:19)
numb_clusters_used = 4
profile_with = "hclust"
distance_used = "euclidean"
hclust_method = "ward.D"

Here are the pairwise distances between the first 10 observations, using the distance metric we selected (euclidean):

Obs.01 Obs.02 Obs.03 Obs.04 Obs.05 Obs.06 Obs.07 Obs.08 Obs.09 Obs.10
Obs.01 0.00
Obs.02 1.21 0.00
Obs.03 1.39 0.89 0.00
Obs.04 0.97 0.55 0.96 0.00
Obs.05 0.02 1.22 1.39 0.98 0.00
Obs.06 0.06 1.23 1.43 0.99 0.06 0.00
Obs.07 1.03 0.98 0.58 0.75 1.03 1.07 0.00
Obs.08 1.12 0.53 1.11 0.28 1.13 1.13 0.94 0.00
Obs.09 1.17 0.60 1.12 0.28 1.18 1.19 0.97 0.29 0.00
Obs.10 0.08 1.23 1.43 0.98 0.10 0.07 1.08 1.13 1.17 0
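A lower-triangle distance table of this kind is what base R's `dist()` returns. As a minimal sketch, with three hypothetical employees described by two scaled attributes (the values are made up so the distances are easy to check by hand):

```r
# Three hypothetical employees on two scaled attributes (values in [0, 1]).
toy <- rbind(
  Obs.01 = c(satisfaction_level = 0.0, last_evaluation = 0.0),
  Obs.02 = c(satisfaction_level = 0.3, last_evaluation = 0.4),
  Obs.03 = c(satisfaction_level = 1.0, last_evaluation = 1.0)
)

# dist() returns the lower triangle of the pairwise distance matrix.
pairwise <- dist(toy, method = "euclidean")
print(round(pairwise, 2))

# The same numbers as a full symmetric matrix, for lookup by name.
distance_matrix <- as.matrix(pairwise)
```

Here the distance between Obs.01 and Obs.02 is sqrt(0.3^2 + 0.4^2) = 0.5. Because all attributes were scaled to the same 0 to 1 range, each attribute contributes on a comparable scale.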

2.2 Visualize Pair-wise Distances

Below is the histogram of, say, the first 2 variables.

or the histogram of all pairwise distances for the euclidean distance:

The peaks and valleys in the histogram of pairwise distances suggest that there are likely multiple segments among the employees. We will try to identify these segments in the next part of our analysis.

2.3 Number of Segments

Let’s use hierarchical clustering. It can be useful to look at the dendrogram first, to get a quick idea of how the data may be segmented and how many segments there may be. Here is the dendrogram for our data:

We can also plot the “distances” traveled before smaller clusters are merged into larger ones, i.e. the heights of the tree branches that link the clusters as we traverse the tree from its leaves to its root. If we have n observations, the plot will have n-1 numbers. We can see the first 20 here.
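The merge heights described above are stored in the `height` component of the object returned by `hclust()`. The following sketch uses hypothetical toy data with two well-separated groups; only the distance metric and the "ward.D" linkage follow the settings listed in section 2.1.

```r
# Hypothetical toy data: two well-separated groups of points in the unit square.
set.seed(7)
toy <- rbind(
  matrix(runif(40, 0.0, 0.2), ncol = 2),
  matrix(runif(40, 0.8, 1.0), ncol = 2)
)

# Ward hierarchical clustering on Euclidean distances.
tree <- hclust(dist(toy, method = "euclidean"), method = "ward.D")

# n observations give n - 1 merges; list the largest merge heights first.
merge_heights <- rev(tree$height)
print(round(head(merge_heights, 5), 2))

# plot(tree) would draw the dendrogram itself.
```

A large gap between consecutive heights is the usual visual cue for where to cut the tree: in this toy case the last merge, which joins the two groups, is much higher than all the others.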

For now, let’s consider the 4-segment solution. We can also see which segment each observation (employee in this case) belongs to, for the first 20 employees:

Observation Number Cluster_Membership
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 2
20 1
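A membership table like this comes from cutting the tree at a chosen number of segments. As a minimal sketch, on hypothetical toy data with three tight groups (so that the expected labels are obvious), `cutree()` assigns one segment label per row:

```r
# Hypothetical toy data: three tight groups of 10 points each.
set.seed(3)
toy <- rbind(
  matrix(rnorm(20, mean = 0.1, sd = 0.01), ncol = 2),
  matrix(rnorm(20, mean = 0.5, sd = 0.01), ncol = 2),
  matrix(rnorm(20, mean = 0.9, sd = 0.01), ncol = 2)
)
tree <- hclust(dist(toy), method = "ward.D")

# Cut the tree into k segments; the result is one segment label per row.
cluster_membership <- cutree(tree, k = 3)
membership_table <- data.frame(
  Observation = seq_along(cluster_membership),
  Cluster_Membership = cluster_membership
)
print(head(membership_table, 20))
print(table(cluster_membership))
```

Segment numbers are arbitrary labels: `cutree()` numbers clusters in the order in which they first appear in the data, so the same segment can carry a different number from one run to the next if the variables change.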

2.4 Profile and interpret the segments

Having decided how many clusters to use, we would like to get a better understanding of who the employees in those clusters are and interpret the segments.

Let’s see first how many observations we have in each segment, for the segments we selected above:

Segment 1 Segment 2 Segment 3 Segment 4
Number of Obs. 4040 6058 2692 2209

The average values of our data for the total population as well as within each employee segment are:

Population Segment 1 Segment 2 Segment 3 Segment 4
satisfaction_level 0.57 0.57 0.58 0.57 0.58
last_evaluation 0.56 0.55 0.56 0.56 0.57
number_project 0.36 0.36 0.36 0.38 0.36
average_montly_hours 0.49 0.49 0.49 0.50 0.49
time_spend_company 0.19 0.19 0.19 0.18 0.17
Work_accident 0.14 0.14 0.15 0.14 0.15
left 0.24 0.25 0.22 0.26 0.25
promotion_last_5years 0.02 0.00 0.05 0.00 0.00
salary_level 0.30 0.27 0.33 0.28 0.27
sales 0.28 1.00 0.02 0.00 0.00
accounting 0.05 0.00 0.13 0.00 0.00
hr 0.05 0.00 0.12 0.00 0.00
technical 0.18 0.00 0.00 1.00 0.00
support 0.15 0.00 0.00 0.00 1.00
management 0.04 0.00 0.10 0.00 0.00
IT 0.08 0.00 0.20 0.00 0.00
product_mng 0.06 0.00 0.15 0.00 0.00
marketing 0.06 0.00 0.14 0.00 0.00
RandD 0.05 0.00 0.13 0.00 0.00
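Profiling amounts to comparing each segment's column means with the population means. The sketch below shows one way to build such a table in base R; the six-row `toy` matrix and the `cluster_membership` vector are hypothetical and chosen so the means can be verified by hand.

```r
# Hypothetical scaled data with a known segment label per employee.
toy <- cbind(
  satisfaction_level = c(0.2, 0.3, 0.8, 0.9, 0.7, 0.1),
  left               = c(1,   1,   0,   0,   0,   1)
)
cluster_membership <- c(1, 1, 2, 2, 2, 1)

# Population mean per attribute, then the mean within each segment.
population_mean <- colMeans(toy)
segment_ids <- sort(unique(cluster_membership))
segment_means <- sapply(segment_ids, function(k) {
  colMeans(toy[cluster_membership == k, , drop = FALSE])
})
colnames(segment_means) <- paste("Segment", segment_ids)

profile <- round(cbind(Population = population_mean, segment_means), 2)
print(profile)
print(table(cluster_membership))  # number of observations per segment
```

Because `left` is a 0/1 indicator, its mean within a segment is that segment's attrition rate, which is the row to watch when comparing segments.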

Looking at the data, we realized that our segments are driven almost entirely by the department employees belong to (employees from the same department were grouped in the same segment). Therefore, we will re-segment our data excluding the department variables, and use them only for profiling.


Test #2

2.1 Select segmentation variables and methods

This is our 2nd try at segmentation, using the same method as above but now with the ‘department’ variables removed as well as “Whether employee has left”. We will use Euclidean distance.

segmentation_attributes_used = c(1:6, 8:9)
profile_attributes_used = c(1:19)
numb_clusters_used = 5
profile_with = "hclust"
distance_used = "euclidean"
hclust_method = "ward.D"

2.2 Visualize Pair-wise Distances

We will skip this subsection for our 2nd try.

2.3 Number of Segments

Let’s plot the “distances” between clusters before we merge any of the lower and smaller sized clusters into larger ones.

For now, we will choose 5 segments. We can see which segment each observation (employee in this case) belongs to, for the first 20 employees:

Observation Number Cluster_Membership
1 1
2 2
3 3
4 4
5 1
6 1
7 3
8 4
9 4
10 1
11 1
12 3
13 4
14 1
15 1
16 1
17 1
18 4
19 3
20 4

2.4 Profile and interpret the segments

Having decided how many clusters to use, we would like to have a better understanding of who the employees in those clusters are and interpret the segments.

Let’s see first how many observations we have in each segment, for the segments we selected above:

Segment 1 Segment 2 Segment 3 Segment 4 Segment 5
Number of Obs. 1816 4172 2728 4190 2093

The average values of our data for the total population as well as within each employee segment are:

Population Segment 1 Segment 2 Segment 3 Segment 4 Segment 5
satisfaction_level 0.57 0.37 0.71 0.28 0.70 0.61
last_evaluation 0.56 0.26 0.60 0.61 0.61 0.55
number_project 0.36 0.04 0.37 0.57 0.36 0.36
average_montly_hours 0.49 0.24 0.51 0.60 0.51 0.48
time_spend_company 0.19 0.12 0.14 0.36 0.16 0.19
Work_accident 0.14 0.00 0.00 0.03 0.00 1.00
left 0.24 0.77 0.10 0.35 0.15 0.08
promotion_last_5years 0.02 0.00 0.00 0.12 0.00 0.00
salary_level 0.30 0.25 0.60 0.33 0.00 0.30
sales 0.28 0.29 0.25 0.29 0.29 0.27
accounting 0.05 0.06 0.05 0.06 0.05 0.05
hr 0.05 0.07 0.05 0.05 0.05 0.04
technical 0.18 0.17 0.18 0.18 0.19 0.18
support 0.15 0.16 0.16 0.12 0.15 0.16
management 0.04 0.03 0.05 0.07 0.02 0.04
IT 0.08 0.07 0.09 0.07 0.09 0.08
product_mng 0.06 0.06 0.06 0.05 0.07 0.06
marketing 0.06 0.06 0.06 0.06 0.05 0.06
RandD 0.05 0.04 0.06 0.05 0.05 0.06

Segments 1 to 4 are clearly differentiated, and we were able to profile them as ‘Quitters’, ‘Pampered Loyals’, ‘The Burned-outs’, and ‘Neglected Loyals’. However, everyone in Segment 5 has had a work accident (segment average of 1.00), a variable we do not consider meaningful for exploring whom to retain and how. Therefore, we will redo the analysis excluding work accident as a variable.


Test #3

2.1 Select segmentation variables and methods

We will now use all the variables except ‘Whether employee has left’, ‘Department’, and ‘Work accident’. We will use Euclidean distance, just like before.

segmentation_attributes_used = c(1:5, 8:9)
profile_attributes_used = c(1:19)
numb_clusters_used = 4
profile_with = "hclust"
distance_used = "euclidean"
hclust_method = "ward.D"

2.2 Visualize Pair-wise Distances

We will skip this subsection for our 3rd try.

2.3 Number of Segments

Let’s plot the “distances” between clusters before we merge any of the lower and smaller sized clusters into larger ones.

The appropriate number of segments is 4, with the distance between clusters dropping drastically after 4. Below are the segments assigned to each employee, for the first 20 employees (observations).

Observation Number Cluster_Membership
1 1
2 2
3 3
4 4
5 1
6 1
7 3
8 4
9 4
10 1
11 1
12 3
13 4
14 1
15 1
16 1
17 1
18 4
19 2
20 4

2.4 Profile and interpret the segments

The number and segmentation of the clusters seem reasonable. Now, in order to get a better understanding of who the employees in those clusters are, we will profile and interpret the segments using all the variables (attributes) we originally had, including the variables we excluded from segmentation.

Let’s first see how many observations we have in each segment, for the segments we selected above:

Segment 1 Segment 2 Segment 3 Segment 4
Number of Obs. 1930 6017 2135 4917

The average values of our data for the total population as well as within each employee segment are as below:

Population Segment 1 Segment 2 Segment 3 Segment 4
satisfaction_level 0.57 0.37 0.70 0.12 0.70
last_evaluation 0.56 0.26 0.58 0.67 0.59
number_project 0.36 0.04 0.36 0.65 0.36
average_montly_hours 0.49 0.24 0.50 0.65 0.51
time_spend_company 0.19 0.12 0.20 0.29 0.15
Work_accident 0.14 0.08 0.17 0.12 0.16
left 0.24 0.76 0.08 0.47 0.13
promotion_last_5years 0.02 0.00 0.05 0.00 0.00
salary_level 0.30 0.26 0.57 0.26 0.00
sales 0.28 0.30 0.27 0.27 0.28
accounting 0.05 0.06 0.05 0.06 0.05
hr 0.05 0.07 0.04 0.05 0.05
technical 0.18 0.17 0.17 0.19 0.19
support 0.15 0.15 0.15 0.14 0.15
management 0.04 0.02 0.06 0.04 0.02
IT 0.08 0.07 0.08 0.08 0.08
product_mng 0.06 0.06 0.06 0.05 0.07
marketing 0.06 0.06 0.06 0.05 0.05
RandD 0.05 0.04 0.05 0.05 0.06

After analyzing the results, we were able to define each segment:

  • Segment 1 – “revolving doors” – low performing with high likelihood to leave – these employees are low performers with low satisfaction levels. Although their average salary is just below the company average, they show very low commitment, with low utilization and working hours well below average.

  • Segment 2 – “pampered loyalists” – high performing with low likelihood to leave – the main characteristic of these employees is high salaries, almost double the company average. They show very high satisfaction levels and average performance across all main parameters.

  • Segment 3 – “Burned” – high performing with high likelihood to leave – these employees show very low levels of satisfaction, probably due to overutilization (above-average number of hours and number of projects). Their high commitment is not reflected in their salaries, which are below average.

  • Segment 4 – “Happy Cash Cows” – high performing with low likelihood to leave – these employees show very high levels of satisfaction although their salaries are extremely low. They present decent performance across all main parameters and are very unlikely to leave.

The interesting segment emerging from the analysis above is Segment 3: good employees at high risk. The data reveals three drivers of attrition: low salary, high utilization, and low satisfaction.


3. Classification Analysis

We will also use the classification analysis methods to understand the key drivers for leaving. Hence our dependent variable is ‘Whether employee has left.’

dependent_variable = 7
independent_variables = c(1:5, 8:9)

Probability_Threshold = 0.5

estimation_data_percent = 80
validation_data_percent = 10

random_sampling = 0

# Tree (CART) complexity control cp
CART_cp = 0.02

# the minimum size of a segment for the analysis to be done
min_segment = 100

Here, for example, is a “small tree” classification:

From the tree analysis we can see that ‘satisfaction level’ is the most important driver. This is consistent with our segmentation, in which less satisfied segments are more likely to leave.
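The tree-induction step can be sketched with the `rpart` package, which ships with standard R distributions. This is an illustration under stated assumptions rather than the code behind the tree above: the toy data is hypothetical (employees with satisfaction below 0.4 mostly leave), the split is taken in row order, and only the `cp = 0.02` setting and the 0.5 probability threshold follow the parameters listed earlier.

```r
library(rpart)  # recommended package bundled with standard R distributions

# Hypothetical toy data: employees with satisfaction below 0.4 mostly leave.
set.seed(11)
n <- 1000
toy <- data.frame(
  satisfaction_level   = runif(n),
  average_montly_hours = runif(n)
)
p_leave <- ifelse(toy$satisfaction_level < 0.4, 0.9, 0.05)
toy$left <- factor(rbinom(n, 1, p_leave), levels = c(0, 1))

# 80% estimation and 10% validation, taken in row order (an assumption);
# the remaining 10% would be held out as test data.
estimation <- toy[1:800, ]
validation <- toy[801:900, ]

# Classification tree with complexity parameter cp = 0.02.
tree <- rpart(left ~ ., data = estimation, method = "class",
              control = rpart.control(cp = 0.02))

# Predicted probability of leaving, thresholded at 0.5.
prob_leave <- predict(tree, newdata = validation, type = "prob")[, "1"]
predicted  <- as.integer(prob_leave > 0.5)
actual     <- as.integer(as.character(validation$left))
hit_rate   <- mean(predicted == actual)

print(tree$variable.importance)
print(hit_rate)
```

The `variable.importance` component gives a ranking of drivers comparable to reading the top splits of the tree; a larger `cp` prunes more aggressively and yields a smaller tree.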


4. Business Decisions

To synthesize our analysis, we have three recommendations for companies.

  1. We recommend that companies allocate resources to improving satisfaction among employees in Segment 3, the “Burned” employees, since this segment represents those who are high performing but also at high risk of leaving the company. To retain employees in this segment, companies should increase salaries, decrease the number of work hours, and reduce the number of projects per employee by redirecting projects to other employees. Additional research should be done to study qualitative alternatives for improving this segment’s work satisfaction.

  2. We recommend that companies gather additional employee information, such as employee function and seniority, in order to further analyze Segment 4. This segment has high employee satisfaction, decent performance, and a low likelihood of leaving despite very low salaries, and further analysis should focus on determining why this is. Our hypothesis is that Segment 4 may include a large number of junior employees or interns, who are content with learning and doing a large amount of work despite small pay. If this is so, then reproducing the work conditions of Segment 4 for Segment 3 in order to retain “Burned” employees is not possible. Further analysis can confirm or reject this hypothesis.

  3. We recommend that companies further analyze Segment 1 to understand why these employees are low performing and why they leave. Possible drivers of low performance include inadequate training, bad management, or inaccurate guidance for HR hiring procedures (i.e., hiring the wrong people). This understanding can help companies reduce the costs associated with high employee turnover.