Segmentation using speed dating dataset

The “Business Decision”

In this section we aim to segment Speed Dating users to better understand their characteristics.

The Data

First, we load the data that we used in the Insights section of this report.

Part 1: Key Customer Characteristics

From the data set, we have chosen several attributes to take part in the segmentation analysis. These attributes correspond to the following survey questions (refer to the documentation for more details):

Attribute Question
“date” In general, how frequently do you go on dates? [1-several times a week, 7-never]
“go_out” How often do you go out (not necessarily on dates)? [1-several times a week, 7-never]
“sports”, “tvsports”, etc. How interested are you in the following activities, on a scale of 1-10?
“exphappy” Overall, on a scale of 1-10, how happy do you expect to be with the people you meet during the speed-dating event?
“attr1_1”, “sinc1_1”, etc. We want to know what you look for in the opposite sex. Please rate the importance of the following attributes in a potential date on a scale of 0-100 (0=not at all important, 100=extremely important) (attractive, sincere, intelligent, fun, ambitions, shared interests/hobbies)
“attr2_1”, “sinc2_1”, etc. What do you think the opposite sex looks for in a date? Please rate the importance of the following attributes on a scale of 0-100 (0=not at all important, 100=extremely important) (attractive, sincere, intelligent, fun, ambitions, shared interests/hobbies)
“attr3_1”, “sinc3_1”, etc. How do you think you measure up? Please rate your opinion of your own attributes, on a scale of 0-10 (be honest!) (attractive, sincere, intelligent, fun, ambitions, shared interests/hobbies)

Based on the results of our iterative process, we have decided to run the factor analysis with 8 factors (more details below).

Steps 1-2: Data check

We have computed the statistical summary of the distribution of the values across the selected attributes:

min 25% median mean 75% max std
date 1 4 5 5.01 6 7 1.43
go_out 1 1 2 2.20 3 7 1.17
sports 1 4 7 6.33 9 10 2.71
tvsports 1 2 4 4.51 7 10 2.83
exercise 1 5 7 6.22 8 10 2.49
dining 1 7 8 7.80 9 10 1.78
museums 1 6 7 6.97 9 10 2.05
art 1 5 7 6.74 8 10 2.27
hiking 0 3 6 5.75 8 10 2.65
gaming 0 1 3 3.72 5 14 2.54
clubbing 1 4 6 5.64 8 10 2.45
reading 1 7 8 7.68 9 13 1.94
tv 1 3 6 5.23 7 10 2.48
theater 1 5 7 6.80 9 10 2.21
movies 2 7 8 7.94 9 10 1.72
concerts 1 6 7 6.89 9 10 2.15
music 1 7 8 7.83 9 10 1.80
shopping 1 4 6 5.62 8 10 2.60
yoga 1 2 4 4.49 7 10 2.76
exphappy 1 5 6 5.49 7 10 1.72
attr1_1 0 15 20 23.55 30 100 12.67
sinc1_1 0 10 19 17.21 20 60 7.42
intel1_1 0 17 20 20.54 25 50 7.47
fun1_1 0 12 18 17.41 20 50 6.75
amb1_1 0 5 10 10.10 15 53 6.25
shar1_1 0 5 10 11.24 15 30 6.70
attr2_1 0 20 30 32.86 40 100 16.94
sinc2_1 0 7 10 12.47 18 50 7.34
intel2_1 0 10 15 14.43 20 40 6.72
fun2_1 0 15 20 18.22 20 50 7.04
amb2_1 0 5 10 11.07 15 50 7.31
shar2_1 0 5 10 11.05 15 30 6.53
attr3_1 2 6 7 6.98 8 10 1.41
sinc3_1 2 7 8 8.19 9 10 1.45
fun3_1 2 7 8 7.61 9 10 1.63
intel3_1 3 8 8 8.33 9 10 1.08
amb3_1 2 7 8 7.50 9 10 1.81
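
A summary like the one above can be computed in a few lines. The following is a minimal Python sketch, not the code used to produce this report: the sample values and the helper names `summarize` and `out_of_range` are made up for illustration, and the quartile method is an assumption, so quartiles on the real data may differ slightly. It also flags values outside the expected scale, which matters here because the table reports maxima of 14 for gaming and 13 for reading on a 1-10 scale.

```python
import statistics

def summarize(values):
    """Return min, quartiles, mean, max and sample standard deviation."""
    # "inclusive" treats the data as the full population of interest (an assumption).
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "p25": p25,
        "median": median,
        "mean": round(statistics.mean(values), 2),
        "p75": p75,
        "max": max(values),
        "std": round(statistics.stdev(values), 2),
    }

def out_of_range(values, low, high):
    """Return the values that fall outside the expected [low, high] scale."""
    return [v for v in values if v < low or v > high]

# Hypothetical answers to the "gaming" question (illustrative, not the survey data).
gaming = [1, 1, 2, 3, 3, 5, 6, 14]

summary = summarize(gaming)
suspicious = out_of_range(gaming, 1, 10)
print(summary)
print(suspicious)
```

Out-of-range answers found this way could be capped, dropped, or kept as they are; the report does not say which was done, so the sketch only reports them.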

Step 3: Correlations Check

We have checked the correlations between the selected attributes; groups of correlated attributes will later be picked up by the factors:

date go_out sports tvsports exercise dining museums art hiking gaming clubbing reading tv theater movies concerts music shopping yoga exphappy attr1_1 sinc1_1 intel1_1 fun1_1 amb1_1 shar1_1 attr2_1 sinc2_1 intel2_1 fun2_1 amb2_1 shar2_1 attr3_1 sinc3_1 fun3_1 intel3_1 amb3_1
date 1.00 0.34 -0.14 0.01 -0.07 -0.12 -0.05 -0.04 -0.02 -0.06 -0.12 0.05 0.06 -0.03 0.03 -0.03 -0.01 -0.13 -0.07 -0.08 -0.17 0.15 -0.03 0.00 0.12 0.08 0.10 -0.07 -0.20 0.06 -0.09 0.06 -0.21 0.12 -0.14 -0.17 -0.18
go_out 0.34 1.00 -0.08 0.02 -0.02 -0.17 0.02 0.03 0.01 0.01 -0.10 0.05 0.10 0.05 0.07 -0.03 0.00 0.00 -0.03 0.00 -0.05 0.22 -0.03 -0.13 -0.03 0.05 0.02 0.01 -0.07 0.02 0.00 0.01 -0.17 0.07 -0.26 -0.08 -0.15
sports -0.14 -0.08 1.00 0.47 0.40 -0.13 -0.12 -0.15 0.23 0.18 0.04 -0.10 -0.10 -0.26 -0.14 -0.02 0.00 -0.16 0.00 0.27 0.22 -0.04 -0.17 0.05 -0.18 -0.06 -0.13 0.09 0.15 -0.09 0.14 0.01 0.10 0.03 0.14 0.11 0.15
tvsports 0.01 0.02 0.47 1.00 0.23 -0.09 -0.08 -0.13 -0.04 0.22 0.05 -0.12 0.24 -0.14 -0.05 0.00 0.07 -0.01 -0.09 0.09 0.12 0.01 -0.17 0.09 -0.11 -0.07 -0.04 0.15 0.04 -0.16 0.11 -0.04 -0.01 0.05 0.15 0.04 0.04
exercise -0.07 -0.02 0.40 0.23 1.00 0.06 -0.01 -0.03 0.10 0.05 0.04 -0.02 0.04 -0.04 -0.09 -0.05 0.00 0.06 0.14 0.04 0.14 -0.11 -0.05 -0.03 0.01 -0.07 0.12 -0.13 -0.04 -0.01 -0.03 -0.05 0.22 0.01 0.09 0.03 0.14
dining -0.12 -0.17 -0.13 -0.09 0.06 1.00 0.43 0.37 0.14 0.02 0.22 0.12 0.12 0.37 0.27 0.23 0.20 0.44 0.21 0.08 -0.06 -0.07 0.07 -0.04 0.19 -0.03 0.11 -0.10 -0.05 0.04 -0.13 -0.01 0.21 0.10 0.27 0.15 0.23
museums -0.05 0.02 -0.12 -0.08 -0.01 0.43 1.00 0.87 0.21 -0.06 0.09 0.33 -0.01 0.51 0.28 0.40 0.28 0.23 0.29 0.08 -0.14 0.02 0.14 -0.05 0.09 0.05 0.09 -0.11 -0.05 -0.01 -0.08 0.06 0.10 0.14 0.14 0.05 0.07
art -0.04 0.03 -0.15 -0.13 -0.03 0.37 0.87 1.00 0.19 -0.06 0.09 0.29 -0.01 0.47 0.27 0.40 0.28 0.22 0.32 0.11 -0.10 -0.03 0.10 -0.01 0.10 0.02 0.10 -0.06 -0.06 -0.04 -0.10 0.05 0.13 0.10 0.19 0.02 0.09
hiking -0.02 0.01 0.23 -0.04 0.10 0.14 0.21 0.19 1.00 0.18 -0.02 0.13 -0.14 0.10 0.06 0.18 0.01 -0.07 0.25 0.18 -0.05 0.07 -0.09 0.09 0.03 0.01 -0.08 0.05 0.01 0.13 -0.04 0.05 0.07 0.11 0.00 0.03 0.06
gaming -0.06 0.01 0.18 0.22 0.05 0.02 -0.06 -0.06 0.18 1.00 0.17 -0.02 0.16 -0.02 0.03 0.10 0.03 0.07 -0.01 0.15 0.14 -0.06 -0.07 -0.01 -0.04 -0.06 -0.14 0.12 0.16 0.01 0.09 -0.06 -0.03 -0.04 0.00 0.06 0.13
clubbing -0.12 -0.10 0.04 0.05 0.04 0.22 0.09 0.09 -0.02 0.17 1.00 -0.11 0.03 0.13 0.10 0.10 0.09 0.24 0.05 0.11 0.09 -0.11 -0.06 0.04 0.10 -0.13 0.06 0.05 -0.01 -0.04 -0.05 -0.10 0.16 -0.04 0.25 0.00 0.14
reading 0.05 0.05 -0.10 -0.12 -0.02 0.12 0.33 0.29 0.13 -0.02 -0.11 1.00 -0.03 0.24 0.10 0.16 0.15 -0.02 0.10 -0.02 -0.15 0.07 0.20 -0.12 0.05 0.06 0.03 -0.12 -0.03 0.02 -0.04 0.10 0.03 0.21 0.05 0.14 0.06
tv 0.06 0.10 -0.10 0.24 0.04 0.12 -0.01 -0.01 -0.14 0.16 0.03 -0.03 1.00 0.19 0.34 0.06 0.07 0.43 0.02 -0.02 -0.10 0.06 -0.03 -0.01 0.11 0.06 0.04 0.01 -0.03 -0.02 -0.05 0.02 -0.02 0.09 0.06 -0.07 0.04
theater -0.03 0.05 -0.26 -0.14 -0.04 0.37 0.51 0.47 0.10 -0.02 0.13 0.24 0.19 1.00 0.47 0.42 0.27 0.32 0.26 0.04 -0.21 0.07 0.10 -0.08 0.24 0.05 0.10 -0.12 -0.08 0.06 -0.17 0.10 0.03 0.16 0.08 -0.05 0.12
movies 0.03 0.07 -0.14 -0.05 -0.09 0.27 0.28 0.27 0.06 0.03 0.10 0.10 0.34 0.47 1.00 0.39 0.33 0.23 0.12 0.01 -0.16 0.09 0.10 -0.05 0.11 0.01 0.05 -0.03 -0.04 0.01 -0.04 0.03 0.00 0.12 0.04 0.01 0.09
concerts -0.03 -0.03 -0.02 0.00 -0.05 0.23 0.40 0.40 0.18 0.10 0.10 0.16 0.06 0.42 0.39 1.00 0.65 0.24 0.28 0.15 -0.08 0.02 -0.01 -0.02 0.10 0.06 0.07 -0.07 0.00 -0.03 -0.10 0.06 0.06 0.09 0.17 -0.05 0.11
music -0.01 0.00 0.00 0.07 0.00 0.20 0.28 0.28 0.01 0.03 0.09 0.15 0.07 0.27 0.33 0.65 1.00 0.21 0.16 0.09 -0.05 0.00 -0.08 0.01 0.10 0.03 0.05 -0.08 0.01 -0.01 -0.04 0.04 0.15 0.09 0.26 0.01 0.18
shopping -0.13 0.00 -0.16 -0.01 0.06 0.44 0.23 0.22 -0.07 0.07 0.24 -0.02 0.43 0.32 0.23 0.24 0.21 1.00 0.22 0.09 -0.02 -0.09 -0.03 -0.01 0.27 -0.08 0.20 -0.12 -0.10 -0.05 -0.14 -0.05 0.17 -0.05 0.27 -0.06 0.20
yoga -0.07 -0.03 0.00 -0.09 0.14 0.21 0.29 0.32 0.25 -0.01 0.05 0.10 0.02 0.26 0.12 0.28 0.16 0.22 1.00 0.04 -0.11 -0.01 0.00 0.02 0.11 0.08 0.04 -0.06 -0.03 -0.01 -0.09 0.12 0.18 0.09 0.14 0.06 0.15
exphappy -0.08 0.00 0.27 0.09 0.04 0.08 0.08 0.11 0.18 0.15 0.11 -0.02 -0.02 0.04 0.01 0.15 0.09 0.09 0.04 1.00 0.02 0.01 -0.06 0.11 -0.04 -0.04 -0.20 0.21 0.18 0.03 0.09 -0.02 0.17 0.02 0.18 0.12 0.16
attr1_1 -0.17 -0.05 0.22 0.12 0.14 -0.06 -0.14 -0.10 -0.05 0.14 0.09 -0.15 -0.10 -0.21 -0.16 -0.08 -0.05 -0.02 -0.11 0.02 1.00 -0.44 -0.38 -0.17 -0.42 -0.42 0.27 -0.11 -0.08 -0.22 0.05 -0.30 0.19 -0.15 0.15 0.07 0.12
sinc1_1 0.15 0.22 -0.04 0.01 -0.11 -0.07 0.02 -0.03 0.07 -0.06 -0.11 0.07 0.06 0.07 0.09 0.02 0.00 -0.09 -0.01 0.01 -0.44 1.00 -0.11 -0.17 -0.02 0.05 -0.16 0.22 -0.02 0.09 -0.05 0.13 -0.25 0.34 -0.20 -0.19 -0.24
intel1_1 -0.03 -0.03 -0.17 -0.17 -0.05 0.07 0.14 0.10 -0.09 -0.07 -0.06 0.20 -0.03 0.10 0.10 -0.01 -0.08 -0.03 0.00 -0.06 -0.38 -0.11 1.00 -0.19 -0.03 -0.07 0.05 -0.16 0.17 0.01 -0.12 -0.04 -0.04 -0.04 -0.21 0.14 -0.08
fun1_1 0.00 -0.13 0.05 0.09 -0.03 -0.04 -0.05 -0.01 0.09 -0.01 0.04 -0.12 -0.01 -0.08 -0.05 -0.02 0.01 -0.01 0.02 0.11 -0.17 -0.17 -0.19 1.00 -0.06 -0.23 -0.16 0.05 0.08 0.24 0.05 -0.04 0.02 -0.06 0.27 -0.04 -0.06
amb1_1 0.12 -0.03 -0.18 -0.11 0.01 0.19 0.09 0.10 0.03 -0.04 0.10 0.05 0.11 0.24 0.11 0.10 0.10 0.27 0.11 -0.04 -0.42 -0.02 -0.03 -0.06 1.00 0.10 -0.06 -0.05 -0.05 0.14 0.01 0.10 0.05 0.02 0.08 -0.06 0.32
shar1_1 0.08 0.05 -0.06 -0.07 -0.07 -0.03 0.05 0.02 0.01 -0.06 -0.13 0.06 0.06 0.05 0.01 0.06 0.03 -0.08 0.08 -0.04 -0.42 0.05 -0.07 -0.23 0.10 1.00 -0.17 0.13 -0.05 -0.06 0.02 0.41 -0.12 0.01 -0.17 0.01 -0.09
attr2_1 0.10 0.02 -0.13 -0.04 0.12 0.11 0.09 0.10 -0.08 -0.14 0.06 0.03 0.04 0.10 0.05 0.07 0.05 0.20 0.04 -0.20 0.27 -0.16 0.05 -0.16 -0.06 -0.17 1.00 -0.62 -0.58 -0.27 -0.51 -0.43 -0.03 -0.01 0.07 -0.10 -0.04
sinc2_1 -0.07 0.01 0.09 0.15 -0.13 -0.10 -0.11 -0.06 0.05 0.12 0.05 -0.12 0.01 -0.12 -0.03 -0.07 -0.08 -0.12 -0.06 0.21 -0.11 0.22 -0.16 0.05 -0.05 0.13 -0.62 1.00 0.24 -0.05 0.16 0.12 -0.02 -0.04 -0.05 -0.03 -0.04
intel2_1 -0.20 -0.07 0.15 0.04 -0.04 -0.05 -0.05 -0.06 0.01 0.16 -0.01 -0.03 -0.03 -0.08 -0.04 0.00 0.01 -0.10 -0.03 0.18 -0.08 -0.02 0.17 0.08 -0.05 -0.05 -0.58 0.24 1.00 -0.06 0.23 0.01 0.04 0.01 0.03 0.14 0.09
fun2_1 0.06 0.02 -0.09 -0.16 -0.01 0.04 -0.01 -0.04 0.13 0.01 -0.04 0.02 -0.02 0.06 0.01 -0.03 -0.01 -0.05 -0.01 0.03 -0.22 0.09 0.01 0.24 0.14 -0.06 -0.27 -0.05 -0.06 1.00 -0.20 -0.03 -0.02 0.04 -0.07 -0.05 -0.08
amb2_1 -0.09 0.00 0.14 0.11 -0.03 -0.13 -0.08 -0.10 -0.04 0.09 -0.05 -0.04 -0.05 -0.17 -0.04 -0.10 -0.04 -0.14 -0.09 0.09 0.05 -0.05 -0.12 0.05 0.01 0.02 -0.51 0.16 0.23 -0.20 1.00 0.04 0.01 -0.08 0.04 0.14 0.14
shar2_1 0.06 0.01 0.01 -0.04 -0.05 -0.01 0.06 0.05 0.05 -0.06 -0.10 0.10 0.02 0.10 0.03 0.06 0.04 -0.05 0.12 -0.02 -0.30 0.13 -0.04 -0.04 0.10 0.41 -0.43 0.12 0.01 -0.03 0.04 1.00 0.05 0.09 -0.11 0.03 -0.01
attr3_1 -0.21 -0.17 0.10 -0.01 0.22 0.21 0.10 0.13 0.07 -0.03 0.16 0.03 -0.02 0.03 0.00 0.06 0.15 0.17 0.18 0.17 0.19 -0.25 -0.04 0.02 0.05 -0.12 -0.03 -0.02 0.04 -0.02 0.01 0.05 1.00 0.14 0.47 0.32 0.31
sinc3_1 0.12 0.07 0.03 0.05 0.01 0.10 0.14 0.10 0.11 -0.04 -0.04 0.21 0.09 0.16 0.12 0.09 0.09 -0.05 0.09 0.02 -0.15 0.34 -0.04 -0.06 0.02 0.01 -0.01 -0.04 0.01 0.04 -0.08 0.09 0.14 1.00 0.13 0.17 0.13
fun3_1 -0.14 -0.26 0.14 0.15 0.09 0.27 0.14 0.19 0.00 0.00 0.25 0.05 0.06 0.08 0.04 0.17 0.26 0.27 0.14 0.18 0.15 -0.20 -0.21 0.27 0.08 -0.17 0.07 -0.05 0.03 -0.07 0.04 -0.11 0.47 0.13 1.00 0.23 0.38
intel3_1 -0.17 -0.08 0.11 0.04 0.03 0.15 0.05 0.02 0.03 0.06 0.00 0.14 -0.07 -0.05 0.01 -0.05 0.01 -0.06 0.06 0.12 0.07 -0.19 0.14 -0.04 -0.06 0.01 -0.10 -0.03 0.14 -0.05 0.14 0.03 0.32 0.17 0.23 1.00 0.33
amb3_1 -0.18 -0.15 0.15 0.04 0.14 0.23 0.07 0.09 0.06 0.13 0.14 0.06 0.04 0.12 0.09 0.11 0.18 0.20 0.15 0.16 0.12 -0.24 -0.08 -0.06 0.32 -0.09 -0.04 -0.04 0.09 -0.08 0.14 -0.01 0.31 0.13 0.38 0.33 1.00
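
Each cell of the matrix above is a pairwise correlation coefficient. As an illustration, here is a small Python sketch that assumes Pearson correlation (the report does not state which coefficient was used). The ratings and the helper names `pearson` and `correlation_matrix` are hypothetical. Highly correlated pairs, such as museums and art (0.87 in the matrix above), are the ones a factor is expected to group together.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Build the symmetric matrix of pairwise correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 2) for b in names}
            for a in names}

# Hypothetical 1-10 interest ratings for five respondents (not the survey data).
ratings = {
    "museums": [8, 6, 9, 3, 7],
    "art": [9, 6, 8, 2, 7],
    "sports": [4, 7, 3, 9, 5],
}
matrix = correlation_matrix(ratings)
for name, row in matrix.items():
    print(name, row)
```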

Step 4: Factor design

We have computed the full list of factors and picked the top 8, which together explain just over 50% of the variance (51.72%):

Eigenvalue Pct of explained variance Cumulative pct of explained variance
Component 1 4.52 12.23 12.23
Component 2 3.34 9.02 21.25
Component 3 2.81 7.60 28.85
Component 4 1.98 5.36 34.22
Component 5 1.84 4.99 39.20
Component 6 1.63 4.41 43.62
Component 7 1.59 4.30 47.92
Component 8 1.41 3.80 51.72
Component 9 1.33 3.59 55.31
Component 10 1.18 3.18 58.49
Component 11 1.16 3.13 61.62
Component 12 1.08 2.91 64.53
Component 13 0.99 2.68 67.22
Component 14 0.94 2.55 69.76
Component 15 0.93 2.51 72.27
Component 16 0.84 2.27 74.54
Component 17 0.81 2.19 76.74
Component 18 0.78 2.10 78.84
Component 19 0.69 1.86 80.70
Component 20 0.68 1.83 82.53
Component 21 0.66 1.79 84.32
Component 22 0.64 1.74 86.06
Component 23 0.59 1.60 87.66
Component 24 0.57 1.54 89.20
Component 25 0.55 1.50 90.70
Component 26 0.50 1.34 92.04
Component 27 0.45 1.22 93.26
Component 28 0.43 1.17 94.43
Component 29 0.40 1.09 95.52
Component 30 0.36 0.97 96.50
Component 31 0.34 0.92 97.41
Component 32 0.29 0.79 98.20
Component 33 0.27 0.74 98.93
Component 34 0.27 0.72 99.66
Component 35 0.11 0.29 99.94
Component 36 0.02 0.04 99.98
Component 37 0.01 0.02 100.00
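
The percentage columns follow directly from the eigenvalues: each component's share is its eigenvalue divided by the sum of all eigenvalues. The Python sketch below redoes that arithmetic from the rounded eigenvalues in the table, so the percentages can differ from the reported ones in the second decimal. The helper name `components_needed` is hypothetical. The eigenvalues sum to roughly 37, the number of attributes, which is what one would expect if the components were extracted from the correlation matrix; that is an inference, not something the report states.

```python
# Eigenvalues as reported (rounded to two decimals) in the table above.
eigenvalues = [
    4.52, 3.34, 2.81, 1.98, 1.84, 1.63, 1.59, 1.41, 1.33, 1.18,
    1.16, 1.08, 0.99, 0.94, 0.93, 0.84, 0.81, 0.78, 0.69, 0.68,
    0.66, 0.64, 0.59, 0.57, 0.55, 0.50, 0.45, 0.43, 0.40, 0.36,
    0.34, 0.29, 0.27, 0.27, 0.11, 0.02, 0.01,
]

def components_needed(eigenvalues, target_pct):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches target_pct (in percent), with that cumulative share."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += 100 * value / total
        if cumulative >= target_pct:
            return k, cumulative
    return len(eigenvalues), cumulative

k, cumulative = components_needed(eigenvalues, 50)
print(k, round(cumulative, 1))
```

With a 50% target the function returns 8 components, in line with the choice made above.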

We decided to look into the composition of the factors.

Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
museums 0.81 0.05 -0.06 -0.14 -0.09 0.01 0.17 -0.01
art 0.81 0.09 -0.06 -0.14 -0.10 0.00 0.09 0.00
concerts 0.74 -0.02 0.05 0.09 0.12 0.02 -0.09 -0.05
theater 0.65 -0.02 -0.10 -0.20 0.30 0.07 0.12 0.05
music 0.56 0.06 0.03 0.15 0.20 0.03 -0.06 -0.07
movies 0.48 -0.13 0.06 -0.11 0.45 -0.06 0.10 -0.05
dining 0.46 0.36 -0.10 -0.11 0.27 -0.02 0.12 0.11
yoga 0.45 0.19 -0.13 0.12 -0.06 0.25 0.04 0.10
hiking 0.36 -0.04 0.06 0.30 -0.33 0.04 0.13 0.32
reading 0.32 -0.04 -0.06 -0.12 -0.14 0.05 0.52 -0.06
shopping 0.32 0.27 -0.16 -0.04 0.62 -0.03 -0.16 0.01
exphappy 0.22 0.12 0.38 0.27 -0.03 -0.13 0.02 0.17
fun3_1 0.19 0.60 -0.02 0.28 0.18 -0.10 0.02 0.17
clubbing 0.17 0.28 0.04 0.06 0.27 -0.17 -0.23 0.09
sinc3_1 0.11 -0.11 -0.06 0.24 0.12 0.09 0.62 0.15
attr2_1 0.10 0.01 -0.84 0.02 0.07 -0.35 -0.04 -0.24
attr3_1 0.09 0.64 -0.03 0.20 0.01 0.04 0.20 0.06
shar2_1 0.08 -0.04 0.16 0.02 -0.06 0.73 0.05 -0.04
amb3_1 0.08 0.59 0.08 0.19 0.26 0.08 0.24 -0.06
gaming 0.06 -0.04 0.35 0.27 0.20 -0.26 -0.03 -0.05
sinc1_1 0.06 -0.59 0.10 0.09 0.10 0.19 0.22 0.20
amb1_1 0.06 0.20 -0.11 -0.16 0.42 0.43 0.05 0.26
shar1_1 0.06 -0.15 0.05 -0.04 -0.03 0.73 -0.04 -0.22
go_out 0.03 -0.53 -0.06 0.10 0.11 -0.02 0.17 -0.08
tv 0.03 -0.14 0.02 0.10 0.76 -0.01 0.03 -0.07
intel1_1 0.02 -0.01 0.05 -0.56 -0.03 -0.19 0.44 -0.12
fun1_1 -0.03 0.13 0.13 0.07 -0.05 -0.15 -0.25 0.64
exercise -0.05 0.20 -0.21 0.50 -0.02 0.00 0.16 -0.01
sinc2_1 -0.05 -0.14 0.64 0.10 0.00 0.18 -0.19 0.09
intel2_1 -0.05 0.11 0.71 -0.11 -0.03 -0.12 0.16 0.01
fun2_1 -0.05 -0.08 -0.03 -0.12 -0.01 -0.02 0.09 0.73
intel3_1 -0.06 0.45 0.18 0.04 -0.06 -0.03 0.53 -0.15
date -0.08 -0.46 -0.26 0.07 0.10 0.13 0.12 0.13
sports -0.08 0.10 0.22 0.71 -0.21 -0.08 0.06 -0.05
tvsports -0.10 -0.09 0.18 0.63 0.22 -0.13 0.00 -0.14
attr1_1 -0.10 0.27 -0.14 0.34 -0.20 -0.50 -0.26 -0.38
amb2_1 -0.15 0.13 0.54 0.06 -0.04 0.11 -0.03 -0.21

For clarity, we have kept only the significant loadings (here, those with an absolute value of at least 0.50), which help us interpret the factors.

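Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
museums 0.81 . . . . . . .
art 0.81 . . . . . . .
concerts 0.74 . . . . . . .
theater 0.65 . . . . . . .
music 0.56 . . . . . . .
movies . . . . . . . .
dining . . . . . . . .
yoga . . . . . . . .
hiking . . . . . . . .
reading . . . . . . 0.52 .
shopping . . . . 0.62 . . .
exphappy . . . . . . . .
fun3_1 . 0.60 . . . . . .
clubbing . . . . . . . .
sinc3_1 . . . . . . 0.62 .
attr2_1 . . -0.84 . . . . .
attr3_1 . 0.64 . . . . . .
shar2_1 . . . . . 0.73 . .
amb3_1 . 0.59 . . . . . .
gaming . . . . . . . .
sinc1_1 . -0.59 . . . . . .
amb1_1 . . . . . . . .
shar1_1 . . . . . 0.73 . .
go_out . -0.53 . . . . . .
tv . . . . 0.76 . . .
intel1_1 . . . -0.56 . . . .
fun1_1 . . . . . . . 0.64
exercise . . . 0.50 . . . .
sinc2_1 . . 0.64 . . . . .
intel2_1 . . 0.71 . . . . .
fun2_1 . . . . . . . 0.73
intel3_1 . . . . . . 0.53 .
date . . . . . . . .
sports . . . 0.71 . . . .
tvsports . . . 0.63 . . . .
attr1_1 . . . . . -0.50 . .
amb2_1 . . 0.54 . . . . .
(. marks a loading that has been left out)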
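
The filtering step can be sketched as follows. This is an illustrative Python snippet, not the report's code: the 0.5 cut-off is inferred from the values that survive in the table (0.50 is kept, 0.48 is not), and the helper name `significant` is hypothetical. The four sample rows are copied from the full loadings table.

```python
THRESHOLD = 0.5  # assumed cut-off, inferred from the values kept in the table

# A few rows of the full loadings table (Comp.1 ... Comp.8).
loadings = {
    "museums": [0.81, 0.05, -0.06, -0.14, -0.09, 0.01, 0.17, -0.01],
    "movies": [0.48, -0.13, 0.06, -0.11, 0.45, -0.06, 0.10, -0.05],
    "reading": [0.32, -0.04, -0.06, -0.12, -0.14, 0.05, 0.52, -0.06],
    "attr2_1": [0.10, 0.01, -0.84, 0.02, 0.07, -0.35, -0.04, -0.24],
}

def significant(row, threshold=THRESHOLD):
    """Map component name -> loading, keeping loadings whose absolute
    value is at or above the threshold."""
    return {f"Comp.{i}": value
            for i, value in enumerate(row, start=1)
            if abs(value) >= threshold}

filtered = {name: significant(row) for name, row in loadings.items()}
for name, kept in filtered.items():
    print(name, kept)
```

An attribute such as movies, whose largest loadings are 0.48 and 0.45, ends up with no entry at all, so it does not contribute to the naming of any factor.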

Step 5: Factor score interpretation

Based on our internal discussions, we have come up with the following interpretation of the factors:

  • Factor 1: High culture intensive hobby
  • Factor 2: Highly-ambitious self-flattering attitude
  • Factor 3: Belief in high-moral standards
  • Factor 4: Body-over-mind focus
  • Factor 5: Procrastinating hobby
  • Factor 6: Belief in soul-mating
  • Factor 7: Intelligence-focus
  • Factor 8: Fun-seeking approach

Part 2: Customer Segmentation

Based on our definition of the factors, we have selected the key attributes that will be used for customer segmentation.

To understand how different the potential Speed Dating attendees are from one another, we have used the Hclust method to compute a dendrogram. Based on it, we have decided to use 4 segments.
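
To illustrate the idea, here is a minimal pure-Python sketch of agglomerative (hierarchical) clustering. It is not the Hclust call used for this report: the distance and linkage used there are not stated, so the sketch assumes Euclidean distance and complete linkage (the default of R's `hclust`), and the points and the function name `agglomerate` are made up. Stopping the merges when 4 clusters remain is equivalent to cutting the dendrogram at the height that leaves 4 branches.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters (complete linkage)
    until n_clusters remain. Returns the clusters as lists of point
    indices, plus the distance at which each merge happened."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, heights

# Hypothetical respondents described by two factor scores (not the survey data).
points = [(0, 0), (0.1, 0.2), (5, 5), (5.2, 4.9),
          (0, 5), (0.3, 5.1), (5, 0), (4.8, 0.2)]
segments, heights = agglomerate(points, 4)
print(segments)
print([round(h, 2) for h in heights])
```

In a dendrogram, a large jump between consecutive merge heights suggests a natural place to cut. This brute-force version is only suitable for small examples; a library implementation would be used on the full data set.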

Below is a visualization of the distances between components, which supports the choice of four segments.

Profiling and segmenting

In order to profile the segments, we have looked at the average values reported by Speed Dating survey respondents, for the whole population and for each segment.

Population Seg.1 Seg.2 Seg.3 Seg.4
gender 0.48 0.32 0.61 0.70 0.27
idg 16.34 15.10 16.67 18.39 15.00
condtn 1.80 1.83 1.78 1.79 1.80
wave 10.92 11.91 9.79 10.50 11.78
round 16.43 16.38 16.42 16.45 16.47
position 8.97 8.93 9.10 8.64 9.20
positin1
order 8.70 8.68 8.69 8.71 8.73
partner 8.73 8.70 8.72 8.74 8.76
pid
match 0.17 0.16 0.19 0.14 0.19
int_corr
samerace 0.39 0.42 0.40 0.35 0.40
age_o
race_o
pf_o_att
pf_o_sin
pf_o_int
pf_o_fun
pf_o_amb
pf_o_sha
dec_o 0.43 0.43 0.46 0.33 0.49
attr_o
sinc_o
intel_o
fun_o
amb_o
shar_o
like_o
prob_o
met_o
age 26.14 26.02 26.36
field 92.58 95.66 92.29 91.74 90.51
field_cd 7.84 7.73 6.94
undergra 65.88 67.50 59.85 65.51 72.31
mn_sat 17.92 18.68 18.86 14.11 20.05
tuition 29.88 34.47 27.42 23.88 34.64
race 2.77 2.80 2.57 3.07 2.65
imprelig 3.65 3.64 2.97 4.13 4.02
from 98.95 104.46 97.07 99.54 94.73
zipcode 124.68 128.37 116.30 115.02 142.11
income 46.49 57.34 47.09 31.13 50.71
goal 2.19 2.03 2.44 2.10 2.14
date 5.03 4.77 5.15
go_out 2.20 2.29 2.22 2.13 2.16
career 136.62 141.73 132.62 135.17 137.83
career_c 4.97
sports 6.30 6.45 6.39 6.50 5.82
tvsports 4.50 4.43 4.36 4.82 4.41
exercise 6.19 6.19 6.14 6.03 6.43
dining 7.79 7.59 7.73 7.74 8.17
museums 6.98 6.74 6.70 6.97 7.61
art 6.74 6.59 6.59 6.48 7.38
hiking 5.76 5.74 5.77 5.78 5.74
gaming 3.71 3.54 3.87 4.25 3.10
clubbing 5.65 5.42 5.77 5.58 5.81
reading 7.68 7.48 7.50 7.81 7.97
tv 5.24 5.46 4.93 5.52 5.11
theater 6.81 6.71 6.68 6.56 7.35
movies 7.95 7.94 7.85 7.95 8.08
concerts 6.89 6.89 6.66 6.82 7.25
music 7.83 7.62 7.69 7.94 8.12
shopping 5.62 5.52 5.33 5.31 6.44
yoga 4.48 4.72 4.35 4.19 4.71
exphappy 5.34 5.91 5.60
expnum
attr1_1 23.50 19.32 25.62 20.15 28.97
sinc1_1 17.20 17.84 17.51 17.95 15.29
intel1_1 20.57 19.94 21.16 20.11 21.00
fun1_1 17.46 16.63 19.63 16.83 16.25
amb1_1 10.06 10.08 9.84 10.68 9.66
shar1_1 11.27 16.28 5.90 14.58 9.13
attr4_1
sinc4_1
intel4_1
fun4_1
amb4_1
shar4_1
attr2_1 32.90 32.58 26.98 16.60 58.78
sinc2_1 12.43 11.70 13.19 18.41 5.69
intel2_1 14.41 13.14 17.01 17.94 8.58
fun2_1 18.25 19.29 20.05 18.30 14.72
amb2_1 11.06 10.26 12.75 14.66 5.82
shar2_1 11.04 13.47 9.94 14.11 6.45
attr3_1 6.98 6.88 7.25 6.88 6.87
sinc3_1 8.19 8.31 8.17 8.12 8.17
fun3_1 7.61 7.35 7.85 7.44 7.79
intel3_1 8.32 8.30 8.37 8.44 8.16
amb3_1 7.50 7.07 7.74 7.65 7.48
attr5_1
sinc5_1
intel5_1
fun5_1
amb5_1
dec 0.43 0.41 0.43 0.49 0.38
attr
sinc
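
A profiling table of this kind is a set of group averages. The sketch below shows one way to compute it in plain Python. The respondent records, the `segment` key and the function name `profile` are hypothetical, and the numbers are not from the survey. For a variable coded 0/1, such as gender is assumed to be here, the average is simply the share of respondents coded 1.

```python
from collections import defaultdict
from statistics import mean

def profile(rows, attributes, segment_key="segment"):
    """Average each attribute over the whole population and within each segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_key]].append(row)
    table = {}
    for attr in attributes:
        table[attr] = {"Population": round(mean(r[attr] for r in rows), 2)}
        for seg in sorted(groups):
            table[attr][f"Seg.{seg}"] = round(mean(r[attr] for r in groups[seg]), 2)
    return table

# Hypothetical respondents with an assigned segment (not the survey data).
respondents = [
    {"segment": 1, "gender": 0, "attr1_1": 20},
    {"segment": 1, "gender": 1, "attr1_1": 15},
    {"segment": 2, "gender": 1, "attr1_1": 30},
    {"segment": 2, "gender": 1, "attr1_1": 25},
]
table = profile(respondents, ["gender", "attr1_1"])
for attr, row in table.items():
    print(attr, row)
```

Averages are only meaningful for numeric or 0/1 variables; for coded categorical variables the mean of the codes has no direct interpretation.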

We have plotted “snake plots” to make the profiling easier to visualize (e.g. by identifying the attributes with the largest differences between the segments).

We have also looked at the deviation of each segment's average from the average for the total population, hiding the deviations of low significance.

Seg.1 Seg.2 Seg.3 Seg.4
gender -0.34 0.25 0.44 -0.44
idg
condtn
wave
round
position
positin1
order
partner
pid
match
int_corr
samerace
age_o
race_o
pf_o_att
pf_o_sin
pf_o_int
pf_o_fun
pf_o_amb
pf_o_sha
dec_o . . -0.23 .
attr_o
sinc_o
intel_o
fun_o
amb_o
shar_o
like_o
prob_o
met_o
age
field
field_cd
undergra
mn_sat . . -0.21 .
tuition . . -0.20 .
race
imprelig
from
zipcode
income 0.23 . -0.33 .
goal
date
go_out
career
career_c
sports
tvsports
exercise
dining
museums
art
hiking
gaming
clubbing
reading
tv
theater
movies
concerts
music
shopping
yoga
exphappy
expnum
attr1_1 . . . 0.23
sinc1_1
intel1_1
fun1_1
amb1_1
shar1_1 0.44 -0.48 0.29 .
attr4_1
sinc4_1
intel4_1
fun4_1
amb4_1
shar4_1
attr2_1 . . -0.50 0.79
sinc2_1 . . 0.48 -0.54
intel2_1 . . 0.24 -0.40
fun2_1
amb2_1 . . 0.33 -0.47
shar2_1 0.22 . 0.28 -0.42
attr3_1
sinc3_1
fun3_1
intel3_1
amb3_1
attr5_1
sinc5_1
intel5_1
fun5_1
amb5_1
dec
attr
sinc
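
The deviations in this table are consistent with a relative deviation, i.e. segment mean divided by population mean, minus 1, with values below roughly 0.2 in absolute value hidden. Both the formula and the cut-off are inferred from the numbers rather than stated in the report. The Python sketch below applies that reading to two rows of the profiling table; the function name `relative_deviation` is hypothetical.

```python
HIDE_BELOW = 0.2  # assumed cut-off, inferred from the values shown in the table

def relative_deviation(segment_means, population_mean, hide_below=HIDE_BELOW):
    """Deviation of each segment mean from the population mean, as a fraction
    of the population mean; small deviations are replaced by None."""
    result = []
    for value in segment_means:
        deviation = round(value / population_mean - 1, 2)
        result.append(deviation if abs(deviation) >= hide_below else None)
    return result

# Population and segment means (Seg.1 ... Seg.4) taken from the profiling table.
income = relative_deviation([57.34, 47.09, 31.13, 50.71], 46.49)
attr2_1 = relative_deviation([32.58, 26.98, 16.60, 58.78], 32.90)
print(income)
print(attr2_1)
```

Because the inputs here are means already rounded to two decimals, a recomputed deviation can differ from the reported one in the last digit (for example, gender in Seg.1 comes out as -0.33 rather than -0.34).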

As a result, we have found that the key differences between Speed Dating attendees across the segments are:

  • Gender
  • Income (note that income was derived from postcodes)
  • Importance of religion (i.e. that the potential partner is of the same religion)

The unfiltered deviations, rounded to one decimal place:

Segment 1 Segment 2 Segment 3 Segment 4
gender -0.3 0.3 0.4 -0.4
idg -0.1 0.0 0.1 -0.1
condtn 0.0 0.0 0.0 0.0
wave 0.1 -0.1 0.0 0.1
round 0.0 0.0 0.0 0.0
position 0.0 0.0 0.0 0.0
positin1
order 0.0 0.0 0.0 0.0
partner 0.0 0.0 0.0 0.0
pid
match 0.0 0.1 -0.2 0.1
int_corr
samerace 0.1 0.0 -0.1 0.0
age_o
race_o
pf_o_att
pf_o_sin
pf_o_int
pf_o_fun
pf_o_amb
pf_o_sha
dec_o 0.0 0.1 -0.2 0.1
attr_o
sinc_o
intel_o
fun_o
amb_o
shar_o
like_o
prob_o
met_o
age
field 0.0 0.0 0.0 0.0
field_cd
undergra 0.0 -0.1 0.0 0.1
mn_sat 0.0 0.1 -0.2 0.1
tuition 0.2 -0.1 -0.2 0.2
race 0.0 -0.1 0.1 0.0
imprelig 0.0 -0.2 0.1 0.1
from 0.1 0.0 0.0 0.0
zipcode 0.0 -0.1 -0.1 0.1
income 0.2 0.0 -0.3 0.1
goal -0.1 0.1 0.0 0.0
date
go_out 0.0 0.0 0.0 0.0
career 0.0 0.0 0.0 0.0
career_c
sports 0.0 0.0 0.0 -0.1
tvsports 0.0 0.0 0.1 0.0
exercise 0.0 0.0 0.0 0.0
dining 0.0 0.0 0.0 0.0
museums 0.0 0.0 0.0 0.1
art 0.0 0.0 0.0 0.1
hiking 0.0 0.0 0.0 0.0
gaming 0.0 0.0 0.1 -0.2
clubbing 0.0 0.0 0.0 0.0
reading 0.0 0.0 0.0 0.0
tv 0.0 -0.1 0.1 0.0
theater 0.0 0.0 0.0 0.1
movies 0.0 0.0 0.0 0.0
concerts 0.0 0.0 0.0 0.1
music 0.0 0.0 0.0 0.0
shopping 0.0 -0.1 -0.1 0.1
yoga 0.1 0.0 -0.1 0.1
exphappy
expnum
attr1_1 -0.2 0.1 -0.1 0.2
sinc1_1 0.0 0.0 0.0 -0.1
intel1_1 0.0 0.0 0.0 0.0
fun1_1 0.0 0.1 0.0 -0.1
amb1_1 0.0 0.0 0.1 0.0
shar1_1 0.4 -0.5 0.3 -0.2
attr4_1
sinc4_1
intel4_1
fun4_1
amb4_1
shar4_1
attr2_1 0.0 -0.2 -0.5 0.8
sinc2_1 -0.1 0.1 0.5 -0.5
intel2_1 -0.1 0.2 0.2 -0.4
fun2_1 0.1 0.1 0.0 -0.2
amb2_1 -0.1 0.2 0.3 -0.5
shar2_1 0.2 -0.1 0.3 -0.4
attr3_1 0.0 0.0 0.0 0.0
sinc3_1 0.0 0.0 0.0 0.0
fun3_1 0.0 0.0 0.0 0.0
intel3_1 0.0 0.0 0.0 0.0
amb3_1 -0.1 0.0 0.0 0.0
attr5_1
sinc5_1
intel5_1
fun5_1
amb5_1
dec 0.0 0.0 0.1 -0.1
attr
sinc

We have saved our segmentation output in order to use it in the next part of the analysis, i.e. classification.