ABC Hotels & Resorts is a hotel chain based in Spain, specializing in 4 and 5 star hotels, owned by Grupo ABC.

The client has its own website, which allows for online booking and receives traffic from several countries.

The Business Questions

Define martketing strategies to increase the revenues generated by the customers of the website though a better understanding of the customer segments.

The Process

We followed a 3 steps process:

Part 1: Data set inspection and dimensionality reduction

Part 2: Data analysis and marketing insights

Part 3: Final recommendation and strategy


Part 1 - Data Inspection

Key Customer Characteristics

First step: we load the data to use:

The data refers to the users and their interaction with the website until that time. Overall the dataset contains information on almost 1 million customers grouped in 667 types of customers described by 19 number of variables.

Variables description:
  • Type of User: Returning or New Visitor
  • Source: Organic vs Paid Promotions (cpc, cpm, referral)
  • Users: Number of User Visits
  • Sessions: Number of User Sessions
  • No of Pages Visited: Number of total pages visited
  • No of Transactions: Number of monetary transactions
  • Revenue: Revenue generated
  • Dummy Variable: We created dummy variables categorical data such as ‘user type’, ‘source’ and ‘device type’
factor_attributes_used = c(1:19)
factor_selectionciterion = 
minimum_variance_explained = 65
manual_numb_factors_used = 5
rotation_used = "varimax"
max_data_report = 5
factor_attributes_used <- intersect(factor_attributes_used, 1:ncol(myData))
ProjectDataFactor <- myData[,factor_attributes_used]
ProjectDataFactor <- as.matrix(ProjectDataFactor)

Check the Data

We start by doing a basic visual exploration of the first customer types:

Data Visualization

Customer01 Customer02 Customer03 Customer04 Customer05
Type.of.User Returning Visitor Returning Visitor New Visitor Returning Visitor Returning Visitor
Source google / organic google / organic (direct) / (none) google / organic google / cpc
Device desktop desktop desktop desktop desktop
Returning.Visitor 1 1 0 1 1
New.Visitor 0 0 1 0 0
Google.CPC 0 0 0 0 1
TripAdvisor 0 0 0 0 0
Direct 0 0 1 0 0
DFA 0 0 0 0 0
Email 0 0 0 0 0
Referral 0 0 0 0 0
Desktop 1 1 1 1 1
Mobile 0 0 0 0 0
Tablet 0 0 0 0 0
Users 7240 21442 2989 2637 3538
Sessions 10776 35015 2996 4112 5179
No.of.Pages.Visited 45328 173192 7989 16936 21534
No.of.Transactions 64 130 13 15 35
Revenue 456710.54 362368.88 341553.90 278226.12 216213.00

Check Correlations

We can then analyze the data in terms of correlation between different variables. The objetctive is to get a first idea of possible cross-relationships between variables that could lead to a dimensionality reduction as a first step for the analysis.

Correlation Table

Returning.Visitor New.Visitor Google.CPC TripAdvisor Direct DFA Email Referral Desktop Mobile Tablet Users Sessions No.of.Pages.Visited No.of.Transactions Revenue
Returning.Visitor 1.00 -1.00 -0.03 -0.01 -0.19 0.11 0.08 0.09 0.10 -0.06 -0.07 -0.25 -0.18 -0.16 0.05 0.03
New.Visitor -1.00 1.00 0.03 0.01 0.19 -0.11 -0.08 -0.09 -0.10 0.06 0.07 0.25 0.18 0.16 -0.05 -0.03
Google.CPC -0.03 0.03 1.00 -0.07 -0.13 -0.09 -0.07 -0.23 0.13 -0.09 -0.09 0.04 0.05 0.05 0.14 0.10
TripAdvisor -0.01 0.01 -0.07 1.00 -0.10 -0.07 -0.05 0.10 -0.09 0.01 0.10 -0.06 -0.07 -0.06 -0.06 -0.06
Direct -0.19 0.19 -0.13 -0.10 1.00 -0.13 -0.10 -0.31 -0.13 0.12 0.06 0.15 0.14 0.10 0.11 0.16
DFA 0.11 -0.11 -0.09 -0.07 -0.13 1.00 -0.07 -0.22 0.01 -0.02 0.01 -0.05 -0.03 -0.04 0.01 -0.01
Email 0.08 -0.08 -0.07 -0.05 -0.10 -0.07 1.00 -0.15 0.12 -0.07 -0.09 -0.07 -0.07 -0.07 -0.06 -0.06
Referral 0.09 -0.09 -0.23 0.10 -0.31 -0.22 -0.15 1.00 -0.05 0.04 0.03 -0.22 -0.24 -0.23 -0.19 -0.20
Desktop 0.10 -0.10 0.13 -0.09 -0.13 0.01 0.12 -0.05 1.00 -0.60 -0.74 -0.07 -0.10 -0.13 0.13 0.11
Mobile -0.06 0.06 -0.09 0.01 0.12 -0.02 -0.07 0.04 -0.60 1.00 -0.10 0.12 0.13 0.18 -0.07 -0.06
Tablet -0.07 0.07 -0.09 0.10 0.06 0.01 -0.09 0.03 -0.74 -0.10 1.00 -0.01 0.01 0.00 -0.10 -0.09
Users -0.25 0.25 0.04 -0.06 0.15 -0.05 -0.07 -0.22 -0.07 0.12 -0.01 1.00 0.98 0.92 0.46 0.39
Sessions -0.18 0.18 0.05 -0.07 0.14 -0.03 -0.07 -0.24 -0.10 0.13 0.01 0.98 1.00 0.95 0.53 0.45
No.of.Pages.Visited -0.16 0.16 0.05 -0.06 0.10 -0.04 -0.07 -0.23 -0.13 0.18 0.00 0.92 0.95 1.00 0.44 0.40
No.of.Transactions 0.05 -0.05 0.14 -0.06 0.11 0.01 -0.06 -0.19 0.13 -0.07 -0.10 0.46 0.53 0.44 1.00 0.74
Revenue 0.03 -0.03 0.10 -0.06 0.16 -0.01 -0.06 -0.20 0.11 -0.06 -0.09 0.39 0.45 0.40 0.74 1.00

From correlation matrix we can see that there is a relevant positive correlation between number of sessions, number of pages visited, number of transactions and revenues for each of the customer types.

This makes business sense, as it makes sense that the customer segment with more activity ends up generating higher revenues.

Another relevant insight is that Desktop device is the one that generates most of the traffic and revenues in absolute value.

Choose number of factors

To use Factor Analysis, we first need to adjust data to have only numeric data. We can then remove first columns and keep only binary values that describe the users types in the same manner.

num_data <- myData[,4:ncol(myData)]
scaled_data <- apply(num_data,2, function(r) {if (sd(r)!=0) res=(r-mean(r))/sd(r) else res=0*r; res})

Now that we have only numeric data, we can proceed with Factor Analysis in order to do the Dimensionality Redcution.

Variance_Explained_Table_results<-PCA(scaled_data, graph=FALSE)
Variance_Explained_Table<-Variance_Explained_Table_results$eig
Variance_Explained_Table<-as.data.frame(Variance_Explained_Table)
colnames(Variance_Explained_Table)<-c("Eigenvalue", "Percentage_of_explained_variance", "Cumulative_percentage_of_explained_variance")

eigenvalues  <- Variance_Explained_Table[,2]

The next step is to look at the variance explained as well as the eigenvalues

Variance Explained

Eigenvalue Percentage_of_explained_variance Cumulative_percentage_of_explained_variance
comp 1 3.84 24.01 24.01
comp 2 2.34 14.61 38.61
comp 3 1.81 11.29 49.91
comp 4 1.29 8.07 57.98
comp 5 1.17 7.33 65.31
comp 6 1.10 6.88 72.19
comp 7 1.05 6.55 78.74
comp 8 1.02 6.39 85.13
comp 9 0.92 5.74 90.87
comp 10 0.78 4.89 95.76
comp 11 0.34 2.15 97.91
comp 12 0.25 1.56 99.47
comp 13 0.07 0.45 99.93
comp 14 0.01 0.07 100.00
comp 15 0.00 0.00 100.00
comp 16 0.00 0.00 100.00

Eigenvalue Plot

Interpret the Components

We can see from the chart above that we could use only 8 components (the ones with eigenvalue >1). This number makes total sense as there are several binary variables that explain the same as the text variables. We can identify the following groups:

  • Device
  • Returning/New Visitor
  • Source
  • Revenues
  • Number of Sessions
  • Number of pages visited
  • Number of Users
  • Number of Transactions

In this case we suggest to keep all the original variables and not reduce the dimension since the higher number of binary variables are just a way to provide the same information in a numerical form.


Part 2: Marketing Strategy

To analyze different scenarios and identify possible strategies, we can focus on different behaviors across devices.

averageTransaction <- aggregate(myData$No.of.Transactions,list(myData$Device), FUN=sum)
colnames(averageTransaction) <- c("Device","No of Transactions")
knitr::kable(averageTransaction)
Device No of Transactions
desktop 3553
mobile 119
tablet 120

We can clearly see that desktop is the main channel used by customers to access the website, generating about 95% of total transactions. The next step is to analyze the revenue sources, to check if we find the same trend as in the number of transactions or if the behavior changes depending on the device used.

averageRevenue <- aggregate(myData$Revenue,list(myData$Device), FUN=sum)
colnames(averageRevenue) <- c("Device","Revenues")
knitr::kable(averageRevenue)
Device Revenues
desktop 8459779.1
mobile 291844.2
tablet 310228.0

The analysis on revenues shows that desktop is the device that generates more revenues, as expected from the previous result. Now, we will analyze revenues per transaction to check if the “unitary” revenues per customer vary depending on the type of device.

RevTransaction <- averageRevenue
RevTransaction[,2] <- averageRevenue[,2]/averageTransaction[,2] 
colnames(RevTransaction) <- c("Device","Revenue per Transaction")
knitr::kable(RevTransaction)
Device Revenue per Transaction
desktop 2381.024
mobile 2452.472
tablet 2585.233

We can see that revenues per transaction are quite stable cross-device, even though tablet is more profitable if compared to desktop and mobile devices.

Moving from devices to sources, we can then compare organic sources to payed sources (e.g. “cpc”) and analyze the relationship between transactions and revenues depending on the different sources.

averageTransactionSource <- aggregate(myData$No.of.Transactions,list(myData$Source), FUN=sum)
averageRevenueSource <- aggregate(myData$Revenue,list(myData$Source), FUN=sum)
RevTransactionSource <- averageRevenueSource
RevTransactionSource[,3] <- averageRevenueSource[,2]/averageTransactionSource[,2] 
RevTransactionSource[,2] <- averageRevenueSource[,2]
RevTransactionSource <- RevTransactionSource[order(RevTransactionSource[,2],RevTransactionSource[,1],decreasing=TRUE),]
rownames(RevTransactionSource) <- c(1:nrow(RevTransactionSource))
colnames(RevTransactionSource) <- c("Source","Revenue","Revenue/transaction")
knitr::kable(RevTransactionSource[1:10,])
Source Revenue Revenue/transaction
(direct) / (none) 2951215.28 2919.1051
google / organic 2793075.46 2632.4934
google / cpc 1466490.42 2104.0035
dfa / cpm 430017.88 1853.5253
dfa / cpc 226526.63 2335.3261
bing / organic 186316.63 2272.1540
email / email 109285.60 2023.8074
yahoo / organic 92137.69 2709.9321
olehotels.com / referral 69812.64 956.3375
adquiramexico.com.mx / referral 65988.00 16497.0000

In the table abover we are showing the top 10 sources of revenue and the average revenue per transaction.

The first relevant thing is the high average revenue per transaction. Analyzing the full data set we found more “reasonable” values for small sources, but all the top ones show high values. Our hypotheses is that some of the reservations might be done by agencies, which distorts the average figures.

The second insight is that among the top sources, the ones with direct access and organic searches on google or bing provide more than 4 times the revenues of payed sources like “cpc” and “cpm”.

Finally, we see that the average revenue per transaction of direct access/organic searches is higher than the average revenue per transaction of payed sources.


Part 3: Recommendation and next steps

From theprevious analysis, we suggest the following initial guidelines for the marketing strategy of ABC hotels and resorts:


In order to implement the strategic recommendations and to analyze additional options, the following next steps are needed: