Airbnb Booking Analysis

BDA - INSEAD MBA 16J

Business motivation

Understanding customers’ dynamics and behaviour is essential for digital companies, especially for those whose business model is based on purchases or reservations from customers.

In the case of a company such as Airbnb, where customers rent rooms or apartments from other users, it is key to understand what drives customers to make the final reservation. If the company manages to understand the patterns, and has a systematic process to analyze the behaviour, it will be able to implement actions to improve the booking ratio, as well as to assess the success of the actions.

For this case, we have selected a public shared file with data from customers, and we will guide you through the different data that the file contains, and the different measures that we find interesting in order to analyze customers’ behaviour.

The raw data file is sourced from the Airbnb website and contains data from May 2014 to May 2015. It is publicly available data released by Airbnb as part of an analytics competition.

It is important to understand the columns, so let’s review their content:

  • id_visitor: the id of the visitor
  • id_session: the id of the session
  • dim_session_number: the number of the session on a given day for a visitor
  • dim_user_agent: the user agent of the session
  • dim_device_app_combo: the parsed out device/app combo from user agent
  • ds: date stamp of session
  • ts_min: time of session start
  • ts_max: time of session end
  • did_search: binary flag indicating if the visitor performed a search during the session
  • sent_message: binary flag indicating if the visitor sent a message during the session
  • sent_booking_request: binary flag indicating if the visitor sent a booking request during the session


Let us look at the raw data. We display the first hundred rows:

Conversion rates

Let’s have a look at the conversion rates for the users. We are going to analyze several ratios:

  • the percentage of visits that end in bookings
  • the percentage of times that users send a message to the owner
  • the percentage of times that a booking is realized after a message has been sent

Before any other analysis, let’s look at these ratios (each block of code corresponds to one ratio) for the global set of customers (without considering the session number or other parameters):
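As a minimal sketch of how these three ratios can be computed, consider the toy example below. The `sessions` data frame and its values are made up for illustration; only the column names come from the data description. We also assume here that "a booking after a message" means a booking request in the same session as the message, which may differ from the definition used for the figures in this report.

```r
# Toy stand-in for the session data (values are made up for illustration).
sessions <- data.frame(
  did_search           = c(1, 0, 1, 1, 0, 1),
  sent_message         = c(0, 0, 1, 1, 0, 1),
  sent_booking_request = c(0, 0, 1, 0, 0, 0)
)

# Share of visits (sessions) that contain a booking request.
booking_rate <- mean(sessions$sent_booking_request == 1)

# Share of visits in which the visitor sent a message.
message_rate <- mean(sessions$sent_message == 1)

# Share of message sessions that also contain a booking request
# (assumption: message and booking are matched within the same session).
with_message <- sessions[sessions$sent_message == 1, ]
booking_given_message_rate <- mean(with_message$sent_booking_request == 1)

# Express the ratios as percentages.
round(100 * c(booking = booking_rate,
              message = message_rate,
              booking_given_message = booking_given_message_rate), 1)
```

Because the flags are binary, taking the mean of each flag gives the share of sessions in which the event occurred.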

Summary


As we can see, for every 100 visits to Airbnb, fewer than 2 end up in a booking. Although we can’t conclude whether this figure is high or low, it will be interesting to keep a record of how this percentage changes over time, and whether it changes with specific marketing actions Airbnb takes.
On the other hand, if we check the success ratio after sending a message, we can see it is much higher than without a message (for every 100 messages sent, over 8 bookings were realized). It would be useful to have more data about what it is that drives the customers to send messages.
If Airbnb could gather the information about why customers send messages, it could develop more targeted campaigns, and could track whether the ratio of messages vs visits increases, and/or whether the ratio of bookings vs messages increases.


Seasonality, visitor and device analysis

In the following we will visually analyze the data by date, unique visitor, and the device/app from which customers accessed Airbnb. This helps Airbnb to get a better intuition about who their customers are and how they behave.



Visualization of data by date


Here we grouped all data by date with the goal to see whether we can identify any seasonal trends with regards to customer activity.
First let’s look at the data sorted by date, between 2014-05-05 and 2015-04-23. Below are the first 10 entries:
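The grouping by date can be sketched as follows. This is an illustrative example on a made-up `sessions` data frame, not the exact code behind the table; the helper column `visits` (one per session) is our own assumption.

```r
# Toy stand-in for the session data (values are made up for illustration).
sessions <- data.frame(
  ds = as.Date(c("2014-05-05", "2014-05-05", "2014-05-06",
                 "2014-05-06", "2014-05-06")),
  did_search           = c(1, 0, 1, 1, 0),
  sent_message         = c(0, 0, 1, 0, 0),
  sent_booking_request = c(0, 0, 1, 0, 0)
)

# Hypothetical helper column: every row is one visit (session).
sessions$visits <- 1

# Sum visits, searches, messages and booking requests per date.
by_date <- aggregate(
  cbind(visits, did_search, sent_message, sent_booking_request) ~ ds,
  data = sessions, FUN = sum
)

# Sort by date and show the first 10 entries.
by_date <- by_date[order(by_date$ds), ]
head(by_date, 10)
```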



Looking at the data we see spikes in activity around September, as well as the end of the year and early February. It is interesting to see that while there seems to be some seasonality effect in the number of visits, this does not fully translate into a similar pattern for the number of searches.



A similar pattern also holds for messages and bookings which are relatively evenly spread throughout the year.



However, the average session duration was significantly higher around September than other times of the year. Further research needs to be done in order to completely understand this spike.






Visualization of data by unique visitor


Let’s have a look at the first 10 entries of the data by unique visitor:



Here we see that only a very small part of all unique visitors account for the majority share of all visits. Yet the number of searches is driven by a broader unique visitor base.
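One way to quantify how concentrated the visits are is the cumulative share of visits held by the most active visitors. The sketch below uses made-up visitor ids; it illustrates the idea and is not the code used for the chart.

```r
# Toy stand-in: one row per session, with a made-up visitor id.
sessions <- data.frame(
  id_visitor = c("a", "a", "a", "a", "b", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Number of visits per unique visitor, most active visitors first.
visits_per_visitor <- sort(table(sessions$id_visitor), decreasing = TRUE)

# Cumulative share of all visits accounted for by the top visitors.
cumulative_share <- cumsum(as.numeric(visits_per_visitor)) /
  sum(visits_per_visitor)
names(cumulative_share) <- names(visits_per_visitor)

cumulative_share
```

In this toy example, the single most active visitor accounts for half of all visits.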



Below we see the Searches, Messages and Bookings by Unique Visitors.



Below we show the average time unique visitors spent per session.

Visualization of data by device


Lastly, let’s have a look at the data by device and app from which the customers accessed Airbnb.



Below you can find the graphical output of our data analysis by device and app. We see that most people access Airbnb from the iPhone app. Access via desktop is also relatively widespread; Android users, however, are less active or form a smaller part of Airbnb users.



Below we see the Searches, Messages and Bookings activity by Device and App.








Time analysis


When we look at the trend in visits by month, we see that September and December have the highest traffic, which is likely driven by the summer and winter vacation periods. By day of week, we find that traffic appears to be somewhat higher on weekdays than on weekends. Given that the majority of visits are made through mobile devices, this may be a result of users spending time on Airbnb during their commutes or during breaks in the work day. In terms of hourly traffic, there is a significant drop in traffic between 3pm and 7pm. We do not have enough data to determine the causes, but this suggests that online marketing efforts such as paid search could be reduced during this time period to optimize cost-benefit.
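To break traffic down by month, day of week and hour, the session start time can be split into its components. The sketch below assumes that `ts_min` can be parsed as a date-time in the format shown; the timestamps themselves are made up for illustration.

```r
# Made-up session start times (assumed format: "YYYY-MM-DD HH:MM:SS").
ts_min <- as.POSIXct(
  c("2014-09-01 08:15:00", "2014-09-01 16:40:00",
    "2014-12-27 08:05:00", "2014-12-28 21:30:00"),
  tz = "UTC"
)

# POSIXlt exposes the calendar components of each timestamp.
lt <- as.POSIXlt(ts_min, tz = "UTC")
sessions <- data.frame(
  month   = lt$mon + 1,  # mon is 0-based, so add 1 to get 1..12
  weekday = lt$wday,     # 0 = Sunday, 6 = Saturday
  hour    = lt$hour      # 0..23
)

# Visit counts per month and per hour of day.
visits_by_month <- table(sessions$month)
visits_by_hour  <- table(sessions$hour)

# Flag weekend sessions to compare weekday and weekend traffic.
is_weekend <- sessions$weekday %in% c(0, 6)
table(is_weekend)
```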






When we observe the trend in session duration, we find that the majority of sessions are between one and ten minutes long, and the majority of users spend less than ten minutes in total per day on the website. We would need to conduct more advanced analytics to determine whether bookings are correlated with longer time spent per day. However, given that usage of the Airbnb platform falls within ten-minute time frames, from a website/app development perspective Airbnb should focus on providing an efficient and fast search experience that provides the information the user needs before she/he moves on to the next task and loses interest.
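Session duration and total time per visitor per day can be derived from `ts_min` and `ts_max`. The following is a minimal sketch with made-up timestamps and visitor ids, assuming both columns parse as date-times.

```r
# Made-up session start and end times (assumed to parse as date-times).
ts_min <- as.POSIXct(c("2014-09-01 08:00:00", "2014-09-01 12:00:00",
                       "2014-09-02 09:00:00"), tz = "UTC")
ts_max <- as.POSIXct(c("2014-09-01 08:04:00", "2014-09-01 12:12:00",
                       "2014-09-02 09:07:30"), tz = "UTC")

sessions <- data.frame(
  id_visitor = c("a", "a", "b"),
  ds = as.Date(ts_min),
  stringsAsFactors = FALSE
)

# Duration of each session in minutes.
sessions$duration_min <- as.numeric(difftime(ts_max, ts_min, units = "mins"))

# Share of sessions lasting between one and ten minutes.
share_1_to_10 <- mean(sessions$duration_min >= 1 & sessions$duration_min <= 10)

# Total minutes per visitor per day.
daily_total <- aggregate(duration_min ~ id_visitor + ds,
                         data = sessions, FUN = sum)
daily_total
```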




Below we see the relationship between time of day and booking. It is interesting to see that the percentage in May appears to be unusually high. Perhaps this is driven by easier search processes, which may be caused by a greater abundance of available Airbnb rooms. Alternatively, this may be caused by a difference in price sensitivity of users during this month. Further data and analysis to assess the underlying drivers of this difference would be interesting as a next step, and may lead to some insights for how to improve the efficiency and ease of the user experience.



Analysis of booking Device


Number of visits per booking

airbnbData = ProjectData
airbnb_visits = nrow(airbnbData)
class(airbnbData)
## [1] "data.frame"
Visitsperbooking = nrow(airbnbData)/sum(airbnbData$sent_booking_request)


Looking at desktop bookings with Chrome

airbnb_visits_Chrome = airbnbData[airbnbData$dim_device_app_combo == "Desktop - Chrome", 
    ]
airbnb_bookings_Chrome = sum(airbnb_visits_Chrome$sent_booking_request)
Visit_per_booking_Chrome = nrow(airbnb_visits_Chrome)/airbnb_bookings_Chrome


Number of messages per booking

messagesperbooking = sum(airbnbData$sent_message)/sum(airbnbData$sent_booking_request)


Looking at desktop bookings with Chrome: number of messages per booking

airbnb_visits_Chrome = airbnbData[airbnbData$dim_device_app_combo == "Desktop - Chrome", 
    ]
airbnb_bookings_Chrome = sum(airbnb_visits_Chrome$sent_booking_request)
message_per_booking_Chrome = sum(airbnb_visits_Chrome$sent_message)/airbnb_bookings_Chrome


Number of searches per booking

searchesperbooking = sum(airbnbData$did_search)/sum(airbnbData$sent_booking_request)


Looking at desktop bookings with Chrome: number of searches per booking

airbnb_visits_Chrome = airbnbData[airbnbData$dim_device_app_combo == "Desktop - Chrome", 
    ]
airbnb_bookings_Chrome = sum(airbnb_visits_Chrome$sent_booking_request)
search_per_booking_Chrome = sum(airbnb_visits_Chrome$did_search)/airbnb_bookings_Chrome


Summary

The number of visits per booking is 53.49 overall, and 21.09 for desktop Chrome users.

For comparison, the number of messages per booking is 8.82 overall and 4.86 for desktop Chrome users.

A similar comparison for searches gives 8.52 searches per booking overall and 5.70 for desktop Chrome users.

Device comparison

To visualise across all the different devices:

Let us create a new table (data frame) with all the numbers grouped by device, using the aggregate function:

Device_data = aggregate.data.frame(airbnbData[, 9:11], by = list(airbnbData$dim_device_app_combo), 
    FUN = sum)
Device_number = aggregate.data.frame(airbnbData$entrance, by = list(airbnbData$dim_device_app_combo), 
    FUN = sum)
Device_data$Total = Device_number$x


Now we define per booking metrics for the devices:

Device_data$messageperbooking = Device_data$sent_message/Device_data$sent_booking_request
Device_data$searchperbooking = Device_data$did_search/Device_data$sent_booking_request
Device_data$visitsperbooking = Device_data$Total/Device_data$sent_booking_request


We need to replace NA values (which arise from 0/0, when a device has no events and no bookings) with 0:

Device_data[is.na(Device_data)] = 0



To remove the infinite values (devices with activity but no bookings), let us create another table that keeps only the devices with at least one booking request:

Device_data_no_inf = Device_data[Device_data$sent_booking_request != 0, ]
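To see why both steps are needed, the toy example below shows how division by zero behaves in R. The data frame `Device_toy` and its numbers are made up; "Unknown" and "Other" are hypothetical device labels. Filtering with `is.finite` is shown as an alternative that drops both the infinite and the undefined ratios in one step; it is not the approach used above.

```r
# Made-up per-device totals ("Unknown" and "Other" are hypothetical labels).
Device_toy <- data.frame(
  Group.1              = c("Desktop - Chrome", "Unknown", "Other"),
  Total                = c(40, 5, 0),
  sent_booking_request = c(2, 0, 0),
  stringsAsFactors = FALSE
)

# 40/2 = 20, 5/0 = Inf, 0/0 = NaN
Device_toy$visitsperbooking <- Device_toy$Total / Device_toy$sent_booking_request

# is.na() is TRUE for NaN but not for Inf, so replacing NA alone leaves Inf.
is.na(Device_toy$visitsperbooking)

# Alternative: keep only rows whose ratio is a finite number.
Device_toy_finite <- Device_toy[is.finite(Device_toy$visitsperbooking), ]
Device_toy_finite
```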

Let us plot the metrics, starting with visits per booking

library(ggplot2)
library(scales)  # provides pretty_breaks()

# Bar chart of visits per booking for each device/app combination
ggplot(Device_data_no_inf, aes(x = Group.1)) + geom_bar(aes(y = visitsperbooking), 
    fill = "blue", stat = "identity") + theme_minimal() + ylab("Visits per booking") + 
    xlab("Devices & App") + ggtitle("Visits per Booking by Devices & App") + 
    scale_y_continuous(breaks = pretty_breaks(n = 5)) + 
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5))


Next we analyse search per booking

library(ggplot2)
library(scales)  # provides pretty_breaks()

# Bar chart of searches per booking for each device/app combination
ggplot(Device_data_no_inf, aes(x = Group.1)) + geom_bar(aes(y = searchperbooking), 
    fill = "blue", stat = "identity") + theme_minimal() + ylab("Searches per booking") + 
    xlab("Devices & App") + ggtitle("Searches per Booking by Devices & App") + 
    scale_y_continuous(breaks = pretty_breaks(n = 5)) + 
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5))


Followed by messages per booking

library(ggplot2)
library(scales)  # provides pretty_breaks()

# Bar chart of messages per booking for each device/app combination
ggplot(Device_data_no_inf, aes(x = Group.1)) + geom_bar(aes(y = messageperbooking), 
    fill = "blue", stat = "identity") + theme_minimal() + ylab("Messages per booking") + 
    xlab("Devices & App") + ggtitle("Messages per Booking by Devices & App") + 
    scale_y_continuous(breaks = pretty_breaks(n = 5)) + 
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5))


So we have seen how customers behave across time, seasons, and devices. This is integral to better pricing.

And to live happily ever after.