Here is how you can create a very long document using only a few lines of code.

Let’s take the boats case study as an example, and consider only the cluster analysis part of the case solution. Here is how to get a report that tests every available hierarchical clustering method (and as many more variations as needed). Of course, one then needs to go through all this output to decide which method is best (statistically, in terms of interpretation, and business-wise). Adding all k-means methods (or other methods) will just make the report longer, but your write-up time will not change much!

Let’s first run all necessary code from the analysis.

Check the working directory first (manually), as always.

getwd()
setwd("CourseSessions/Sessions45")
list.files()
rm(list = ls())  # Clean up the memory, if we want to rerun from scratch

This example focuses only on the hierarchical clustering part of the segmentation, so we assume the factors have already been selected (of course, one can also vary the number of factors used, all automatically).

ProjectData <- read.csv("data/Boats.csv", sep = ";", dec = ",")  # semicolon-separated file with decimal commas
ProjectData = data.matrix(ProjectData)
colnames(ProjectData) <- gsub("\\.", " ", colnames(ProjectData))
ProjectDataFactor = ProjectData[, c(2:30)]

segmentation_attributes_used = c(10, 19, 5, 12, 3)
profile_attributes_used = 2:ncol(ProjectData)
ProjectData_segment = ProjectData[, segmentation_attributes_used]
ProjectData_profile = ProjectData[, profile_attributes_used]

Now we just need to call the function repetitioncode_example, defined in the file repetitioncode_example.R, with every combination of inputs we want to test (in this example, distance_used and hclust_method).
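The file repetitioncode_example.R is not shown here, so the following is only a minimal sketch of what a function of this kind could look like, assuming it wraps `dist`, `hclust` and `cutree` and returns its text and results in a list. The name `repetition_sketch`, the returned fields `tree` and `membership`, the wording of `text1`, and the synthetic `toy_data` are all hypothetical and are not taken from the original file.

```r
# Hypothetical sketch; NOT the contents of repetitioncode_example.R
repetition_sketch <- function(distance_used, hclust_method,
                              numb_clusters_used, data_segment) {
  # Pairwise distances between observations (rows)
  d <- dist(data_segment, method = distance_used)
  # Hierarchical clustering on those distances
  hc <- hclust(d, method = hclust_method)
  # Cut the tree into the requested number of clusters
  membership <- cutree(hc, k = numb_clusters_used)
  list(
    text1 = paste0("Distance: ", distance_used, ", method: ", hclust_method),
    tree = hc,
    membership = membership
  )
}

# Synthetic stand-in data: 20 observations, 5 attributes (not the Boats data)
set.seed(1)
toy_data <- matrix(rnorm(100), nrow = 20, ncol = 5)

res <- repetition_sketch("euclidean", "ward.D", 3, toy_data)
cat(res$text1, "\n")
print(table(res$membership))  # cluster sizes
```

Returning everything in one list keeps the calling loop short: the loop only decides the order in which the pieces are written to the report.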

library(pryr)  # make sure this package is installed
source("repetitioncode_example.R")  # read this file: the whole trick is there, plus the use of results='asis' for this code chunk
numb_clusters_used = 3  # let's not generate 100 pages! 

# A shorter run that tests only two distances and two methods:
# for (distance_used in c("euclidean", "maximum"))
#   for (hclust_method in c("ward.D", "ward.D2")) {

# All distances listed in help(dist) and all methods listed in help(hclust) used here.
# The more you include, the longer the report will be.
for (distance_used in c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"))
  for (hclust_method in c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid")) {
    
    tmp = repetitioncode_example(distance_used,hclust_method,numb_clusters_used,ProjectData_segment)
    cat("<br><hr><br>")
    cat(tmp$text1)
    cat("<br>")
    tmp$plot1
    cat("<br>")
    cat(tmp$text2)
    cat("<br>")
    print(tmp$Line,'chart')
    cat("<br>")
    cat(tmp$text3)
    cat("<br> <br>")
  }
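To get a feel for how long the report becomes, the combinations in the loop above can be enumerated up front. The sketch below is only an illustration: it runs on synthetic data rather than the Boats data, and the names `toy_data`, `combos` and `largest_cluster` are hypothetical.

```r
# Synthetic stand-in data: 20 observations, 5 attributes (not the Boats data)
set.seed(1)
toy_data <- matrix(rnorm(100), nrow = 20, ncol = 5)

distances <- c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")
hclust_methods <- c("ward.D", "ward.D2", "single", "complete",
                    "average", "mcquitty", "median", "centroid")

# One row per (distance, method) pair, i.e. one report section each
combos <- expand.grid(distance_used = distances,
                      hclust_method = hclust_methods,
                      stringsAsFactors = FALSE)
stopifnot(nrow(combos) == length(distances) * length(hclust_methods))

# Size of the largest of 3 clusters for each pair, as a quick comparison
combos$largest_cluster <- mapply(function(d, m) {
  hc <- hclust(dist(toy_data, method = d), method = m)
  max(table(cutree(hc, k = 3)))
}, combos$distance_used, combos$hclust_method)

print(head(combos))
```

With six distances and eight methods this gives 48 sections, which is why trimming either vector shortens the report quickly.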



We now use as distance_used the method euclidean and as hclust_method used the method ward.D

Finally, we can see the dendrogram (see class readings and online resources for more information) to have a first rough idea of what segments (clusters) we may have - and how many.


We can also plot the ‘distances’ traveled before we need to merge any of the lower and smaller in size clusters into larger ones - the heights of the tree branches that link the clusters as we traverse the tree from its leaves to its root. If we have n observations, this plot has n-1 numbers.




We now use as distance_used the method euclidean and as hclust_method used the method ward.D2

Finally, we can see the dendrogram (see class readings and online resources for more information) to have a first rough idea of what segments (clusters) we may have - and how many.


We can also plot the ‘distances’ traveled before we need to merge any of the lower and smaller in size clusters into larger ones - the heights of the tree branches that link the clusters as we traverse the tree from its leaves to its root. If we have n observations, this plot has n-1 numbers.
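The claim that the plot has n-1 numbers can be checked directly on the object returned by `hclust`, whose `height` component stores one value per merge. This is a small sketch on synthetic data; `toy_data` is a hypothetical stand-in for ProjectData_segment.

```r
# Synthetic stand-in data: 20 observations, 5 attributes (not the Boats data)
set.seed(1)
toy_data <- matrix(rnorm(100), nrow = 20, ncol = 5)
n <- nrow(toy_data)

hc <- hclust(dist(toy_data, method = "euclidean"), method = "ward.D")

# One merge height per merge: n observations need n - 1 merges
stopifnot(length(hc$height) == n - 1)

# Heights from the root down to the leaves; a sharp drop suggests
# a natural number of clusters
plot(rev(hc$height), type = "b",
     xlab = "Number of merges undone (from the root)",
     ylab = "Merge height")
```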