---
title: "Predicting Automotive Equity Performance"
author: "Goyal, Khachatryan, Khan, Putnam"
output:
  html_document:
    css: ../../AnalyticsStyles/default.css
    theme: paper
    toc: yes
    toc_float:
      collapsed: no
      smooth_scroll: yes
  pdf_document:
    includes:
      in_header: ../../AnalyticsStyles/default.sty
always_allow_html: yes
---

1 Introduction

Goals

The goal of this project is to determine whether various economic and industry indicators can be used to predict whether the automotive stock index S5AUCO will outperform the S&P 500 two months after the data is released. This analysis is not, in itself, a trading strategy; however, a trading strategy could be developed using the statistical methods explored here.

If these indicators can indeed predict outperformance, a long/short strategy could be developed to earn returns. For example, if January's data predicts that automotive stocks will outperform in March, an investor would take a long position in the S5AUCO portfolio and a short position in the S&P 500 on March 1st and divest on March 31st, profiting on the spread. The opposite trade would be made for an 'underperform' prediction. This strategy is considered market neutral: it is indifferent to whether the market is up or down, and the trader profits on the spread as long as the prediction is correct.

Alternatively, traders could act on predicted 'outperform' signals by overweighting the auto sector. This strategy is a relative short: while traders do not actually take a short position, they place greater weight on the auto sector within their portfolios, taking the view that it will outperform the S&P 500. It is called a relative short because it attempts to outperform the S&P 500 by assigning it a lower weight within the portfolio. While this approach allows traders and portfolio managers who are prohibited from short selling to express the same view, it is not market neutral and can lose money; the hope is that it loses less than a long position in the S&P 500 would.

Methodology

* **Lag:** A 2-month lag is used (i.e. acting on January's data in March, not February) to prevent a timing mismatch. For example, January's economic data may be released some time in February, so the trader will not have access to all of it on February 1st.

* **Backtesting:** The data is split into training and validation sets. This allows us to assess the model's performance on historical data.

* **Feature engineering:** All non-percentage independent variables are converted into annualized percent gains. In addition, four-month moving averages are calculated, because initial exploration of the data indicated stronger correlations with the dependent variable in some cases.

* **PCA:** To reduce the effect of 'data mining', principal component analysis (PCA) is used to combine the variables into a smaller set of components likely to be strong predictors of outperformance. For the PCA, the dependent variable is the spread between the automotive stock index and the S&P 500.

* **Dependent variable:** The goal of this model is to predict outperformance. Thus, after the PCA components are generated, the dependent variable used for logistic regression is a binary signal indicating whether the automotive index has outperformed the S&P 500.

* **Logistic regression:** Logistic regression is then applied to the training data using the PCA-generated components.

* **Validation:** The logistic regression model is then applied to the validation dataset to assess its predictive power.
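The pipeline above can be sketched in base R. This is a minimal illustration, not the project's actual code: the series here are simulated and all object names (`s5auco`, `spx`, `lag2`, `cut_point`) are hypothetical stand-ins for the real monthly data.

```r
# Illustrative sketch of the methodology; series are simulated stand-ins.
set.seed(1)
n      <- 120
s5auco <- cumprod(1 + rnorm(n, 0.005, 0.05))  # simulated auto index level
spx    <- cumprod(1 + rnorm(n, 0.004, 0.04))  # simulated S&P 500 level

# Monthly percent gains
gain <- function(x) diff(x) / head(x, -1)
s5auco.gain <- gain(s5auco)
spx.gain    <- gain(spx)

# 4-month trailing moving average (the ".filter" variables in the text)
ma4 <- function(x) stats::filter(x, rep(1/4, 4), sides = 1)

# Binary dependent variable: did the auto index outperform the S&P 500?
outperform <- s5auco.gain > spx.gain

# 2-month lag: January's indicator data is matched to March's outcome
lag2 <- function(x) c(NA, NA, head(x, -2))

# Chronological train/validation split for backtesting (first 70% trains)
cut_point <- floor(0.7 * length(outperform))
```

The 2-month lag simply shifts each indicator forward two observations, so the first two lagged values are missing and would be dropped before modeling.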

Overview of input data

1. DATE
2. IPindex - Industrial Production Index
3. LEITOTL - Leading Economic Indicators Index
4. CPTICHNG - Capacity Utilization Index
5. SAARTOTL - Monthly Auto Sales Index
6. CQOQ - GDP Index
7. MTIB - US Manufacturing & Trade Inventories Index
8. CONSSENT - Consumer Confidence Index
9. PCEMBIX - US Market Based Personal Consumption Index
10. CL1COMB - WTI Comdty Index
11. SPX - Standard & Poors Index
12. NAPMPMI - ISM Manufacturing Index
13. S5AUCO - S&P 500 Automobiles & Components Industry Group GICS Level 2 Index

2 Set up Library

```
[1] "/Users/faizankhan/Documents/Inseadanalytics/Exercises/Exerciseset2"
 [1] "~$Loadings.xlsx"              "1.RData"
 [3] "2.RData"                      "AutoStocks_v2_cache"
 [5] "AutoStocks_v2.html"           "AutoStocks_v2.md"
 [7] "AutoStocks_v2.RMD"            "AutoStocks.rmd"
 [9] "CD.csv"                       "Component Data.xls"
[11] "DAPC12A.csv"                  "data"
[13] "DataAuto.csv"                 "ExerciseSet2_Interactive.Rmd"
[15] "ExerciseSet2.Rmd"             "ExerciseSet22.Rmd"
[17] "ExerciseSet22X.html"          "ExerciseSet22X.md"
[19] "ExerciseSet22X.Rmd"           "ExerciseSet22XVardan.RMD"
[21] "figure"                       "helpersSet2.R"
[23] "Loadings.csv"                 "Loadings.xlsx"
[25] "projectdata2.RData"           "societegenerale.png"
```

4 Importing and exploring the data

Import data, lag the S&P500 and automotive index by 2 months, omit old observations with incomplete data

Exploring the data

Plot the data

Plot of Consumer Confidence vs. Auto Sales

## Warning: Removed 3 rows containing missing values (geom_point).

This plot does not indicate a strong relationship between consumer confidence and auto sales gains that could be used to predict asset outperformance.

This indicates a negative correlation between the variables, but the relationship also appears to be momentum-based. We therefore examine the lagged variable:

Investigate Capacity Utilization Index (filtered) and automotive index

5 Feature Engineering of Inputs

Calculate the percent gains and moving averages of the inputs
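A single `.gain` / `.gain.filter` column pair could be built as below. This is a hedged sketch: `IPindex` here is a made-up stand-in level series, not the project's data, and the 4-month trailing mean is implemented with base R's `stats::filter`.

```r
# Sketch: build one ".gain" / ".gain.filter" pair; IPindex is illustrative.
IPindex <- c(100, 100.3, 100.9, 100.2, 101.2, 101.7)

# Month-over-month percent gain
IPindex.gain <- diff(IPindex) / head(IPindex, -1)

# 4-month trailing moving average; first 3 values are NA by construction
IPindex.gain.filter <- stats::filter(IPindex.gain, rep(1/4, 4), sides = 1)
```

The leading `NA`s produced by the moving average are why the output below reports an `na.action` attribute dropping the first observations.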

```
'data.frame':	336 obs. of  24 variables:
 $ IPindex.gain        : num  0.00334 0.0062 -0.00671 0.01002 0.005 ...
 $ IPindex.gain.filter : num  0.002023 0.001321 0.000545 0.003214 0.00363 ...
 $ LEITOTL.gain        : num  0.00148 0 -0.00148 -0.00148 0 ...
 $ LEITOTL.gain.filter : num  0.000373 -0.000369 -0.000739 -0.00037 -0.000741 ...
 $ CPTICHNG.gain       : num  0.000959 0.00384 -0.009005 0.007739 0.002788 ...
 $ CPTICHNG.gain.filter: num  -0.000307 -0.001029 -0.001803 0.000883 0.00134 ...
 $ SAARTOTL.gain       : num  -0.01504 0.01527 0.20301 -0.11875 0.00709 ...
 $ SAARTOTL.gain.filter: num  -0.0189 -0.0464 0.0167 0.0211 0.0267 ...
 $ CQOQ.gain           : num  0 0 4 0 0 ...
 $ CQOQ.gain.filter    : num  -0.175 -0.175 0.825 1 1 ...
 $ MTIB.gain           : num  0.00486 -0.00181 0.00212 0.00082 0.00219 ...
 $ MTIB.gain.filter    : num  0.00243 0.00141 0.00225 0.0015 0.00083 ...
 $ CONSSENT.gain       : num  -0.0319 -0.0044 0.0276 -0.0376 0.0201 ...
 $ CONSSENT.gain.filter: num  -0.00217 0.00325 -0.00714 -0.01159 0.00143 ...
 $ NAPMPMI.gain        : num  0 0.01282 -0.00422 0.04025 0.01629 ...
 $ NAPMPMI.gain.filter : num  0.00498 0.01254 0.0065 0.01221 0.01629 ...
 $ PCEMBIX.gain        : num  0.00261 0.00292 0.00811 0.0047 0.00453 ...
 $ PCEMBIX.gain.filter : num  0.00251 0.00312 0.00445 0.00459 0.00506 ...
 $ CL1COMB.gain        : num  -0.00251 0.09703 0.03941 -0.05026 -0.0585 ...
 $ CL1COMB.gain.filter : num  0.02137 0.03853 0.03113 0.02092 0.00692 ...
 $ SPX.gain            : num  -0.06882 0.00854 0.02426 -0.02689 0.09199 ...
 $ SPX.gain.filter     : num  -0.01401 -0.00558 -0.00365 -0.01573 0.02447 ...
 $ S5AUCO.gain         : num  -0.0379 0.0564 0.0231 -0.0504 0.0705 ...
 $ S5AUCO.gain.filter  : num  -0.03765 -0.00652 0.00598 -0.00222 0.0249 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "1" "2" "3" "4"
```

6 Basic data analysis

Let’s see how these are correlated. The correlation matrix is as follows:

```
                     IPindex.gain IPindex.gain.filter LEITOTL.gain LEITOTL.gain.filter CPTICHNG.gain CPTICHNG.gain.filter SAARTOTL.gain SAARTOTL.gain.filter CQOQ.gain CQOQ.gain.filter MTIB.gain MTIB.gain.filter CONSSENT.gain CONSSENT.gain.filter PMPMI.gain PMPMI.gain.filter PCEMBIX.gain PCEMBIX.gain.filter CL1COMB.gain CL1COMB.gain.filter SPX.gain SPX.gain.filter S5AUCO.gain S5AUCO.gain.filter
IPindex.gain                 1.00 0.68 0.54 0.57 0.96 0.63 0.10 0.29 0.09 0.13 0.29 0.22 -0.09 0.13 0.14 0.32 0.04 0.11 0.08 0.18 0.17 0.23 0.05 0.10
IPindex.gain.filter          0.68 1.00 0.58 0.78 0.61 0.91 0.09 0.35 0.05 0.16 0.56 0.52 -0.02 0.11 0.04 0.30 0.12 0.18 0.12 0.25 0.12 0.27 -0.04 0.00
LEITOTL.gain                 0.54 0.58 1.00 0.83 0.54 0.59 0.15 0.38 0.09 0.15 0.22 0.12 0.19 0.37 0.30 0.50 0.06 0.00 0.15 0.17 0.05 0.43 -0.03 0.32
LEITOTL.gain.filter          0.57 0.78 0.83 1.00 0.57 0.79 0.12 0.41 0.08 0.19 0.43 0.34 0.07 0.34 0.18 0.50 0.08 0.07 0.13 0.24 0.08 0.30 -0.05 0.13
CPTICHNG.gain                0.96 0.61 0.54 0.57 1.00 0.68 0.10 0.30 0.08 0.12 0.28 0.20 -0.09 0.12 0.16 0.36 0.06 0.14 0.09 0.21 0.15 0.18 0.05 0.09
CPTICHNG.gain.filter         0.63 0.91 0.59 0.79 0.68 1.00 0.09 0.35 0.05 0.15 0.57 0.51 -0.03 0.09 0.06 0.35 0.15 0.22 0.14 0.30 0.08 0.20 -0.05 -0.01
SAARTOTL.gain                0.10 0.09 0.15 0.12 0.10 0.09 1.00 0.42 0.04 0.05 -0.14 -0.02 -0.02 0.00 0.01 0.07 0.01 -0.04 0.11 0.10 -0.03 0.11 -0.09 0.14
SAARTOTL.gain.filter         0.29 0.35 0.38 0.41 0.30 0.35 0.42 1.00 0.03 0.08 -0.02 -0.02 0.04 0.10 0.14 0.25 0.12 0.06 0.08 0.21 0.03 0.24 -0.05 0.22
CQOQ.gain                    0.09 0.05 0.09 0.08 0.08 0.05 0.04 0.03 1.00 0.49 0.04 0.00 0.10 0.10 0.08 0.04 -0.11 -0.16 0.07 -0.17 0.03 0.03 0.04 0.06
CQOQ.gain.filter             0.13 0.16 0.15 0.19 0.12 0.15 0.05 0.08 0.49 1.00 0.04 0.03 0.07 0.16 0.05 0.10 -0.13 -0.25 0.11 -0.09 -0.01 -0.02 0.04 0.06
MTIB.gain                    0.29 0.56 0.22 0.43 0.28 0.57 -0.14 -0.02 0.04 0.04 1.00 0.80 -0.08 -0.07 -0.15 -0.02 0.23 0.31 0.14 0.29 0.01 0.08 -0.09 -0.16
MTIB.gain.filter             0.22 0.52 0.12 0.34 0.20 0.51 -0.02 -0.02 0.00 0.03 0.80 1.00 -0.09 -0.13 -0.30 -0.27 0.15 0.32 0.03 0.23 -0.02 0.04 -0.17 -0.26
CONSSENT.gain               -0.09 -0.02 0.19 0.07 -0.09 -0.03 -0.02 0.04 0.10 0.07 -0.08 -0.09 1.00 0.44 0.18 0.13 -0.19 -0.11 -0.05 -0.13 0.01 0.21 0.02 0.25
CONSSENT.gain.filter         0.13 0.11 0.37 0.34 0.12 0.09 0.00 0.10 0.10 0.16 -0.07 -0.13 0.44 1.00 0.23 0.38 -0.17 -0.27 0.06 -0.11 0.00 0.17 -0.02 0.24
PMPMI.gain                   0.14 0.04 0.30 0.18 0.16 0.06 0.01 0.14 0.08 0.05 -0.15 -0.30 0.18 0.23 1.00 0.58 0.15 0.00 0.18 0.13 0.14 0.27 0.09 0.32
PMPMI.gain.filter            0.32 0.30 0.50 0.50 0.36 0.35 0.07 0.25 0.04 0.10 -0.02 -0.27 0.13 0.38 0.58 1.00 0.19 0.20 0.21 0.36 0.05 0.26 0.00 0.26
PCEMBIX.gain                 0.04 0.12 0.06 0.08 0.06 0.15 0.01 0.12 -0.11 -0.13 0.23 0.15 -0.19 -0.17 0.15 0.19 1.00 0.59 0.27 0.50 0.06 0.05 0.05 0.03
PCEMBIX.gain.filter          0.11 0.18 0.00 0.07 0.14 0.22 -0.04 0.06 -0.16 -0.25 0.31 0.32 -0.11 -0.27 0.00 0.20 0.59 1.00 0.09 0.56 -0.06 -0.02 -0.18 -0.12
CL1COMB.gain                 0.08 0.12 0.15 0.13 0.09 0.14 0.11 0.08 0.07 0.11 0.14 0.03 -0.05 0.06 0.18 0.21 0.27 0.09 1.00 0.52 0.09 0.06 0.08 0.13
CL1COMB.gain.filter          0.18 0.25 0.17 0.24 0.21 0.30 0.10 0.21 -0.17 -0.09 0.29 0.23 -0.13 -0.11 0.13 0.36 0.50 0.56 0.52 1.00 0.03 0.10 -0.07 0.02
SPX.gain                     0.17 0.12 0.05 0.08 0.15 0.08 -0.03 0.03 0.03 -0.01 0.01 -0.02 0.01 0.00 0.14 0.05 0.06 -0.06 0.09 0.03 1.00 0.52 0.66 0.36
SPX.gain.filter              0.23 0.27 0.43 0.30 0.18 0.20 0.11 0.24 0.03 -0.02 0.08 0.04 0.21 0.17 0.27 0.26 0.05 -0.02 0.06 0.10 0.52 1.00 0.36 0.72
S5AUCO.gain                  0.05 -0.04 -0.03 -0.05 0.05 -0.05 -0.09 -0.05 0.04 0.04 -0.09 -0.17 0.02 -0.02 0.09 0.00 0.05 -0.18 0.08 -0.07 0.66 0.36 1.00 0.52
S5AUCO.gain.filter           0.10 0.00 0.32 0.13 0.09 -0.01 0.14 0.22 0.06 0.06 -0.16 -0.26 0.25 0.24 0.32 0.26 0.03 -0.12 0.13 0.02 0.36 0.72 0.52 1.00
```

As expected, many economic indicators are correlated.

7 PCA Analysis

First, split the data into training and validation datasets.
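A chronological split, sketched below, keeps the backtest honest: earlier observations train the model and later ones validate it, so the model never "sees" the future. The function name and the 70% fraction are illustrative assumptions, not the project's actual code.

```r
# Chronological train/validation split (illustrative; fraction is assumed).
split_chronological <- function(df, train_frac = 0.7) {
  cut <- floor(train_frac * nrow(df))
  list(train = df[seq_len(cut), , drop = FALSE],
       valid = df[(cut + 1):nrow(df), , drop = FALSE])
}

sets <- split_chronological(data.frame(x = 1:10), 0.7)
```

A random split would leak future information into the training set, which is why time-series backtests split by date rather than at random.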

Conduct PCA to amalgamate the variables into components

```r
# PCA() here is from the FactoMineR package
Variance_Explained_Table_results <- PCA(TrainingData, graph = FALSE)
Variance_Explained_Table <- Variance_Explained_Table_results$eig

colnames(Variance_Explained_Table) <- c("Eigenvalue",
  "Pct of explained variance", "Cumulative pct of explained variance")

show_data = round(Variance_Explained_Table, 2)
# show_data_V = round(Variance_Explained_Table_V, 2)
iprint.df(show_data)
```
```
        Eigenvalue Pct of explained variance Cumulative pct of explained variance
comp 1        5.47                     22.77                                22.77
comp 2        3.54                     14.75                                37.53
comp 3        2.57                     10.69                                48.22
comp 4        1.94                      8.07                                56.29
comp 5        1.56                      6.49                                62.78
comp 6        1.38                      5.76                                68.54
comp 7        1.09                      4.53                                73.07
comp 8        1.00                      4.18                                77.25
comp 9        0.75                      3.13                                80.38
comp 10       0.72                      2.98                                83.37
comp 11       0.67                      2.79                                86.16
comp 12       0.54                      2.23                                88.39
comp 13       0.46                      1.93                                90.32
comp 14       0.43                      1.78                                92.10
comp 15       0.39                      1.64                                93.74
comp 16       0.35                      1.45                                95.19
comp 17       0.27                      1.13                                96.32
comp 18       0.23                      0.97                                97.29
comp 19       0.22                      0.92                                98.22
comp 20       0.16                      0.65                                98.87
comp 21       0.12                      0.51                                99.38
comp 22       0.08                      0.33                                99.70
comp 23       0.07                      0.29                               100.00
comp 24       0.00                      0.00                               100.00
```
```r
# iprint.df(show_data_V)

# Plotting the eigenvalues from PCA
eigenvalues  <- Variance_Explained_Table[, "Eigenvalue"]
df           <- cbind(as.data.frame(eigenvalues), c(1:length(eigenvalues)), rep(1, length(eigenvalues)))
colnames(df) <- c("eigenvalues", "components", "abline")
iplot.df(melt(df, id = "components"))

# Breaking down different components by macroeconomic variables, to
# understand the loadings (weights) of those parameters for each component,
# which should be used to feature engineer and reduce the number of
# variables that would capture 64% of the variation
corused = cor(ProjectData[, apply(ProjectData != 0, 2, sum) > 10, drop = F])
Rotated_Results <- principal(corused, nfactors = 12, rotate = "varimax", score = TRUE)
Rotated_Factors <- round(Rotated_Results$loadings, 2)
Rotated_Factors <- as.data.frame(unclass(Rotated_Factors))
colnames(Rotated_Factors) <- paste("comp", 1:ncol(Rotated_Factors), sep = " ")

sorted_rows <- sort(Rotated_Factors[, 1], decreasing = TRUE, index.return = TRUE)$ix
Rotated_Factors <- Rotated_Factors[sorted_rows, ]
Rotated_Factors[abs(Rotated_Factors) < 0.3] <- 0

show_data <- Rotated_Factors
iprint.df(show_data, scale = TRUE)
```
```
                     comp 1 comp 2 comp 3 comp 4 comp 5 comp 6 comp 7 comp 8 comp 9 comp 10 comp 11 comp 12
CPTICHNG.gain          0.96   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
IPindex.gain           0.95   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
CPTICHNG.gain.filter   0.55   0.64   0.00   0.00   0.00   0.00   0.00   0.36   0.00    0.00    0.00     0.0
IPindex.gain.filter    0.54   0.65   0.00   0.00   0.00   0.00   0.00   0.36   0.00    0.00    0.00     0.0
LEITOTL.gain           0.45   0.30   0.00   0.41   0.00   0.30   0.00   0.30   0.00    0.39    0.00     0.0
LEITOTL.gain.filter    0.45   0.53   0.00   0.00   0.00   0.00   0.00   0.42   0.00    0.38    0.00     0.0
PMPMI.gain.filter      0.30   0.00   0.00   0.00   0.00   0.68   0.00   0.00   0.00    0.33    0.00     0.0
SAARTOTL.gain.filter   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.80   0.00    0.00    0.34     0.0
MTIB.gain              0.00   0.87   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
CL1COMB.gain.filter    0.00   0.00   0.53   0.00   0.00   0.00   0.00   0.00   0.64    0.00    0.00     0.0
PCEMBIX.gain.filter    0.00   0.00   0.84   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
SPX.gain.filter        0.00   0.00   0.00   0.82   0.32   0.00   0.00   0.00   0.00    0.00    0.00     0.0
CONSSENT.gain.filter   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.87    0.00     0.0
SPX.gain               0.00   0.00   0.00   0.00   0.90   0.00   0.00   0.00   0.00    0.00    0.00     0.0
MTIB.gain.filter       0.00   0.88   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
SAARTOTL.gain          0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.93     0.0
CQOQ.gain              0.00   0.00   0.00   0.00   0.00   0.00   0.88   0.00   0.00    0.00    0.00     0.0
PMPMI.gain             0.00   0.00   0.00   0.00   0.00   0.90   0.00   0.00   0.00    0.00    0.00     0.0
CQOQ.gain.filter       0.00   0.00   0.00   0.00   0.00   0.00   0.82   0.00   0.00    0.00    0.00     0.0
S5AUCO.gain.filter     0.00   0.00   0.00   0.86   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
CL1COMB.gain           0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.93    0.00    0.00     0.0
S5AUCO.gain            0.00   0.00   0.00   0.00   0.84   0.00   0.00   0.00   0.00    0.00    0.00     0.0
PCEMBIX.gain           0.00   0.00   0.85   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.0
CONSSENT.gain          0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00     0.9
```

For this analysis, components were selected based on their eigenvalues, balancing the percent of variance explained against the risk of overfitting.
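The text does not state the exact eigenvalue cutoff used; one common rule (the Kaiser criterion) keeps components whose eigenvalue exceeds 1, i.e. components that explain more variance than a single standardized input. Applied to the eigenvalues reported in the table above, that rule would retain 7 components:

```r
# Kaiser criterion sketch; eigenvalues are the first 12 from the table above.
eigenvalues <- c(5.47, 3.54, 2.57, 1.94, 1.56, 1.38, 1.09, 1.00,
                 0.75, 0.72, 0.67, 0.54)
n_keep <- sum(eigenvalues > 1)  # components with eigenvalue strictly above 1
```

Other cutoffs (a scree-plot elbow, or a target cumulative variance) would retain a different number of components, which may explain why the regression below uses 5.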

Multiply the component loadings by the data to create a data frame of component scores
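This step amounts to a matrix product of the (standardized) feature matrix and the loadings. The sketch below uses random stand-in matrices; the real inputs would be the engineered features and the `Rotated_Factors` loadings table shown above.

```r
# Sketch of forming component scores; matrices here are random stand-ins.
set.seed(3)
X        <- scale(matrix(rnorm(100 * 4), 100, 4))  # stand-in feature matrix
loadings <- matrix(runif(4 * 2), 4, 2)             # stand-in loadings, 2 comps
scores   <- X %*% loadings                         # one score column per component
```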

8 Logistic Regression

After identifying the key drivers of index performance, we set the dependent variable to a binary variable in order to apply logistic regression techniques. Thus, the prediction model's output is a binary OUTPERFORM / UNDERPERFORM signal.
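The logistic step can be sketched with base R's `glm`. The data below is simulated and the object names (`scores`, `signal`) are illustrative; the real regression uses the binary outperformance signal and the PCA component scores.

```r
# Sketch: binomial GLM of a binary signal on component scores (simulated data).
set.seed(2)
scores <- data.frame(Comp1 = rnorm(100), Comp2 = rnorm(100))
signal <- rbinom(100, 1, plogis(0.5 * scores$Comp1))  # simulated outcome

fit  <- glm(signal ~ Comp1 + Comp2, data = scores, family = binomial)
pred <- as.integer(predict(fit, type = "response") > 0.5)  # 0.5 threshold
```

`predict(type = "response")` returns fitted probabilities; thresholding at 0.5 converts them into the OUTPERFORM / UNDERPERFORM signal.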

```
    IPindex.gain IPindex.gain.filter LEITOTL.gain LEITOTL.gain.filter
329 -0.0021943470       -0.0006522533  0.001990050         0.001245518
330  0.0079622877        0.0015112547  0.004965243         0.002486829
331 -0.0029903390        0.0011420176  0.005928854         0.003470041
332  0.0023835944        0.0012902990  0.002946955         0.003957775
333  0.0020481410        0.0023509210  0.004897160         0.004684553
334  0.0108799206        0.0030803292  0.001949318         0.003930571
335  0.0002037187        0.0038788437  0.003891051         0.003421121
336  0.0017550505        0.0037217077  0.005813953         0.004137870
337 -0.0023182325        0.0026301143  0.002890173         0.003636124
338 -0.0040596675       -0.0011047827  0.003842459         0.004109409
    CPTICHNG.gain CPTICHNG.gain.filter SAARTOTL.gain SAARTOTL.gain.filter
329 -0.0029766543        -0.0012611236  -0.013483146         -0.002389510
330  0.0070929783         0.0007885457   0.027904328          0.013318966
331 -0.0040258994         0.0002995804  -0.039335180         -0.004103854
332  0.0013588356         0.0003623151  -0.001153403         -0.006516850
333  0.0009956547         0.0013553923  -0.034642032         -0.011806572
334  0.0097951387         0.0020309324   0.014952153         -0.015044615
335 -0.0008597689         0.0028224650  -0.015910430         -0.009188428
336  0.0007181788         0.0026623008  -0.006586826         -0.010546784
337 -0.0032921177         0.0015903577   0.006027728         -0.000379344
338 -0.0049695426        -0.0021008126  -0.039544638         -0.014003542
      CQOQ.gain CQOQ.gain.filter    MTIB.gain MTIB.gain.filter
329  0.00000000     -0.089285714  0.007786564      0.002385847
330  0.00000000     -0.089285714  0.004149061      0.002920139
331 -0.33333333     -0.172619048  0.003258901      0.003349476
332  0.00000000     -0.083333333  0.002669996      0.004466130
333  0.00000000     -0.083333333  0.001719556      0.002949379
334  1.58333333      0.312500000 -0.001974902      0.001418388
335  0.00000000      0.395833333  0.003089532      0.001376045
336  0.00000000      0.395833333  0.004864598      0.001924696
337  0.03225806      0.403897849  0.003439551      0.002354695
338  0.00000000      0.008064516  0.006493022      0.004471676
    CONSSENT.gain CONSSENT.gain.filter NAPMPMI.gain NAPMPMI.gain.filter
329   0.075688073         0.0112991006  0.030888031        0.0030048693
330   0.046908316         0.0235817351  0.016853933        0.0213960462
331   0.003054990         0.0204479324  0.023941068        0.0198662832
332  -0.022335025         0.0258290883  0.035971223        0.0269135637
333   0.006230530         0.0084647024 -0.017361111        0.0148512782
334   0.001031992        -0.0030043786 -0.022968198        0.0048957455
335   0.001030928        -0.0035103941  0.003616637       -0.0001853624
336  -0.020597322        -0.0030759683  0.021621622       -0.0037727627
337  -0.017875920        -0.0091025807 -0.003527337       -0.0003143191
338   0.036402570        -0.0002599363  0.049557522        0.0178171109
     PCEMBIX.gain PCEMBIX.gain.filter CL1COMB.gain CL1COMB.gain.filter
329  8.209432e-04        0.0014636912   0.05505762        0.0450411287
330  1.822822e-03        0.0015756122   0.08656958        0.0480537159
331  4.639738e-03        0.0023235720  -0.01693969        0.0240201363
332  8.149959e-04        0.0020246247   0.02272297        0.0368526199
333 -2.804922e-03        0.0011181584  -0.06313646        0.0073041012
334  1.451774e-03        0.0010253964  -0.02509881       -0.0206129971
335 -1.177856e-03       -0.0004290022  -0.02047436       -0.0214966644
336  9.071118e-05       -0.0006100734  -0.04718543       -0.0389737643
337  6.349206e-04        0.0002498873   0.08970460       -0.0007634991
338  2.266135e-03        0.0004534776  -0.05860076       -0.0091389849
         SPX.gain SPX.gain.filter  S5AUCO.gain S5AUCO.gain.filter
329  0.0178843414     0.012708479  0.033704162       0.0170319347
330  0.0371982605     0.026864451  0.023995536       0.0295890334
331 -0.0003892302     0.018223531 -0.023355391       0.0128606553
332  0.0090912169     0.015946147 -0.015145476       0.0047997074
333  0.0115762100     0.014369114 -0.013597734      -0.0070257664
334  0.0048138320     0.006273007  0.017149422      -0.0087372949
335  0.0193487689     0.011207507  0.009599871      -0.0004984794
336  0.0005464923     0.009071326  0.006791850       0.0049858521
337  0.0193028948     0.011002997  0.078809524       0.0280876665
338  0.0221881748     0.015346583  0.028323402       0.0308811616
```

```
      FALSE TRUE
FALSE    96    2
TRUE     67    2
```

```
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                   -0.3        0.2    -2.1      0.0
ReducedTrainingData$Comp1     -0.2        0.7    -0.3      0.8
ReducedTrainingData$Comp2      0.5        0.6     0.8      0.4
ReducedTrainingData$Comp3      0.0        0.7     0.0      1.0
ReducedTrainingData$Comp4     -0.4        0.6    -0.6      0.6
ReducedTrainingData$Comp5     -0.1        0.4    -0.3      0.8
```

9 Confusion Matrix

```
      FALSE TRUE
FALSE    96    2
TRUE     67    2

[1] 0.5868263
```

Based on the confusion matrix above, our model is 58% accurate, but it is better suited to predicting when the S&P 500 will outperform the auto index than to our actual business objective: predicting the auto index's outperformance relative to the S&P 500.
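The reported accuracy follows directly from the confusion matrix above: correct predictions on the diagonal divided by the total. The sketch below reproduces that arithmetic; the matrix layout (actual in rows, predicted in columns) is assumed from the printout.

```r
# Accuracy from the confusion matrix above: (96 + 2) / 167.
cm <- matrix(c(96, 67, 2, 2), nrow = 2,
             dimnames = list(actual    = c("FALSE", "TRUE"),
                             predicted = c("FALSE", "TRUE")))
accuracy <- sum(diag(cm)) / sum(cm)  # correct predictions / all predictions
```

Note that the model predicts TRUE only 4 times out of 167, so most of its accuracy comes from predicting the majority (FALSE) class.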

Risks: For investors with short-selling capabilities this can still be somewhat useful, as they can short the auto stocks and go long the S&P 500. Unfortunately, given the limited data (frequency, sample size, etc.), these results should be treated with caution.

10 Summary

In trading, an accuracy of only 51% may be sufficient as long as this level of accuracy remains constant. While our model achieved a 58% accuracy on the training data, there are many issues that have not been addressed and are out of the scope of this analysis. In light of these issues, the authors do not believe this model’s 58% accuracy is sufficient to create a profitable trading algorithm.

Nonetheless, this model could be the basis for such a trading strategy. If the below issues and comments on further analysis are appropriately addressed, additional (relevant) input data is used, and more feature engineering is introduced, this methodology may provide the platform for a profitable trading algorithm.

Issues and risks that may impact the analysis:

* **Data size:** Because only monthly data was available for this analysis, the model was built on relatively few data points. With more granular data, the model could be trained on substantially more observations.

* **Selection:** The validation data includes a period of great distress for the automotive industry. While the whole market performed poorly during the Global Financial Crisis, 2 of the 3 major American automakers faced bankruptcy and had to be bailed out by the US government. This unusual time in American automotive history may have affected the results of our analysis.

* **Black swan events:** If this model were exploited according to the trading strategy outlined in the introduction, it would likely do poorly in the event of extreme and unpredictable 'black swan' events.

* **Short selling:** The trading strategy explained in the introduction relies on the ability to take a short position, which may be prohibitively expensive for the retail trader.

* **Seasonality:** Investigating the effect of cyclical asset price movements is out of scope for this project but may provide additional opportunities for improved feature engineering and a more accurate model.

* **Data timing:** As discussed in the introduction, the release timing of the input data varies. To compensate, a 2-month lag was applied to the dataset. Given daily price data and information about the exact release timing of each series, the model could be improved.

Further Analysis:

In addition to addressing the issues described above, this model could be further developed into a trading algorithm; in its current state, it is not one. Further analysis could incorporate transaction costs, determine how much money should optimally be placed on each trade, and develop a predicted-profit model.