Analysing JPMorgan Stock Data¶

Summary: Here, I imported stock data for JPMorgan and for five other big banks. First, in the Exploring and Visualizing the Data section, I explore the data using some interactive graphs and tables. Then, in the Linear Regression section, I use a linear model to predict the JPM daily stock returns. The linear model was roughly able to predict the returns, except for some cases where there were outliers.¶

Sections:¶

- Importing Libraries and Data¶

- Exploring and Visualizing the Data¶

- Using Linear Regression to Predict JPM Stock Returns¶

Importing Libraries and Data¶

import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data, wb
import numpy as np

import matplotlib.pyplot as plt
import datetime
%matplotlib inline

import plotly
import cufflinks as cf
cf.go_offline()

start = datetime.date(year=2006, month=1, day=1)
end = datetime.date(year=2016, month=1, day=1)

BAC = data.DataReader(name="BAC", data_source='robinhood', start=start, end=end)
C = data.DataReader(name="C", data_source='robinhood', start=start, end=end)
GS = data.DataReader(name="GS", data_source='robinhood', start=start, end=end)
JPM = data.DataReader(name="JPM", data_source='robinhood', start=start, end=end)
MS = data.DataReader(name="MS", data_source='robinhood', start=start, end=end)
WFC = data.DataReader(name="WFC", data_source='robinhood', start=start, end=end)

BAC = BAC.xs(key='BAC', axis=0, drop_level=True)
C = C.xs(key='C', axis=0, drop_level=True)
GS = GS.xs(key='GS', axis=0, drop_level=True)
JPM = JPM.xs(key='JPM', axis=0, drop_level=True)
MS = MS.xs(key='MS', axis=0, drop_level=True)
WFC = WFC.xs(key='WFC', axis=0, drop_level=True)

tickers = ['BAC','C','GS','JPM','MS','WFC']

bank_stocks = pd.read_pickle('all_banks')

Exploring and Visualizing the Data¶

Here is the 'head' of the stock data:¶

bank_stocks.head()

Creating new table showing the daily returns of each bank:¶

returns = pd.DataFrame()

for tick in tickers:
    returns[tick + ' Returns'] = bank_stocks[tick]['Close'].pct_change()

returns.head()

Below are some summary statistics for each bank's daily returns:¶

returns.describe()

Time series displaying the daily closing price of each stock:¶

bank_stocks.xs(key='Close',axis=1,level='Stock Info').iplot()

Heat Map showing the correlation between each stock (blue is more correlated, red is less correlated):¶

close_corr = bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr()
close_corr.iplot(kind='heatmap',colorscale='rdylbu')

Candle Plot of JP Morgan's stock price:¶

bank_stocks.loc['2015-01-01':'2016-01-01']['JPM'][['Open','High', 'Low', 'Close']].iplot(kind='candle')

Moving Average of JPM stock, plotted along with the daily closing price:¶

bank_stocks.loc['2015-01-01':'2016-01-01']['JPM']['Close'].ta_plot(study='sma')

Bollinger Band Plot of JPM stock price:¶

bank_stocks.loc['2015-01-01':'2016-01-01']['JPM']['Close'].ta_plot(study='boll', color='red')

Histogram showing distribution of JPM daily returns:¶

import plotly.plotly as py
import plotly.figure_factory as ff

returns[returns.index.year == 2015]['JPM Returns'].dropna().iplot(kind='histogram')

Boxplot comparing distributions of daily returns for each bank:¶

returns[returns.index.year == 2015].iplot(kind='box')

Using Linear Regression to Predict JPM Stock Returns¶

Splitting the data into a training set and a test set, then training the model on the training set, and finally creating predictions:¶

y = returns['JPM Returns']
X = bank_stocks['JPM'][['Open','Close','High','Low','Volume']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

predictions = lm.predict(X_test)

Scatterplot of JPM predicted returns versus actual returns:¶

import plotly.graph_objs as go
test_vs_predicted = pd.DataFrame({'Y Test':y_test, 'Predicted Y':predictions})
test_vs_predicted.iplot(kind='scatter', x='Y Test', y='Predicted Y', mode='markers', size=8, title='JPM Predicted Returns vs Actual Returns', xTitle='JPM Actual Returns', yTitle='JPM Predicted Returns')

Different measurements of error in the model used to predict returns:¶

from sklearn import metrics

test_vs_predicted = test_vs_predicted.dropna()

print('MAE:', metrics.mean_absolute_error(test_vs_predicted['Y Test'], test_vs_predicted['Predicted Y']))
print('MSE:', metrics.mean_squared_error(test_vs_predicted['Y Test'], test_vs_predicted['Predicted Y']))
print('RMSE:', np.sqrt(metrics.mean_squared_error(test_vs_predicted['Y Test'], test_vs_predicted['Predicted Y'])))

MAE: 0.009179410944068678
MSE: 0.00024345376365215928
RMSE: 0.015603004955846142

- The three error measurements show that there was a fairly low amount of error.¶

Coeffiecients showing the relationship between JPM returns and the columns from the imported stock data:¶

coeffecients = pd.DataFrame(lm.coef_,X.columns)
coeffecients.columns = ['Coeffecient']
coeffecients

- According to the coefficients from the linear model, JP Morgan's returns are negatively correlated with opening price, daily high price, and daily low price, and they are positively correlated with closing price and volume.¶

Bank Ticker	BAC					C					...	MS					WFC
Stock Info	Open	High	Low	Close	Volume	Open	High	Low	Close	Volume	...	Open	High	Low	Close	Volume	Open	High	Low	Close	Volume
Date
2006-01-03	46.92	47.18	46.15	47.08	16296700	490.0	493.8	481.1	492.9	1537660	...	57.17	58.49	56.74	58.31	5377000	31.60	31.98	31.20	31.90	11016400
2006-01-04	47.00	47.24	46.45	46.58	17757900	488.6	491.0	483.5	483.8	1871020	...	58.70	59.28	58.35	58.35	7977800	31.80	31.82	31.36	31.53	10871000
2006-01-05	46.58	46.83	46.32	46.64	14970900	484.4	487.8	484.0	486.2	1143160	...	58.55	58.59	58.02	58.51	5778000	31.50	31.56	31.31	31.50	10158000
2006-01-06	46.80	46.91	46.35	46.57	12599800	488.8	489.0	482.0	486.2	1370250	...	58.77	58.85	58.05	58.57	6889800	31.58	31.78	31.38	31.68	8403800
2006-01-09	46.72	46.97	46.36	46.60	15620000	486.0	487.4	483.0	483.9	1680740	...	58.63	59.29	58.62	59.19	4144500	31.68	31.82	31.56	31.68	5619600

	BAC Returns	C Returns	GS Returns	JPM Returns	MS Returns	WFC Returns
Date
2006-01-03	NaN	NaN	NaN	NaN	NaN	NaN
2006-01-04	-0.010620	-0.018462	-0.013812	-0.014183	0.000686	-0.011599
2006-01-05	0.001288	0.004961	-0.000393	0.003029	0.002742	-0.000951
2006-01-06	-0.001501	0.000000	0.014169	0.007046	0.001025	0.005714
2006-01-09	0.000644	-0.004731	0.012030	0.016242	0.010586	0.000000

	BAC Returns	C Returns	GS Returns	JPM Returns	MS Returns	WFC Returns
count	2516.000000	2516.000000	2516.000000	2516.000000	2516.000000	2516.000000
mean	0.000258	0.002977	0.000451	0.000575	0.000423	0.000656
std	0.036650	0.179969	0.025346	0.027656	0.037820	0.030233
min	-0.289694	-0.899107	-0.189596	-0.207274	-0.258929	-0.238223
25%	-0.011303	-0.011193	-0.010185	-0.009934	-0.012397	-0.008664
50%	0.000000	-0.000211	0.000368	0.000168	0.000254	0.000000
75%	0.010751	0.010751	0.011004	0.010022	0.012504	0.008844
max	0.352691	8.769912	0.264678	0.250967	0.869835	0.327645

	Coeffecient
Stock Info
Open	-2.010152e-02
Close	3.285097e-02
High	-7.270615e-03
Low	-5.398712e-03
Volume	4.627037e-11