Brendan's Sample Work

finance_project

Analysing JPMorgan Stock Data


Summary: Here, I imported stock data for JPMorgan and for five other big banks. First, in the Exploring and Visualizing the Data section, I explore the data using some interactive graphs and tables. Then, in the Linear Regression section, I use a linear model to predict the JPM daily stock returns. The linear model was roughly able to predict the returns, except for some cases where there were outliers.


Sections:

- Importing Libraries and Data

- Exploring and Visualizing the Data

- Using Linear Regression to Predict JPM Stock Returns


Importing Libraries and Data


In [114]:
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data, wb
import numpy as np

import matplotlib.pyplot as plt
import datetime
%matplotlib inline

import plotly
import cufflinks as cf
cf.go_offline()
In [4]:
start = datetime.date(year=2006, month=1, day=1)
end = datetime.date(year=2016, month=1, day=1)
In [5]:
BAC = data.DataReader(name="BAC", data_source='robinhood', start=start, end=end)
C = data.DataReader(name="C", data_source='robinhood', start=start, end=end)
GS = data.DataReader(name="GS", data_source='robinhood', start=start, end=end)
JPM = data.DataReader(name="JPM", data_source='robinhood', start=start, end=end)
MS = data.DataReader(name="MS", data_source='robinhood', start=start, end=end)
WFC = data.DataReader(name="WFC", data_source='robinhood', start=start, end=end)
In [6]:
BAC = BAC.xs(key='BAC', axis=0, drop_level=True)
C = C.xs(key='C', axis=0, drop_level=True)
GS = GS.xs(key='GS', axis=0, drop_level=True)
JPM = JPM.xs(key='JPM', axis=0, drop_level=True)
MS = MS.xs(key='MS', axis=0, drop_level=True)
WFC = WFC.xs(key='WFC', axis=0, drop_level=True)
In [7]:
tickers = ['BAC','C','GS','JPM','MS','WFC']
In [ ]:
bank_stocks = pd.read_pickle('all_banks')

Exploring and Visualizing the Data


Here is the 'head' of the stock data:

In [115]:
bank_stocks.head()
Out[115]:
Bank Ticker BAC C ... MS WFC
Stock Info Open High Low Close Volume Open High Low Close Volume ... Open High Low Close Volume Open High Low Close Volume
Date
2006-01-03 46.92 47.18 46.15 47.08 16296700 490.0 493.8 481.1 492.9 1537660 ... 57.17 58.49 56.74 58.31 5377000 31.60 31.98 31.20 31.90 11016400
2006-01-04 47.00 47.24 46.45 46.58 17757900 488.6 491.0 483.5 483.8 1871020 ... 58.70 59.28 58.35 58.35 7977800 31.80 31.82 31.36 31.53 10871000
2006-01-05 46.58 46.83 46.32 46.64 14970900 484.4 487.8 484.0 486.2 1143160 ... 58.55 58.59 58.02 58.51 5778000 31.50 31.56 31.31 31.50 10158000
2006-01-06 46.80 46.91 46.35 46.57 12599800 488.8 489.0 482.0 486.2 1370250 ... 58.77 58.85 58.05 58.57 6889800 31.58 31.78 31.38 31.68 8403800
2006-01-09 46.72 46.97 46.36 46.60 15620000 486.0 487.4 483.0 483.9 1680740 ... 58.63 59.29 58.62 59.19 4144500 31.68 31.82 31.56 31.68 5619600

5 rows × 30 columns

Creating new table showing the daily returns of each bank:

In [113]:
returns = pd.DataFrame()

for tick in tickers:
    returns[tick + ' Returns'] = bank_stocks[tick]['Close'].pct_change()

returns.head()
Out[113]:
BAC Returns C Returns GS Returns JPM Returns MS Returns WFC Returns
Date
2006-01-03 NaN NaN NaN NaN NaN NaN
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.000951
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005714
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 0.000000

Below are some summary statistics for each bank's daily returns:

In [112]:
returns.describe()
Out[112]:
BAC Returns C Returns GS Returns JPM Returns MS Returns WFC Returns
count 2516.000000 2516.000000 2516.000000 2516.000000 2516.000000 2516.000000
mean 0.000258 0.002977 0.000451 0.000575 0.000423 0.000656
std 0.036650 0.179969 0.025346 0.027656 0.037820 0.030233
min -0.289694 -0.899107 -0.189596 -0.207274 -0.258929 -0.238223
25% -0.011303 -0.011193 -0.010185 -0.009934 -0.012397 -0.008664
50% 0.000000 -0.000211 0.000368 0.000168 0.000254 0.000000
75% 0.010751 0.010751 0.011004 0.010022 0.012504 0.008844
max 0.352691 8.769912 0.264678 0.250967 0.869835 0.327645

Time series displaying the daily closing price of each stock:

In [18]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').iplot()

Heat Map showing the correlation between each stock (blue is more correlated, red is less correlated):

In [19]:
close_corr = bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr()
close_corr.iplot(kind='heatmap',colorscale='rdylbu')

Candle Plot of JP Morgan's stock price:

In [98]:
bank_stocks.loc['2015-01-01':'2016-01-01']['JPM'][['Open','High', 'Low', 'Close']].iplot(kind='candle')

Moving Average of JPM stock, plotted along with the daily closing price:

In [21]:
bank_stocks.loc['2015-01-01':'2016-01-01']['JPM']['Close'].ta_plot(study='sma')

Bollinger Band Plot of JPM stock price:

In [97]:
bank_stocks.loc['2015-01-01':'2016-01-01']['JPM']['Close'].ta_plot(study='boll', color='red')

Histogram showing distribution of JPM daily returns:

In [23]:
import plotly.plotly as py
import plotly.figure_factory as ff
In [24]:
returns[returns.index.year == 2015]['JPM Returns'].dropna().iplot(kind='histogram')

Boxplot comparing distributions of daily returns for each bank:

In [25]:
returns[returns.index.year == 2015].iplot(kind='box')

Using Linear Regression to Predict JPM Stock Returns


Splitting the data into a training set and a test set, then training the model on the training set, and finally creating predictions:

In [57]:
y = returns['JPM Returns']
X = bank_stocks['JPM'][['Open','Close','High','Low','Volume']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
In [59]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
Out[59]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [60]:
predictions = lm.predict(X_test)

Scatterplot of JPM predicted returns versus actual returns:

In [101]:
import plotly.graph_objs as go
test_vs_predicted = pd.DataFrame({'Y Test':y_test, 'Predicted Y':predictions})
test_vs_predicted.iplot(kind='scatter', x='Y Test', y='Predicted Y', mode='markers', size=8, title='JPM Predicted Returns vs Actual Returns', xTitle='JPM Actual Returns', yTitle='JPM Predicted Returns')

Different measurements of error in the model used to predict returns:

In [111]:
from sklearn import metrics

test_vs_predicted = test_vs_predicted.dropna()

print('MAE:', metrics.mean_absolute_error(test_vs_predicted['Y Test'], test_vs_predicted['Predicted Y']))
print('MSE:', metrics.mean_squared_error(test_vs_predicted['Y Test'], test_vs_predicted['Predicted Y']))
print('RMSE:', np.sqrt(metrics.mean_squared_error(test_vs_predicted['Y Test'], test_vs_predicted['Predicted Y'])))
MAE: 0.009179410944068678
MSE: 0.00024345376365215928
RMSE: 0.015603004955846142

- The three error measurements show that there was a fairly low amount of error.

Coeffiecients showing the relationship between JPM returns and the columns from the imported stock data:

In [99]:
coeffecients = pd.DataFrame(lm.coef_,X.columns)
coeffecients.columns = ['Coeffecient']
coeffecients
Out[99]:
Coeffecient
Stock Info
Open -2.010152e-02
Close 3.285097e-02
High -7.270615e-03
Low -5.398712e-03
Volume 4.627037e-11

- According to the coefficients from the linear model, JP Morgan's returns are negatively correlated with opening price, daily high price, and daily low price, and they are positively correlated with closing price and volume.