Brendan's Sample Work

Hudson River Trading Practice Notebook

Pairs Trading Algorithm - Determining Cointegration and Applying Kalman Filter


Summary: In this notebook, I first test a group of large-cap stocks, mostly tech companies, for pairwise cointegration, so that cointegrated pairs can be traded with a pairs trading algorithm from Quantopian. After reviewing the algorithm's backtest results, I analyse whether a Kalman filter can improve it. Specifically, I check whether a Kalman filter gives a more effective rolling average than fixed-window moving averages when calculating the z-score of the pair's spread.


Sections:

- Importing Libraries

- Selecting Stocks that are Cointegrated

- Results of Pairs Trading Algorithm: 2014-01-06 to 2016-09-28

- Analysing Kalman Filter Rolling Average

- Results After I Applied Kalman Filter: 2014-01-06 to 2016-09-28

- Appendix: Another way to test for cointegration - Stationarity


Importing Libraries


In [1]:
from pykalman import KalmanFilter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, adfuller

# get_pricing and get_backtest are built into the Quantopian research environment

Selecting Stocks that are Cointegrated


Heatmap of Engle-Granger cointegration p-values for each stock pair (2013 prices). Only pairs with p < 0.05, which are likely cointegrated, are shown:

In [2]:
symbol_list = ['IBM', 'INTC', 'GE', 'MSFT', 'CSCO', 'AAPL', 'GOOG', 'AMZN']
# Daily 2013 prices as a DataFrame with one column per stock
prices_df = get_pricing(symbol_list, fields=['price'],
                        start_date='2013-01-01', end_date='2014-01-01')['price']
prices_df.columns = [asset.symbol for asset in prices_df.columns]

# Cointegration test on every pair; hide pairs with p >= 0.05
scores, pvalues, pairs = find_cointegrated_pairs(prices_df)
sns.heatmap(pvalues, xticklabels=prices_df.columns, yticklabels=prices_df.columns,
            cmap='RdYlGn_r', mask=(pvalues >= 0.05))
print(pairs)
[(u'INTC', u'MSFT'), (u'GE', u'AMZN'), (u'AAPL', u'AMZN')]

Price series of one cointegrated pair, MSFT and INTC, over the same 2013 period:

In [3]:
symbol_list = ['MSFT', 'INTC']
prices = get_pricing(symbol_list, fields=['price'],
                     start_date='2013-01-01', end_date='2014-01-01')['price']
prices.columns = [asset.symbol for asset in prices.columns]
X1 = prices[symbol_list[0]]  # MSFT
X2 = prices[symbol_list[1]]  # INTC
In [4]:
plt.plot(X1.index, X1.values)
plt.plot(X2.index, X2.values)
plt.xlabel('Time')
plt.ylabel('Price (USD)')
plt.legend([X1.name, X2.name]);

Results of Pairs Trading Algorithm: 2014-01-06 to 2016-09-28


In [5]:
bt = get_backtest('5bb1966437e5f1426a9febce')
bt.create_returns_tear_sheet()
Start date     2014-01-06
End date       2016-09-28
Total months   32

                     Backtest
Annual return        5.2%
Cumulative returns   14.9%
Annual volatility    10.9%
Sharpe ratio         0.52
Calmar ratio         0.52
Stability            0.64
Max drawdown         -9.9%
Omega ratio          1.10
Sortino ratio        0.77
Skew                 0.15
Kurtosis             12.94
Tail ratio           1.15
Daily value at risk  -1.4%
Gross leverage       0.95
Daily turnover       29.3%
Alpha                0.06
Beta                 -0.01

Worst drawdown periods:
   Net drawdown (%)  Peak date   Valley date  Recovery date  Duration (business days)
0  9.94              2015-07-06  2015-09-14   2016-02-10     158
1  8.72              2014-06-09  2014-11-13   2015-06-26     275
2  5.58              2016-05-12  2016-09-16   NaT            NaN
3  5.46              2014-01-28  2014-03-24   2014-05-29     88
4  2.89              2016-03-30  2016-04-19   2016-04-22     18
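
Several tear sheet metrics follow simple definitions. For example, the Calmar ratio is annual return divided by the absolute max drawdown: 5.2% / 9.9% ≈ 0.52, which matches the table. The sketch below shows how a few of these metrics can be computed from daily returns. It assumes 252 trading days per year and a zero risk-free rate, which are the empyrical/pyfolio defaults. It is not pyfolio's own code, and the returns are synthetic.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # annualisation factor (assumed, empyrical default)


def annual_return(returns):
    """Compound annual growth rate from simple daily returns."""
    growth = (1 + returns).prod()
    years = len(returns) / TRADING_DAYS
    return growth ** (1 / years) - 1


def annual_volatility(returns):
    return returns.std(ddof=1) * np.sqrt(TRADING_DAYS)


def sharpe_ratio(returns):
    """Annualised Sharpe ratio with a zero risk-free rate."""
    return returns.mean() / returns.std(ddof=1) * np.sqrt(TRADING_DAYS)


def max_drawdown(returns):
    """Largest peak-to-trough fall in cumulative wealth (a negative number).

    Wealth starts at 1.0 so that a loss on the first day counts as a drawdown.
    """
    wealth = np.concatenate([[1.0], np.cumprod(1 + np.asarray(returns))])
    peak = np.maximum.accumulate(wealth)
    return (wealth / peak - 1).min()


def calmar_ratio(returns):
    return annual_return(returns) / abs(max_drawdown(returns))


# Synthetic daily returns for illustration only (not the backtest's returns)
rng = np.random.default_rng(1)
rets = pd.Series(rng.normal(0.0003, 0.007, size=800))
print(f"annual return {annual_return(rets):.1%}, Sharpe {sharpe_ratio(rets):.2f}, "
      f"max drawdown {max_drawdown(rets):.1%}, Calmar {calmar_ratio(rets):.2f}")
```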

Analysing Kalman Filter Rolling Average


Computing the Kalman filter's running estimate of the spread's mean, plus 30-, 60- and 90-day moving averages for comparison:

In [6]:
# Get daily pricing data from 2014-01-01 to 2018-01-01
start = '2014-01-01'
end = '2018-01-01'
pricing = get_pricing(['MSFT', 'INTC'], fields='price', start_date=start, end_date=end)
# Spread between the two prices (MSFT - INTC, 1:1 ratio)
x = pricing.iloc[:, 0] - pricing.iloc[:, 1]

# Create a 1-D Kalman filter by specifying its input matrices:
# the hidden mean follows a random walk, and each observation is that mean plus noise
kf = KalmanFilter(transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=0,
                  initial_state_covariance=1,
                  observation_covariance=1,
                  transition_covariance=.01)

# Run the filter over the historical spread to get a rolling estimate of its mean
state_means, _ = kf.filter(x.values)
state_means = pd.Series(state_means.flatten(), index=x.index)

# Simple moving averages for comparison
mean30 = x.rolling(window=30).mean()
mean60 = x.rolling(window=60).mean()
mean90 = x.rolling(window=90).mean()
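
The `KalmanFilter` above is a one-dimensional local-level model. The hidden mean follows a random walk whose per-step variance is `transition_covariance` (q = 0.01). Each observed spread value equals that mean plus noise with variance `observation_covariance` (r = 1). The plain NumPy sketch below makes the predict/update recursion explicit using the same parameters. It is written to mirror this setup, but it is an illustration, not pykalman's implementation. `steady_state_gain` is a hypothetical helper added for the explanation.

```python
import numpy as np


def kalman_mean(observations, q=0.01, r=1.0, m0=0.0, p0=1.0):
    """Filtered estimates of a random-walk mean observed with noise.

    q: state (transition) variance, r: observation variance,
    m0, p0: prior mean and variance of the state at the first step.
    """
    means = np.empty(len(observations))
    m, p = m0, p0
    for t, z in enumerate(observations):
        if t > 0:
            p = p + q              # predict: mean unchanged, uncertainty grows by q
        k = p / (p + r)            # Kalman gain: weight given to the new observation
        m = m + k * (z - m)        # update: move the estimate toward the observation
        p = (1 - k) * p            # posterior variance shrinks after the update
        means[t] = m
    return means


def steady_state_gain(q=0.01, r=1.0):
    """Limiting gain k; the predicted variance p solves p^2 - q*p - q*r = 0."""
    p_pred = (q + np.sqrt(q * q + 4 * q * r)) / 2
    return p_pred / (p_pred + r)


noisy = 5.0 + np.random.default_rng(2).normal(size=300)
print(kalman_mean(noisy)[-5:])
print(round(float(steady_state_gain()), 4))  # prints 0.0951
```

With q = 0.01 and r = 1, the gain settles near 0.095. In steady state, each update therefore works like an exponential moving average with alpha ≈ 0.095, an effective span of roughly 20 days. This may help explain why the estimate reacts faster than the 30-, 60- and 90-day windows. Raising q relative to r makes the filter track the spread more tightly.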

Time series comparing the Kalman estimate with the moving averages:

In [7]:
plt.plot(state_means)
plt.plot(x)
plt.plot(mean30)
plt.plot(mean60)
plt.plot(mean90)
plt.title('Kalman filter estimate of average')
plt.legend(['Kalman Estimate', 'Spread (MSFT - INTC)', '30-day Moving Average',
            '60-day Moving Average', '90-day Moving Average'])
plt.xlabel('Day')
plt.ylabel('Price difference (USD)');

- The Kalman estimate appears to follow the center of the daily spread more closely than any of the moving averages.
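
The goal here is to use the Kalman estimate as the mean in the spread's z-score. The backtested algorithm itself is not shown in this notebook, so the code below is only a sketch under stated assumptions. It takes the z-score as (spread - Kalman mean) / rolling standard deviation of the spread. The 30-day window and the ±1 entry threshold are hypothetical choices, not the backtest's settings. An exponential moving average stands in for the notebook's Kalman `state_means` so the block runs on its own.

```python
import numpy as np
import pandas as pd


def kalman_zscore(spread, kalman_mean, std_window=30):
    """z-score of the spread around a filtered mean.

    The deviation is scaled by a rolling standard deviation (window is a hypothetical choice).
    """
    deviation = spread - kalman_mean
    return deviation / spread.rolling(window=std_window).std()


def positions_from_zscore(z, entry=1.0):
    """Hypothetical rule: short the spread above +entry, long below -entry, flat otherwise."""
    pos = pd.Series(0, index=z.index)
    pos[z > entry] = -1
    pos[z < -entry] = 1
    return pos


# Synthetic mean-reverting spread for illustration
idx = pd.bdate_range('2014-01-01', periods=250)
rng = np.random.default_rng(3)
spread = pd.Series(10 + rng.normal(size=250), index=idx)
mean_est = spread.ewm(alpha=0.095).mean()   # stand-in for the Kalman state_means
z = kalman_zscore(spread, mean_est)
print(positions_from_zscore(z).value_counts())
```

The first 29 z-scores are NaN while the rolling window fills. NaN comparisons evaluate to False, so the rule stays flat during that period.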


Results After I Applied Kalman Filter: 2014-01-06 to 2016-09-28


In [8]:
bt = get_backtest('5bb195d2096b314238402073')
bt.create_returns_tear_sheet()
Start date     2014-01-06
End date       2016-09-28
Total months   32

                     Backtest
Annual return        6.6%
Cumulative returns   19.0%
Annual volatility    10.8%
Sharpe ratio         0.64
Calmar ratio         0.45
Stability            0.48
Max drawdown         -14.5%
Omega ratio          1.13
Sortino ratio        0.97
Skew                 0.44
Kurtosis             11.66
Tail ratio           1.19
Daily value at risk  -1.3%
Gross leverage       0.92
Daily turnover       1.9%
Alpha                0.07
Beta                 -0.04

Worst drawdown periods:
   Net drawdown (%)  Peak date   Valley date  Recovery date  Duration (business days)
0  14.48             2015-05-29  2016-02-19   NaT            NaN
1  7.40              2014-07-30  2014-11-13   2015-01-15     122
2  4.29              2015-03-24  2015-03-27   2015-04-24     24
3  2.54              2014-04-14  2014-04-28   2014-06-03     37
4  2.26              2015-01-15  2015-01-26   2015-01-27     9

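Appendix: Another way to test for cointegration - Stationarity


Regress INTC on MSFT and build the spread from the estimated hedge ratio. Then test whether that spread is stationary with an augmented Dickey-Fuller test. This is the two-step Engle-Granger approach: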

In [10]:
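# Regress INTC (X2) on MSFT (X1) with an intercept
X1 = sm.add_constant(X1)
results = sm.OLS(X2, X1).fit()

# Drop the constant column so X1 is the MSFT price series again
X1 = X1[symbol_list[0]]

# Hedge ratio b is the fitted slope; the spread is INTC minus b times MSFT
b = results.params[symbol_list[0]]
Z = X2 - b * X1
Z.name = 'Spread'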
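
The equations below restate the cell above. It fits this regression by OLS, then keeps only the fitted slope when building the spread. The spread's in-sample mean is therefore the fitted intercept rather than zero. A constant offset does not change whether the series is stationary.

```latex
\text{INTC}_t = \alpha + \beta\,\text{MSFT}_t + \varepsilon_t,
\qquad
Z_t = \text{INTC}_t - \hat{\beta}\,\text{MSFT}_t = \hat{\alpha} + \hat{\varepsilon}_t
```

Here `b` in the code is \(\hat{\beta}\).
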
In [11]:
plt.plot(Z.index, Z.values)
plt.xlabel('Time')
plt.ylabel('Series Value')
plt.legend([Z.name]);

check_for_stationarity(Z);
p-value = 0.00118436731876 The series Spread is likely stationary.