Pairs Trading Algorithm - Determining Cointegration and Applying Kalman Filter

Summary: In this notebook, I first test a group of large tech company stocks for cointegration so that a cointegrated pair can be traded with a pairs trading algorithm on Quantopian. After reviewing the algorithm's backtest results, I analyse a Kalman filter to see whether it can improve the algorithm. Specifically, I check whether the Kalman filter estimate is a more effective running average than a fixed-window moving average when calculating the z-score used for pairs trading.

Sections:

- Importing Libraries

- Selecting Stocks that are Cointegrated

- Results of Pairs Trading Algorithm: 2014-01-06 to 2016-09-28

- Analysing Kalman Filter Rolling Average

- Results After I Applied Kalman Filter: 2014-01-06 to 2016-09-28

- Appendix: Another way to test for cointegration - Stationarity

Importing Libraries

from pykalman import KalmanFilter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy import poly1d  # poly1d is a NumPy class; importing it from scipy is deprecated
import seaborn as sns

import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, adfuller

# import plotly
# import cufflinks as cf
# cf.go_offline()

Selecting Stocks that are Cointegrated

Heatmap displaying stock pairs that are likely cointegrated:

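symbol_list = ['IBM', 'INTC', 'GE', 'MSFT', 'CSCO', 'AAPL', 'GOOG', 'AMZN']
securities_panel = get_pricing(symbol_list, fields=['price'],
                               start_date='2013-01-01', end_date='2014-01-01')
# Relabel the assets with their ticker symbols (a list also works on Python 3, unlike a bare map)
securities_panel.minor_axis = [asset.symbol for asset in securities_panel.minor_axis]

# find_cointegrated_pairs is a helper that is not defined in this notebook; it returns
# the test scores, the p-value matrix, and the list of likely cointegrated pairs
scores, pvalues, pairs = find_cointegrated_pairs(securities_panel)
# Mask every cell whose p-value is 0.05 or higher so only likely cointegrated pairs show
sns.heatmap(pvalues, xticklabels=symbol_list, yticklabels=symbol_list, cmap='RdYlGn_r',
            mask=(pvalues >= 0.05))
print(pairs)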

[(u'INTC', u'MSFT'), (u'GE', u'AMZN'), (u'AAPL', u'AMZN')]

Time series displaying the cointegrated stocks:

symbol_list = ['MSFT', 'INTC']
prices = get_pricing(symbol_list, fields=['price'],
                     start_date='2013-01-01', end_date='2014-01-01')['price']
# Relabel the columns with ticker symbols (a list also works on Python 3, unlike a bare map)
prices.columns = [asset.symbol for asset in prices.columns]
X1 = prices[symbol_list[0]]
X2 = prices[symbol_list[1]]

plt.plot(X1.index, X1.values)
plt.plot(X2.index, X2.values)
plt.xlabel('Time')
plt.ylabel('Series Value')
plt.legend([X1.name, X2.name]);

Results of Pairs Trading Algorithm: 2014-01-06 to 2016-09-28

bt = get_backtest('5bb1966437e5f1426a9febce')
bt.create_returns_tear_sheet()


Analysing Kalman Filter Rolling Average

Creating Kalman rolling average and moving averages:

# Get pricing data for 2014 to 2018
start = '2014-01-01'
end = '2018-01-01'
pricing = get_pricing(['MSFT','INTC'], fields='price', start_date=start, end_date=end)
x = pricing.iloc[:, 0] - pricing.iloc[:, 1]

# Create Kalman filter by specifying input matrices
kf = KalmanFilter(transition_matrices = [1],
                  observation_matrices = [1],
                  initial_state_mean = 0,
                  initial_state_covariance = 1,
                  observation_covariance=1,
                  transition_covariance=.01)

# Use Kalman Filter on historical prices to get estimate for rolling average
state_means, _ = kf.filter(x.values)
state_means = pd.Series(state_means.flatten(), index=x.index)

# Moving Averages
mean30 = x.rolling(window = 30).mean()
mean60 = x.rolling(window = 60).mean()
mean90 = x.rolling(window = 90).mean()
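
With scalar transition and observation matrices of 1, the filter configured above reduces to a local-level model: the hidden mean follows a random walk and each price difference is a noisy reading of it. The following is a minimal sketch of that scalar recursion in plain Python, to show why the estimate behaves like an adaptive moving average. The function name `local_level_filter` is hypothetical, and while I believe it mirrors what `kf.filter` computes for these settings, it is not taken from pykalman's source.

```python
def local_level_filter(observations, initial_mean=0.0, initial_var=1.0,
                       observation_var=1.0, transition_var=0.01):
    """Scalar Kalman filter for a random-walk mean observed with noise.

    The defaults copy the parameters passed to KalmanFilter in this notebook.
    Returns the list of filtered state means, one per observation.
    """
    means = []
    mean, var = initial_mean, initial_var
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: the mean carries over, its uncertainty grows
            var = var + transition_var
        # Update: the gain decides how far to move toward the new observation
        gain = var / (var + observation_var)
        mean = mean + gain * (obs - mean)
        var = (1.0 - gain) * var
        means.append(mean)
    return means
```

The ratio of `transition_var` to `observation_var` sets how quickly the estimate reacts: a larger `transition_var` gives a larger gain and a faster, noisier average, while a smaller one gives a smoother, slower average.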

Time series comparing Kalman Estimate to moving averages:

plt.plot(state_means)
plt.plot(x)
plt.plot(mean30)
plt.plot(mean60)
plt.plot(mean90)
plt.title('Kalman filter estimate of average')
plt.legend(['Kalman Estimate', 'X', '30-day Moving Average', '60-day Moving Average','90-day Moving Average'])
plt.xlabel('Day')
plt.ylabel('Price');

- The Kalman Estimate appears to follow the center of the daily data more closely than the 30-, 60-, and 90-day moving averages
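
The trading algorithm itself is not shown in this notebook, so the following is only a sketch of how the choice of average could enter the z-score. The function name `zscore`, the rolling standard deviation in the denominator, and the 30-day default window are all my assumptions, not the algorithm's actual settings.

```python
import pandas as pd


def zscore(spread, center, window=30):
    """Z-score of `spread` around `center`, scaled by a rolling standard deviation.

    `center` can be any series aligned with `spread`, such as a moving
    average or a Kalman filter estimate of the mean.
    """
    rolling_std = spread.rolling(window=window).std()
    return (spread - center) / rolling_std
```

With the variables defined earlier, `zscore(x, mean30)` would give the moving-average version and `zscore(x, state_means)` the Kalman version, so the two differ only in the center line that the price difference is measured against.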

Results After I Applied Kalman Filter: 2014-01-06 to 2016-09-28

bt = get_backtest('5bb195d2096b314238402073')
bt.create_returns_tear_sheet()


Appendix: Another way to test for cointegration - Stationarity

X1 = sm.add_constant(X1)
results = sm.OLS(X2, X1).fit()

# Get rid of the constant column
X1 = X1[symbol_list[0]]

# results.params

b = results.params[symbol_list[0]]
Z = X2 - b * X1
Z.name = 'Spread'

plt.plot(Z.index, Z.values)
plt.xlabel('Time')
plt.ylabel('Series Value')
plt.legend([Z.name]);

check_for_stationarity(Z);

p-value = 0.00118436731876 The series Spread is likely stationary.

	Backtest
Start date	2014-01-06
End date	2016-09-28
Total months	32
Annual return	5.2%
Cumulative returns	14.9%
Annual volatility	10.9%
Sharpe ratio	0.52
Calmar ratio	0.52
Stability	0.64
Max drawdown	-9.9%
Omega ratio	1.10
Sortino ratio	0.77
Skew	0.15
Kurtosis	12.94
Tail ratio	1.15
Daily value at risk	-1.4%
Gross leverage	0.95
Daily turnover	29.3%
Alpha	0.06
Beta	-0.01

Worst drawdown periods	Net drawdown in %	Peak date	Valley date	Recovery date	Duration
0	9.94	2015-07-06	2015-09-14	2016-02-10	158
1	8.72	2014-06-09	2014-11-13	2015-06-26	275
2	5.58	2016-05-12	2016-09-16	NaT	NaN
3	5.46	2014-01-28	2014-03-24	2014-05-29	88
4	2.89	2016-03-30	2016-04-19	2016-04-22	18

	Backtest
Start date	2014-01-06
End date	2016-09-28
Total months	32
Annual return	6.6%
Cumulative returns	19.0%
Annual volatility	10.8%
Sharpe ratio	0.64
Calmar ratio	0.45
Stability	0.48
Max drawdown	-14.5%
Omega ratio	1.13
Sortino ratio	0.97
Skew	0.44
Kurtosis	11.66
Tail ratio	1.19
Daily value at risk	-1.3%
Gross leverage	0.92
Daily turnover	1.9%
Alpha	0.07
Beta	-0.04

Worst drawdown periods	Net drawdown in %	Peak date	Valley date	Recovery date	Duration
0	14.48	2015-05-29	2016-02-19	NaT	NaN
1	7.40	2014-07-30	2014-11-13	2015-01-15	122
2	4.29	2015-03-24	2015-03-27	2015-04-24	24
3	2.54	2014-04-14	2014-04-28	2014-06-03	37
4	2.26	2015-01-15	2015-01-26	2015-01-27	9