
About Linear Time Series

date: 2019-03-08
tags: machine-learning  

This post mainly uses material from Analysis of Financial Time Series [1].

Stationarity

Stationarity is the foundation of time series analysis.

Strictly stationary

$\{r_t\}$ is strictly stationary if the joint distribution of $(r_{t_1}, \dots, r_{t_k})$ is identical to that of $(r_{t_1+t}, \dots, r_{t_k+t})$ for all $t$, where $k$ is an arbitrary positive integer and $(t_1, \dots, t_k)$ is a collection of $k$ positive integers.

This is a really strong condition.

Weak stationary

$\{r_t\}$ is weakly stationary if

$$E[r_t]=\mu,\quad Cov(r_t, r_{t-l})=\gamma_l$$

where $\mu$ is a constant and $\gamma_l$ depends only on the lag $l$.

In applications, weak stationarity enables one to make inferences concerning future observations.

We can assume that the first two moments of a weakly stationary series are finite. And if $\{r_t\}$ is normally distributed, weak stationarity is equivalent to strict stationarity.

Also, $\gamma_l$ is called the lag-$l$ autocovariance of $r_t$. And we have

$$\gamma_0=Var(r_t),\quad \gamma_{-l}=\gamma_l$$

Correlation and autocorrelation function (ACF)

Correlation coefficient:

$$\rho_{x,y}=\frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}=\frac{E[(X-\mu_x)(Y-\mu_y)]}{\sqrt{E[(X-\mu_x)^2]E[(Y-\mu_y)^2]}}$$

Empirically, we replace the expectations in the formula with sample means.

Autocorrelation Function (ACF):

Consider a weakly stationary series:

$$\rho_l=\frac{Cov(r_t, r_{t-l})}{\sqrt{Var(r_t)Var(r_{t-l})}}=\frac{Cov(r_t, r_{t-l})}{Var(r_t)}=\frac{\gamma_l}{\gamma_0}$$

Then the empirical estimate $\hat{\rho}_l$ is

$$\hat{\rho}_l=\frac{\sum_{t=l+1}^T(r_t-\bar{r})(r_{t-l}-\bar{r})}{\sum_{t=1}^T(r_t-\bar{r})^2}$$

Here $\bar{r}$ is the sample mean of the whole series.

This empirical estimate is called the sample autocorrelation function. It plays an important role in linear time series analysis.
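To make the formula concrete, here is a minimal NumPy sketch (the helper name `sample_acf` and the simulated series are mine, for illustration only):

```python
import numpy as np

def sample_acf(r, max_lag):
    """Sample autocorrelation rho_hat_l for l = 0, ..., max_lag."""
    r = np.asarray(r, dtype=float)
    d = r - r.mean()
    denom = np.sum(d ** 2)  # sum_{t=1}^T (r_t - r_bar)^2
    return np.array([np.sum(d[l:] * d[:len(r) - l]) / denom
                     for l in range(max_lag + 1)])

# Sanity check: for i.i.d. noise, rho_hat_l should be near zero for l >= 1.
rng = np.random.default_rng(0)
print(np.round(sample_acf(rng.standard_normal(1000), 5), 3))
```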

White Noise

A time series $r_t$ is called a white noise if $\{r_t\}$ is a sequence of independent and identically distributed random variables with finite mean and variance. In particular, if $r_t$ is normally distributed with mean 0 and variance $\sigma^2$, the series is called a Gaussian white noise.

For a white noise series, all ACFs at nonzero lags are zero.

Linear Time Series

A time series $r_t$ is said to be linear if

$$r_t=\mu+\sum_{i=0}^\infty \psi_i a_{t-i}$$

where $\{a_t\}$ is a white noise sequence, and we define $\psi_0=1$.

For this model, we have

$$E[r_t]=\mu,\quad Var(r_t)=\sigma_a^2\sum_{i=0}^\infty\psi_i^2$$

If the variance exists, we must have $\psi_i^2\rightarrow 0$ as $i\rightarrow\infty$.

And the lag-$l$ autocovariance is:

$$\begin{aligned} \gamma_l &= Cov(r_t, r_{t-l}) = E\left[\left(\sum_{i=0}^\infty\psi_i a_{t-i}\right)\left(\sum_{j=0}^\infty\psi_j a_{t-l-j}\right)\right]\\ &=E\left[\sum_{i,j=0}^\infty\psi_i\psi_j a_{t-i}a_{t-l-j}\right]\\ &=\sum_{j=0}^\infty\psi_{j+l}\psi_j E[a_{t-l-j}^2]=\sigma_a^2\sum_{j=0}^\infty\psi_{j+l}\psi_j \end{aligned}$$

Therefore

$$\rho_l=\frac{\gamma_l}{\gamma_0}=\frac{\sum_{j=0}^\infty\psi_{j+l}\psi_j}{1+\sum_{j=1}^\infty\psi_j^2}$$

For a weakly stationary time series, the variance exists, and therefore $\rho_l\rightarrow 0$ as $l\rightarrow\infty$.

AR

Consider a monthly return series $r_t$. If the lag-1 autocorrelation is large, $r_{t-1}$ might be useful in predicting $r_t$. The simplest model for this is:

$$r_t=\phi_0+\phi_1 r_{t-1}+a_t$$

The model has the same form as a linear regression, so we call it the autoregressive model of order 1, or simply AR(1). There are many similarities and differences between AR models and linear regression, which we will discuss later. For now, we have

$$E(r_t|r_{t-1})=\phi_0+\phi_1 r_{t-1},\quad Var(r_t|r_{t-1})=\sigma_a^2$$

This is the Markov property. More generally, the AR(p) model is

$$r_t=\phi_0+\phi_1 r_{t-1}+\dots+\phi_p r_{t-p}+a_t$$

Properties

AR(1)

From stationarity, taking expectations of both sides, we have

$$\mu = \phi_0+\phi_1\mu$$

And therefore

$$\begin{aligned} r_t-\mu &= \phi_1(r_{t-1}-\mu)+a_t\\ (r_t-\mu)(r_{t-l}-\mu) &= \phi_1(r_{t-1}-\mu)(r_{t-l}-\mu)+a_t(r_{t-l}-\mu) \end{aligned}$$

Therefore

$$\gamma_l = \left\{\begin{array}{l} \phi_1\gamma_{1}+\sigma_a^2,\ if\ l=0\\ \phi_1\gamma_{l-1},\ otherwise \end{array}\right.$$

And also from stationarity,

$$\gamma_0=\phi_1^2\gamma_0+\sigma_a^2 \quad\Rightarrow\quad \gamma_0=\frac{\sigma_a^2}{1-\phi_1^2}$$

Dividing the recursion $\gamma_l=\phi_1\gamma_{l-1}$ by $\gamma_0$, we get

$$\rho_l=\phi_1\rho_{l-1}=\dots=\phi_1^l\rho_0=\phi_1^l$$
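A quick simulation (my own sketch, not from the book) makes the geometric decay visible:

```python
import numpy as np

# Simulate an AR(1) with phi_0 = 0: r_t = phi1 * r_{t-1} + a_t
rng = np.random.default_rng(1)
phi1, T = 0.7, 10_000
r = np.zeros(T)
for t in range(1, T):
    r[t] = phi1 * r[t - 1] + rng.standard_normal()

# The sample lag-l autocorrelation should be close to phi1 ** l.
d = r - r.mean()
for l in range(1, 5):
    rho_hat = np.sum(d[l:] * d[:-l]) / np.sum(d ** 2)
    print(l, round(rho_hat, 3), round(phi1 ** l, 3))
```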

AR(2)

Using a similar method, we have

$$\mu=\frac{\phi_0}{1-\phi_1-\phi_2}$$

And also

$$\gamma_l=\phi_1\gamma_{l-1}+\phi_2\gamma_{l-2},\quad \rho_l=\phi_1\rho_{l-1}+\phi_2\rho_{l-2}$$

Therefore

$$\rho_1=\phi_1\rho_{0}+\phi_2\rho_{-1}=\phi_1+\phi_2\rho_1$$

And

$$\rho_l = \left\{\begin{array}{l} \frac{\phi_1}{1-\phi_2},\ if\ l=1\\ \phi_1\rho_{l-1}+\phi_2\rho_{l-2},\ if\ l\geq2 \end{array}\right.$$
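The recursion is easy to evaluate directly; here is a small sketch of mine (the stationary pair $\phi_1=1.2$, $\phi_2=-0.35$ is an arbitrary illustration):

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF of a stationary AR(2) via the recursion above."""
    rho = [1.0, phi1 / (1.0 - phi2)]  # rho_0 and rho_1
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho

print([round(v, 3) for v in ar2_acf(1.2, -0.35, 6)])
```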

AR(p)

$$\rho_l=\phi_1\rho_{l-1}+\phi_2\rho_{l-2}+\dots+\phi_p\rho_{l-p}$$

How to identify AR model

There are two general methods to identify the order $p$ of an AR model.

  • Partial Autocorrelation Function (PACF)

The PACF of a stationary time series is a function of its ACF and is a useful tool for determining the order $p$ of an AR model.

Consider the following AR models:

$$\begin{aligned} r_t&=\phi_{0,1}+\phi_{1,1}r_{t-1}+e_{1t},\\ r_t&=\phi_{0,2}+\phi_{1,2}r_{t-1}+\phi_{2,2}r_{t-2}+e_{2t},\\ r_t&=\phi_{0,3}+\phi_{1,3}r_{t-1}+\phi_{2,3}r_{t-2}+\phi_{3,3}r_{t-3}+e_{3t},\\ &\vdots \end{aligned}$$

These models are in the form of a multiple linear regression and can be estimated by the least-squares method. The estimate $\hat{\phi}_{1,1}$ is called the lag-1 sample PACF of $r_t$, $\hat{\phi}_{2,2}$ is the lag-2 sample PACF, and so on.

From the definition, the lag-$p$ sample PACF shows the added contribution of $r_{t-p}$ over an AR(p-1) model. Therefore, for an AR(p) series, the lag-$p$ sample PACF should not be zero, while the PACFs at higher lags should be close to zero (see the sketch below).
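Here is a minimal sketch of that procedure (the helper `sample_pacf` is mine): fit AR(1), AR(2), ... by least squares and keep the last coefficient of each fit.

```python
import numpy as np

def sample_pacf(r, max_lag):
    """Lag-k sample PACF = last coefficient of a least-squares AR(k) fit."""
    r = np.asarray(r, dtype=float)
    pacf = []
    for k in range(1, max_lag + 1):
        # Regressors [1, r_{t-1}, ..., r_{t-k}] for t = k, ..., T-1.
        X = np.column_stack([np.ones(len(r) - k)] +
                            [r[k - j:len(r) - j] for j in range(1, k + 1)])
        beta, *_ = np.linalg.lstsq(X, r[k:], rcond=None)
        pacf.append(beta[-1])  # phi_hat_{k,k}
    return np.array(pacf)

# For a simulated AR(2), lags 1-2 stand out and higher lags are near zero.
rng = np.random.default_rng(4)
r = np.zeros(5000)
for t in range(2, len(r)):
    r[t] = 1.2 * r[t - 1] - 0.35 * r[t - 2] + rng.standard_normal()
print(np.round(sample_pacf(r, 5), 3))
```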

  • Information Criteria

There are several information-based criteria available to determine $p$. All of them are likelihood based, such as the Akaike information criterion (AIC); a brute-force order search is sketched below.
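A minimal sketch of AIC-based order selection, assuming the common Gaussian-AR form $AIC(p)=\ln\hat{\sigma}_a^2+2p/T$ (the helper and its details are mine):

```python
import numpy as np

def select_ar_order_aic(r, max_p):
    """Pick p minimizing AIC(p) = ln(sigma2_hat) + 2p/T."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    best_p, best_aic = 1, np.inf
    for p in range(1, max_p + 1):
        # Least-squares AR(p) fit, as in the PACF sketch above.
        X = np.column_stack([np.ones(T - p)] +
                            [r[p - j:T - j] for j in range(1, p + 1)])
        y = r[p:]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = np.mean((y - X @ beta) ** 2)  # ML-style residual variance
        aic = np.log(sigma2) + 2 * p / T
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p
```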

Goodness of Fit

$$R^2=1-\frac{residual\ sum\ of\ squares}{total\ sum\ of\ squares}$$

For a stationary AR(p) model, the measure becomes

$$R^2=1-\frac{\sum_{t=p+1}^T\hat{a}_t^2}{\sum_{t=p+1}^T(r_t-\bar{r})^2}$$

For a given data set, it is well known that $R^2$ is a nondecreasing function of the number of parameters used. To overcome this weakness, we could use the adjusted $R^2$:

$$R_{adj}^2=1-\frac{\hat{\sigma}_a^2}{\hat{\sigma}_r^2}$$

where $\hat{\sigma}_a^2$ is the residual variance and $\hat{\sigma}_r^2$ is the sample variance of $r_t$.
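A small sketch of mine that computes both measures from a least-squares AR(p) fit:

```python
import numpy as np

def ar_fit_r2(r, p):
    """R^2 and adjusted R^2 for a least-squares AR(p) fit."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    X = np.column_stack([np.ones(T - p)] +
                        [r[p - j:T - j] for j in range(1, p + 1)])
    y = r[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    # Adjusted R^2 via degrees-of-freedom-corrected variance estimates.
    sigma2_a = np.sum(resid ** 2) / (len(y) - p - 1)
    sigma2_r = np.var(y, ddof=1)
    return r2, 1.0 - sigma2_a / sigma2_r
```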

MA

Think of a special case of AR with infinite order:

$$r_t+\theta_1 r_{t-1}+\theta_1^2 r_{t-2}+\dots=\phi_0+a_t$$

And since

$$r_{t-1}+\theta_1 r_{t-2}+\theta_1^2 r_{t-3}+\dots=\phi_0+a_{t-1}$$

Subtracting $\theta_1$ times the second equation from the first, we have

$$r_t=\phi_0(1-\theta_1)+a_t-\theta_1 a_{t-1}$$

Therefore, the MA(1) model is:

$$r_t=c_0+a_t-\theta_1 a_{t-1}$$

And MA(q) is

$$r_t=c_0+a_t-\theta_1 a_{t-1}-\dots-\theta_q a_{t-q}$$

Properties

Moving-average models are always weakly stationary because they are finite linear combinations of a white noise sequence.

$$\mu=c_0,\quad Var(r_t)=(1+\theta_1^2+\theta_2^2+\dots+\theta_q^2)\sigma_a^2$$

ACF

For simplicity, assume $c_0=0$. Multiplying the MA(1) model by $r_{t-l}$ gives

$$r_{t-l}r_t=r_{t-l}a_t-\theta_1 r_{t-l}a_{t-1}$$

Therefore

$$\gamma_l = \left\{\begin{array}{l} -\theta_1\sigma_a^2,\ l=1\\ 0,\ l>1 \end{array}\right.$$

And

$$\rho_l = \left\{\begin{array}{l} 1,\ l=0\\ -\frac{\theta_1}{1+\theta_1^2},\ l=1\\ 0,\ l>1 \end{array}\right.$$

And for MA(q), the lag-$q$ ACF is not zero, but the ACFs beyond lag $q$ are all zero. An MA(q) series is only linearly related to its first $q$ lagged values and hence is a "finite memory" model.
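A simulation sketch (mine) showing the lag-1 value and the cutoff:

```python
import numpy as np

# Simulate an MA(1): r_t = a_t - theta1 * a_{t-1}
rng = np.random.default_rng(2)
theta1, T = 0.6, 10_000
a = rng.standard_normal(T + 1)
r = a[1:] - theta1 * a[:-1]

d = r - r.mean()
acf = [np.sum(d[l:] * d[:-l]) / np.sum(d ** 2) for l in range(1, 5)]
print(round(-theta1 / (1 + theta1 ** 2), 3))  # theoretical lag-1 ACF
print(np.round(acf, 3))  # lag 1 close to it, lags >= 2 near zero
```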

How to identify MA model

We can just use this cutoff property of the ACF to identify $q$ for an MA model.

ARMA

An ARMA model combines the ideas of AR and MA into a compact form so that the number of parameters used is kept small. For return series in finance, the chance of needing an ARMA model is low. However, it is highly relevant in volatility modeling. The simple ARMA(1, 1) is

$$r_t-\phi_1 r_{t-1}=\phi_0+a_t-\theta_1 a_{t-1}$$

To make the model meaningful, we need $\phi_1\neq\theta_1$; otherwise the AR and MA parts cancel and the series reduces to white noise.

And the general form is

$$r_t-\phi_1 r_{t-1}-\dots-\phi_p r_{t-p}=\phi_0+a_t-\theta_1 a_{t-1}-\dots-\theta_q a_{t-q}$$

Properties

Here we only consider the properties of ARMA(1, 1).

From stationarity, we have

$$\mu-\phi_1\mu=\phi_0$$

Assuming $\phi_0=0$ for simplicity, and using $E(r_{t-1}a_{t-1})=\sigma_a^2$, we have

$$\begin{aligned} Var(r_t)&=\phi_1^2 Var(r_{t-1})+\sigma_a^2+\theta_1^2\sigma_a^2-2\phi_1\theta_1 E(r_{t-1}a_{t-1})\\ &=\phi_1^2 Var(r_{t-1})+(1-2\phi_1\theta_1+\theta_1^2)\sigma_a^2 \end{aligned}$$

And for ACF,

$$r_t r_{t-l}-\phi_1 r_{t-1}r_{t-l}=a_t r_{t-l}-\theta_1 a_{t-1}r_{t-l}$$

We have

$$\gamma_l = \left\{\begin{array}{l} \phi_1\gamma_0-\theta_1\sigma_a^2,\ l=1\\ \phi_1\gamma_{l-1},\ l>1 \end{array}\right.$$

And

$$\rho_l = \left\{\begin{array}{l} \phi_1\rho_0-\frac{\theta_1\sigma_a^2}{\gamma_0},\ l=1\\ \phi_1\rho_{l-1},\ l>1 \end{array}\right.$$

Thus, the ACF of an ARMA(1, 1) model behaves very much like that of an AR(1) model.
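A simulation sketch (mine): for $l\geq2$ the ratio $\rho_l/\rho_{l-1}$ should be close to $\phi_1$, just as for AR(1).

```python
import numpy as np

# Simulate an ARMA(1,1): r_t = phi1*r_{t-1} + a_t - theta1*a_{t-1}
rng = np.random.default_rng(3)
phi1, theta1, T = 0.8, 0.4, 20_000
a = rng.standard_normal(T)
r = np.zeros(T)
for t in range(1, T):
    r[t] = phi1 * r[t - 1] + a[t] - theta1 * a[t - 1]

d = r - r.mean()
rho = np.array([np.sum(d[l:] * d[:-l]) / np.sum(d ** 2) for l in range(1, 6)])
print(np.round(rho[1:] / rho[:-1], 3))  # each ratio should be near phi1
```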

How to identify ARMA model

The ACF and PACF are not informative in determining the order of an ARMA model. Instead, the extended autocorrelation function (EACF) can be used to specify the order of an ARMA process.

Seasonal model

For seasonal data, there is often strong serial correlation. It is common to apply seasonal differencing, defined with the back-shift operator $B$ ($Bx_t=x_{t-1}$) as

$$\Delta_k x_t = (1-B^k)x_t=x_t-x_{t-k}$$

And sometimes we need to apply multiple differencing, which leads to

$$\Delta_k(\Delta_l x_t)=(1-B^k)(1-B^l)x_t$$
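In code, seasonal differencing is just a shifted subtraction; a minimal NumPy sketch of mine:

```python
import numpy as np

def seasonal_diff(x, k):
    """Delta_k x_t = (1 - B^k) x_t = x_t - x_{t-k}; output is k values shorter."""
    x = np.asarray(x, dtype=float)
    return x[k:] - x[:-k]

# Multiple differencing, e.g. (1 - B^12)(1 - B) x_t for monthly data:
x = np.arange(48, dtype=float)  # a toy series with a pure linear trend
z = seasonal_diff(seasonal_diff(x, 1), 12)
print(z)  # the trend is removed; all zeros for this toy input
```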

ARIMA

If

$$Z_t=(1-B)^d X_t\sim ARMA(p, q)$$

then $X_t$ is ARIMA(p, d, q).
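In practice one would fit such a model with a library. A minimal sketch, assuming the `statsmodels.tsa.arima.model.ARIMA` API available in recent statsmodels versions:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# A toy I(1) series: a random walk, so d = 1 is appropriate.
rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(500))

# Fit ARIMA(1, 1, 1): difference once, then model Z_t as ARMA(1, 1).
result = ARIMA(x, order=(1, 1, 1)).fit()
print(result.params)
```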

References

  1. Tsay, Ruey S. Analysis of financial time series. Vol. 543. John Wiley & Sons, 2005.