## 1.INTRODUCTION

A traditional time series approach such as the ARIMA model describes the underlying process of a series with a linear equation. Many time series, however, are generated by nonlinear relationships. Thus, in predicting nonlinear time series, it is natural to apply the dynamical system approach, which is intrinsically nonlinear. The dynamical system approach to time series is widely used for natural and social systems. It assumes that a time series is observed from the state vector of a dynamical system. Such a system is known to be very sensitive to initial conditions, but in the long term its behavior is constrained to a finite fractal region because of its invariant topological property. Since the Lorenz series was discovered, this approach has been applied to many areas, including meteorology (Harding *et al*., 1990; Lorenz, 1995), medicine (Adeli *et al*., 2008), economics (Das and Das, 2007), signal processing (Kennel *et al*., 1992), traffic flow (Shang *et al*., 2005), climate (Dhanya and Kumar, 2011), and biology (Mackey and Glass, 1977). For example, the Lorenz series is a popular time series in physics, especially for explaining climate (Lorenz, 1963); it is a simplified version of the Navier-Stokes equations and is related to various systems such as the Rayleigh-Benard problem and other weather problems (Dudul, 2005).

Takens' embedding theorem (Takens, 1981) establishes the theoretical background for prediction in the dynamical system approach. By this theorem, one can build a nonlinear predictive function from a time delay coordinates vector to the future value of the observed series. Although the theorem is stated for univariate time series, the extension to multivariate time series is natural and has shown better prediction performance (Barnard *et al*., 2001; Cao *et al*., 1998).

For approximating the predictive function, several machine learning models are frequently used: Neural Networks (Gholipour *et al*., 2006), Neuro-fuzzy Networks (Singh and Borah, 2013), Support Vector Regression (Mukherjee *et al*., 1997), and Least Squares Support Vector Regression (LSSVR) (Mei-Ying and Xiao-Dong, 2004). The time delay coordinates vector from a multivariate time series, however, can exhibit statistical redundancy, which degrades the performance of a machine learning model (Barnard *et al*., 2001; Han and Wang, 2009). Several studies apply dimension reduction techniques such as Independent Component Analysis (Barnard *et al*., 2001) and Principal Component Analysis (Han and Wang, 2009) to solve this problem, but there has been no extensive comparison of the effect of dimension reduction.

Thus, the goal of this paper is to analyze the effect of dimension reduction on the prediction of multivariate nonlinear time series; we select LSSVR to approximate the predictive function because of its structural risk minimization and computational efficiency. The paper is organized as follows. In Section 2, basic theories of the dynamical system approach to time series are introduced. In Section 3, short descriptions of dimension reduction methods are given. In Section 4, the methodology for our experiment is described and results from the experiment are presented. A conclusion is given in Section 5.

## 2.DYNAMIC SYSTEM APPROACH IN TIME SERIES

### 2.1.State Space Reconstruction

A dynamic system consists of a state space *S*, a set of times *T*, and an evolution rule *F : S × T → S*. It is the mathematical concept in which the evolution rule describes how the state vector *s(t) ∈ S* evolves over time. From the state vector *s(t)*, we have the *M*-dimensional observed series *{x_{1}(k), x_{2}(k), ..., x_{M}(k)}^{N}_{k = 1}* as follows:

${x}_{i}\left(k\right)={h}_{i}\left(s\left(k{t}_{s}\right)\right),\text{ }i=1,...,M;\text{ }k=1,...,N$

where *t_{s}* is the sampling rate, *h_{i}( ⋅ )* is the observation function, and *N* is the length of the observed series.

Our problem is to predict the future values of the observed series *{x_{1}(k), x_{2}(k), ..., x_{M}(k)}^{N}_{k = 1}* without the information of the original state space. For a univariate time series *{x_{i}(k)}^{N}_{k = 1}*, we start by making the time delay coordinates vector that consists of the lagged values of the observed series:

${X}_{i}\left(k\right)=\left({x}_{i}\left(k\right),{x}_{i}\left(k-{\mathrm{\tau}}_{i}\right),...,{x}_{i}\left(k-\left({m}_{i}-1\right){\mathrm{\tau}}_{i}\right)\right)$

where *τ_{i}* is the time delay and *m_{i}* is the embedding dimension. By Takens' embedding theorem (Takens, 1981), the time delay coordinates vector can reconstruct a manifold topologically equivalent to the unknown original manifold in the state space. Under some regularity conditions, for almost every *τ_{i}* and for some *m_{i} ≥ 2[D] + 1* (where *D* is the dimension of the original manifold in *S*), there exists a predictive function *f_{i} : R^{m_{i}} → R^{m_{i}}* such that

${X}_{i}\left(k+1\right)={f}_{i}\left({X}_{i}\left(k\right)\right)$

Expansion of the theorem to multivariate time series *{x_{1}(k), x_{2}(k), ..., x_{M}(k)}^{N}_{k = 1}* is similar (Cao *et al*., 1998); we make the time delay coordinates vector from the multivariate time series as follows:

$V\left(k\right)=\left({x}_{1}\left(k\right),{x}_{1}\left(k-{\mathrm{\tau}}_{1}\right),...,{x}_{1}\left(k-\left({m}_{1}-1\right){\mathrm{\tau}}_{1}\right),...,{x}_{M}\left(k\right),...,{x}_{M}\left(k-\left({m}_{M}-1\right){\mathrm{\tau}}_{M}\right)\right)$

$k={J}_{0},{J}_{0}+1,...,N;{J}_{0}=\underset{1\le i\le M}{\mathit{max}}\left\{\left({m}_{i}-1\right){\mathrm{\tau}}_{i}+1\right\}$

where *m_{i}* is the embedding dimension and *τ_{i}* is the time delay. If each *m_{i}* or *Σm_{i}* is sufficiently large, there exists a predictive function *g : R^{m} → R^{m}* (*m = Σ^{M}_{i = 1}m_{i}*) such that

$V\left(k+1\right)=g\left(V\left(k\right)\right)$

Equivalently, there exists a predictive function *g_{i} : R^{m} → R* such that

${x}_{i}\left(k+1\right)={g}_{i}\left(V\left(k\right)\right)$
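As an illustration, the multivariate time delay coordinates vector *V(k)* can be built as in the following minimal NumPy sketch (the function name and the toy series are ours, not from the paper):

```python
import numpy as np

def delay_vectors(series, taus, ms):
    """Build multivariate time delay coordinate vectors V(k).

    series: list of M 1-D arrays x_i, all of length N
    taus:   time delays tau_i;  ms: embedding dimensions m_i
    Returns an (N - J0 + 1) x sum(m_i) matrix whose rows are V(k).
    """
    N = len(series[0])
    # J0 - 1 in 0-based indexing: first k for which all lags exist
    j0 = max((m - 1) * t for m, t in zip(ms, taus))
    rows = []
    for k in range(j0, N):
        v = []
        for x, tau, m in zip(series, taus, ms):
            v.extend(x[k - j * tau] for j in range(m))
        rows.append(v)
    return np.array(rows)

# toy example: M = 2 series of length 10
x1, x2 = np.arange(10.0), np.arange(10.0) ** 2
V = delay_vectors([x1, x2], taus=[1, 1], ms=[3, 2])
print(V.shape)  # (8, 5): 8 valid values of k, m = 3 + 2 columns
```

Each row stacks the lagged values of every component series, so the row for index *k* is exactly the vector *V(k)* defined above.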

The state space reconstruction from a multivariate series shows better prediction performance than that from a univariate series (Barnard *et al*., 2001; Cao *et al*., 1998).

### 2.2.Parameter Selection

As we build the time delay coordinates vector, an essential task is selecting the time delay *τ_{i}* and embedding dimension *m_{i}*. The time delay *τ_{i}* is calculated separately for each univariate time series with the mutual information method (Fraser and Swinney, 1986). Mutual information measures the dependency between *x_{i}(k)* and *x_{i}(k + τ)* through a histogram. We select the *τ_{i}* which gives the first minimum of the mutual information.

We apply the False Nearest Neighbor (FNN) method to compute the embedding dimension *m_{i}*. FNN for univariate time series arises from the topological equivalence between the state space and the embedding space, which is the space of the time delay coordinates vector (Kennel *et al*., 1992). For sufficiently large *m_{i}*, the nearest neighbor of a point in the embedding space with dimension *m_{i}* is also close to that point in the embedding space with dimension *m_{i}* + 1. If the distance between these points becomes large in the higher-dimensional embedding space, the nearest neighbor is called a false nearest neighbor; the time delay coordinates vector with embedding dimension *m_{i}* fails to preserve the topological property. FNN tries to find the *m_{i}* that minimizes the ratio of false nearest neighbors. The multivariate version of FNN is similar, except that it minimizes the average ratio of false nearest neighbors by increasing each *m_{i}* by one in turn (Su, 2010). We start from *m_{1}* = *m_{2}* = ... = *m_{M}* = 1 and increase some *m_{i}* until this average goes to zero.
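The mutual information step of this procedure can be sketched as follows (a histogram-based plug-in estimate; `bins=16` and the noisy-sine test signal are illustrative choices of ours, not the paper's settings):

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram ("plug-in") estimate of I(x(k); x(k + tau)) in nats."""
    pxy, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0  # only nonzero joint cells contribute
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def first_minimum_delay(x, max_tau=50):
    """Pick the delay at the first local minimum of the mutual information."""
    mi = [mutual_information(x, tau) for tau in range(1, max_tau + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i - 1] > mi[i] < mi[i + 1]:
            return i + 1                     # delays are 1-based
    return int(np.argmin(mi)) + 1            # fallback: global minimum

# example: noisy sine with a period of about 126 samples
rng = np.random.default_rng(0)
t = np.arange(4000) * 0.05
x = np.sin(t) + 0.1 * rng.normal(size=4000)
tau = first_minimum_delay(x)
print(tau)
```

The plug-in estimate is a KL divergence between the joint and product-of-marginals histograms, so it is always non-negative; strongly dependent lags give large values and the first dip marks the selected delay.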

## 3.DIMENSION REDUCTION

### 3.1.Dimension Reduction Techniques

Suppose that a *(N − J_{0} + 1) × m* matrix **V** represents a dataset. Assume that this dataset has an intrinsic dimension *d* (*d < m*); the vectors **V***(k) ∈ R^{m}*, *k = J_{0}, J_{0} + 1, ..., N*, lie near a manifold that has dimension *d* and is embedded in the *m*-dimensional space. Dimension reduction techniques find a mapping from the matrix **V** into a new *(N − J_{0} + 1) × d* matrix **Z**, while preserving some properties of the dataset (Van der Maaten, 2007). These techniques are divided into four categories, as shown in Table 1: linear techniques, global nonlinear techniques, local nonlinear techniques, and variants of local nonlinear techniques. Table 1 lists the dimension reduction techniques which we applied in our experiment.

### 3.2.Linear Techniques

Linear techniques are based on a linear mapping. Principal Component Analysis (PCA) constructs the low-dimensional representation of the dataset that describes as much of its variance as possible (Hotelling, 1933). Multidimensional Scaling (MDS) retains the pairwise distances as much as possible (Torgerson, 1952).
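For instance, PCA can be sketched in a few lines via the eigendecomposition of the covariance matrix (a minimal illustration; the toy dataset, which lies near a 2-D plane in *R^{3}*, is ours):

```python
import numpy as np

def pca(V, d):
    """Project rows of V onto the d directions of largest variance."""
    Vc = V - V.mean(axis=0)              # center the dataset
    C = np.cov(Vc, rowvar=False)         # m x m covariance matrix
    evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :d]            # top-d eigenvectors
    return Vc @ W

# toy data: 3-D points lying near a 2-D plane
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))
V = np.column_stack([A[:, 0], A[:, 1],
                     A[:, 0] + A[:, 1] + 0.01 * rng.normal(size=200)])
Z = pca(V, 2)
print(Z.shape)  # (200, 2)
```

Because the data are centered before projection, each column of the reduced matrix has zero mean, and the two retained directions capture nearly all of the variance of this near-planar dataset.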

### 3.3.Global Nonlinear Techniques

Global nonlinear techniques construct the low-dimensional space which retains a global nonlinear property in the dataset. Isomap preserves the pairwise geodesic distances between the data points; the geodesic distance between two points is measured over the manifold (Tenenbaum *et al*., 2000).

### 3.4.Local Nonlinear Techniques

Local nonlinear techniques aim to build a mapping which preserves a local nonlinear property. Laplacian Eigenmap retains the distances between a high-dimensional point and its nearest neighbors; in the low-dimensional space, they are weighted by a Gaussian kernel function (Belkin and Niyogi, 2002). Local Linear Embedding (LLE) expresses a high-dimensional point as a linear combination of its nearest neighbors (Roweis and Saul, 2000); it tries to preserve the weights of the linear combination. Local Tangent Space Analysis (LTSA) exploits the tangent space in the neighborhood of a high-dimensional point; it aligns these local tangent spaces in the low-dimensional space mapped from the original space (Zhang and Zha, 2004).

### 3.5.Variants of Local Nonlinear Techniques

Variants of local nonlinear techniques solve the same problems as the local nonlinear techniques, but they are based on a linear transformation, which also solves the out-of-sample problem. Locality Preserving Projection (LPP) finds the linear mapping that minimizes the cost function of Laplacian Eigenmaps (He and Niyogi, 2004). Neighborhood Preserving Embedding (NPE) is based on LLE (He *et al*., 2005). Linear Local Tangent Space Alignment (LLTSA) constructs the linear transformation that minimizes the objective function of LTSA (Zhang *et al*., 2007).
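Since LPP is the variant that matters most in Section 4, here is a minimal sketch of it under the usual heat-kernel / k-nearest-neighbor formulation (our own implementation; the neighbor count, kernel width, and stand-in dataset are illustrative, not the paper's settings):

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, d, k=12, sigma=1.0):
    """Locality Preserving Projection: a linear map minimizing the
    Laplacian Eigenmaps cost (He and Niyogi, 2004)."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):                         # k-NN graph, heat-kernel weights
        nbrs = np.argsort(D2[i])[1:k + 1]      # skip the point itself
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                     # symmetrize the graph
    Dg = np.diag(W.sum(axis=1))
    L = Dg - W                                 # graph Laplacian
    # generalized eigenproblem X^T L X a = lam X^T D X a; smallest eigenvalues
    A, B = X.T @ L @ X, X.T @ Dg @ X
    lam, vecs = eigh(A, B + 1e-9 * np.eye(X.shape[1]))
    return vecs[:, :d]                         # columns = projection directions

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))   # stand-in for the delay-vector matrix V
P = lpp(X, d=4)
Z = X @ P                       # reduced (200, 4) representation
print(Z.shape)
```

Because the mapping is a fixed matrix `P`, new (out-of-sample) delay vectors are reduced by a plain matrix product, which is exactly the advantage these linear variants have over Laplacian Eigenmaps, LLE, and LTSA.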

### 3.6.Correlation Dimension Estimation

The correlation dimension is applied to estimate the intrinsic dimension *d*. It is based on the fact that the number of points in a hypersphere with radius *r* is proportional to *r^{d}* (Grassberger and Procaccia, 2004).
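A Grassberger-Procaccia style estimate can be sketched as follows: compute the correlation sum *C(r)* (the fraction of point pairs closer than *r*) over a range of radii and fit the slope of log *C(r)* versus log *r*. The radii and the toy dataset are illustrative choices of ours:

```python
import numpy as np

def correlation_dimension(V, r_min, r_max, n_r=10):
    """Estimate intrinsic dimension from the scaling C(r) ~ r^d."""
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    dists = D[np.triu_indices(len(V), k=1)]          # all pairwise distances
    rs = np.logspace(np.log10(r_min), np.log10(r_max), n_r)
    C = np.array([(dists < r).mean() for r in rs])   # correlation sum C(r)
    slope, _ = np.polyfit(np.log(rs), np.log(C), 1)  # slope estimates d
    return slope

# sanity check: points uniform on a 2-D square embedded in R^3
rng = np.random.default_rng(0)
P = np.column_stack([rng.uniform(size=(1000, 2)), np.zeros(1000)])
print(round(correlation_dimension(P, 0.05, 0.2), 1))  # close to 2
```

For data on a 2-D sheet the fitted slope comes out near 2 regardless of the 3-D ambient space, which is what makes the estimate useful for choosing the target dimension of the reduction.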

## 4.EXPERIMENT WITH DELAYED LORENZ SERIES

To analyze the effect of dimension reduction on predicting multivariate nonlinear time series, we conducted an experiment with a generated series. Figure 1 outlines the procedure; details are as follows.

### 4.1.Series Generation

We generated a delayed Lorenz series of length 4,000 from the delayed differential equations in Eq. (7) (Zhi-Yong *et al*., 2011). Parameters in Eq. (7) were chosen as *a* = 16, *b* = 45.92, *c* = 4, *e_{1}* = 1.2, *e_{2}* = 0.75, *e_{3}* = 1, and *γ* = 3. The series started with *s_{1}(t)* = *s_{2}(t)* = *s_{3}(t)* = 1 for *t* < *γ*. The differential equations were solved by the explicit Runge-Kutta (2, 3) method with integration step size *h* = 0.01. The observed series were *x_{i}(k)* = *h_{i}(s(kt_{s}))* = *s_{i}(kt_{s})*, *i* = 1, 2, 3; *k* = 1, …, 4,000, where the sampling rate *t_{s}* is 0.3. The first 1,000 observations were discarded as burn-in to reduce the effect of the initial points. Of the remaining 3,000 observations, we used the next 2,000 as a training set and the last 1,000 as a test set for evaluation. Figure 2 shows the three series for the training and test sets.
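The generation and splitting procedure can be sketched as follows. Since Eq. (7) is not reproduced in this section, the sketch integrates the ordinary (non-delayed) Lorenz system with the same *a*, *b*, *c*, initial condition, sampling rate, and burn-in; the delay terms *e_{1}*, *e_{2}*, *e_{3}*, and *γ* are omitted, and SciPy's adaptive `RK23` solver stands in for the fixed-step Runge-Kutta (2, 3) scheme:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Lorenz right-hand side with the parameters from Section 4.1.
# NOTE: the delay terms of Eq. (7) are NOT included; this only
# illustrates the sampling / burn-in / splitting procedure.
a, b, c = 16.0, 45.92, 4.0

def lorenz(t, s):
    s1, s2, s3 = s
    return [a * (s2 - s1), b * s1 - s2 - s1 * s3, s1 * s2 - c * s3]

t_s = 0.3                              # sampling rate
t_eval = t_s * np.arange(1, 4001)      # k = 1, ..., 4000
sol = solve_ivp(lorenz, (0.0, t_eval[-1]), [1.0, 1.0, 1.0],
                method="RK23", t_eval=t_eval)
x = sol.y                              # x[i-1, k-1] = x_i(k)

burn = x[:, 1000:]                     # discard first 1,000 observations
train, test = burn[:, :2000], burn[:, 2000:]
print(train.shape, test.shape)         # (3, 2000) (3, 1000)
```

The trajectory stays on a bounded attractor, so the sampled series are bounded even though nearby initial conditions diverge exponentially.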

### 4.2.State Space Reconstruction

To build the time delay coordinates vector from the multivariate time series, we used the mutual information method and the multivariate version of FNN to estimate the time delay *τ_{i}* and embedding dimension *m_{i}*, respectively. For *{x_{1}(k), x_{2}(k), x_{3}(k)}^{3000}_{k = 1001}*, we obtained the time delays *τ_{1}* = *τ_{2}* = *τ_{3}* = 1 and embedding dimensions *m_{1}* = 6, *m_{2}* = *m_{3}* = 1.

For comparison purposes, we also built the time delay coordinates vectors from the univariate time series; the only difference from the above is that we used the univariate version of FNN:

${X}_{1}\left(k\right)=\left({x}_{1}\left(k\right),{x}_{1}\left(k-1\right),...,{x}_{1}\left(k-4\right)\right)$

${X}_{2}\left(k\right)=\left({x}_{2}\left(k\right),{x}_{2}\left(k-1\right),...,{x}_{2}\left(k-5\right)\right)$

${X}_{3}\left(k\right)=\left({x}_{3}\left(k\right),{x}_{3}\left(k-1\right),...,{x}_{3}\left(k-4\right)\right)$

### 4.3.Dimension Reduction

After the correlation dimension of **V** was estimated, we transformed **V** into **Z** through the dimension reduction techniques in Section 3. For the local nonlinear techniques and their variants, the number of nearest neighbors was set to 12 by trial and error.

$V\left(k\right)\in {R}^{7}\to Z\left(k\right)\in {R}^{4}$

### 4.4.Evaluation: One-Step-Ahead Prediction

For each series, we learned the multivariate model with **Z***(k)* for one-step-ahead prediction by LSSVR.

The simplex method, with initial points from simulated annealing, optimized the parameters of LSSVR. The univariate model with *X_{i}(k)* and the multivariate model with **V***(k)* without dimension reduction were also learned similarly for one-step-ahead prediction for model comparison. The performance of each model was estimated through the Root Mean Square Error (RMSE) on the test set.
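A minimal version of this evaluation step can be sketched as follows: an LSSVR trained by solving its KKT linear system with an RBF kernel, used for one-step-ahead prediction on a toy quasi-periodic series. The series, the embedding dimension, and the fixed hyperparameters are illustrative stand-ins for the paper's Lorenz data and simplex/simulated-annealing search:

```python
import numpy as np

def rbf(A, B, sigma):
    """RBF (Gaussian) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

class LSSVR:
    """Least squares SVR: solve the dual KKT system (Suykens-style)."""
    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma
    def fit(self, X, y):
        n = len(X)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = rbf(X, X, self.sigma) + np.eye(n) / self.gamma
        sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self
    def predict(self, X):
        return rbf(X, self.X, self.sigma) @ self.alpha + self.b

# one-step-ahead prediction on a toy quasi-periodic series
t = np.arange(500) * 0.1
x = np.sin(t) + 0.5 * np.sin(2.2 * t)
m = 4                                           # illustrative embedding dim
V = np.column_stack([x[i:len(x) - m + i] for i in range(m)])
X, y = V, x[m:]                                 # predict x(k+1) from V(k)
model = LSSVR(gamma=100.0, sigma=1.0).fit(X[:300], y[:300])
rmse = np.sqrt(np.mean((model.predict(X[300:]) - y[300:]) ** 2))
print(round(rmse, 3))
```

Unlike standard SVR, LSSVR replaces the quadratic program with one linear solve, which is the computational-efficiency argument given in the introduction for choosing it.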

### 4.5.Results

Table 2 summarizes the results of the experiment. The models with the different inputs, *X_{i}(k)*, **V***(k)*, or **Z***(k)*, yielded different results. The column named ‘Univariate’ represents the results from the univariate model with *X_{i}(k)*, and the column named ‘Without dimension reduction’ represents the results from the multivariate model with **V***(k)* without dimension reduction. Similarly, the column named ‘PCA’ represents the results from the multivariate model with **Z***(k)* transformed from **V***(k)* through PCA.

Analogous to the previous studies (Barnard *et al*., 2001; Cao *et al*., 1998), the multivariate model with **V***(k)* without dimension reduction showed smaller RMSE than the univariate model with *X_{i}(k)*. The decrease in RMSE between them was largest for *x_{2}(k)*, at $\frac{15.14-12.84}{15.14}\times 100=15.19\%$. For *x_{1}(k)* and *x_{3}(k)*, RMSE decreased by 9.82% and 1.84%, respectively.

Most of the multivariate models with **Z***(k)*, which is transformed from **V***(k)* through dimension reduction, failed to improve the prediction performance of the multivariate model with **V***(k)* without dimension reduction; among them, the models that use nonlinear techniques showed the worst prediction performance. Only LPP marginally reduced RMSE for all series, by less than 1%.

## 5.CONCLUSION

The effect of dimension reduction on the predictability of multivariate nonlinear time series was analyzed in this paper. The time delay coordinates vector from multivariate time series can exhibit statistical redundancy, which degrades the performance of a machine learning model, so we applied various dimension reduction techniques to address it. In the experiment with the delayed Lorenz series, only LPP, a variant of the local nonlinear techniques, marginally improved the prediction performance of the multivariate model.

Future work should extend this analysis to higher-dimensional series, for which dimension reduction could show a better result.