ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.14 No.3 pp.312-317
DOI : https://doi.org/10.7232/iems.2015.14.3.312

# Effect of Dimension Reduction on Prediction Performance of Multivariate Nonlinear Time Series

Jun-Yong Jeong, Jun-Seong Kim, Chi-Hyuck Jun*
Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, Korea
Corresponding Author, chjun@postech.ac.kr
Received September 4, 2015; Revised September 18, 2015; Accepted September 18, 2015

## ABSTRACT

The dynamic system approach to time series has been used in many real problems. Based on Takens' embedding theorem, one can build a predictive function whose input is the time delay coordinates vector, consisting of the lagged values of the observed series, and whose output is the future value of the observed series. Although the time delay coordinates vector from a multivariate time series carries more information than that from a univariate series, it can exhibit statistical redundancy, which degrades the performance of the predictive function. We apply dimension reduction techniques to address this problem and analyze their effect on prediction. Our experiment uses a delayed Lorenz series; least squares support vector regression approximates the predictive function. The results show that locality preserving projection improves the prediction performance.

## 1.INTRODUCTION

A traditional time series approach such as the ARIMA model describes the underlying model of a series with a linear equation. However, many time series are generated from nonlinear relationships. Thus, in the prediction of nonlinear time series, it is natural to apply the dynamical system approach, which is intrinsically nonlinear. The dynamic system approach to time series is widely encountered in natural and social systems. It assumes that the time series is observed from the state vector of a dynamical system. Such a system is known to be very sensitive to initial conditions, but in the long term its behavior is constrained to a finite fractal region because of its invariant topological property. Since the Lorenz series was discovered, this approach has been applied to many areas including meteorology (Harding et al., 1990; Lorenz, 1995), medicine (Adeli et al., 2008), economics (Das and Das, 2007), signal processing (Kennel et al., 1992), traffic flow (Shang et al., 2005), climate (Dhanya and Kumar, 2011), and biology (Mackey and Glass, 1977). For example, the Lorenz series is a popular time series in physics, especially for explaining climate (Lorenz, 1963); it is a simplified version of the Navier-Stokes equations and is related to various systems such as the Rayleigh-Bénard problem and other weather problems (Dudul, 2005).

Takens' embedding theorem (Takens, 1981) establishes the theoretical background for prediction in the dynamical system approach. By this theorem, one can build a nonlinear predictive function from the time delay coordinates vector to the future value of the observed series. Although the theorem is stated for univariate time series, the extension to multivariate time series is natural and has shown better prediction performance (Barnard et al., 2001; Cao et al., 1998).

For approximating the predictive function, several machine learning models are frequently used: neural networks (Gholipour et al., 2006), neuro-fuzzy networks (Singh and Borah, 2013), support vector regression (Mukherjee et al., 1997), and least squares support vector regression (LSSVR) (Mei-Ying and Xiao-Dong, 2004). The time delay coordinates vector from a multivariate time series, however, can exhibit statistical redundancy, which degrades the performance of a machine learning model (Barnard et al., 2001; Han and Wang, 2009). Several studies apply dimension reduction techniques such as independent component analysis (Barnard et al., 2001) and principal component analysis (Han and Wang, 2009) to address this problem, but there has been no extensive comparison of the effect of dimension reduction.

Thus, the goal of this paper is to analyze the effect of dimension reduction on the prediction of multivariate nonlinear time series; we select LSSVR to approximate the predictive function because of its structural risk minimization and computational efficiency. The paper is organized as follows. Section 2 introduces the basic theory of the dynamic system approach to time series. Section 3 gives short descriptions of the dimension reduction methods. Section 4 describes the methodology of our experiment and presents its results. Section 5 concludes.

## 2.DYNAMIC SYSTEM APPROACH IN TIME SERIES

### 2.1.State Space Reconstruction

A dynamic system consists of a state space S, a set of times T, and an evolution rule F : S × T → S. It is a mathematical concept in which the evolution rule describes how the state vector s(t) ∈ S evolves over time. From the state vector s(t), we obtain an M-dimensional observed series $\{x_1(k), x_2(k), \ldots, x_M(k)\}_{k=1}^{N}$ as follows:

$x_i(k) = h_i\left(\mathbf{s}(k t_s)\right), \quad i = 1, \ldots, M;\; k = 1, \ldots, N$
(1)

where $t_s$ is the sampling rate, $h_i(\cdot)$ is the observation function, and N is the length of the observed series.

Our problem is to predict the future values of the observed series $\{x_1(k), x_2(k), \ldots, x_M(k)\}_{k=1}^{N}$ without information about the original state space. For a univariate time series $\{x_i(k)\}_{k=1}^{N}$, we start by making the time delay coordinates vector, which consists of the lagged values of the observed series:

$X_i(k) = \left( x_i(k),\, x_i(k-\tau_i),\, \ldots,\, x_i(k-(m_i-1)\tau_i) \right), \quad k = (m_i-1)\tau_i + 1, \ldots, N$
(2)

where $\tau_i$ is the time delay and $m_i$ is the embedding dimension. By Takens' embedding theorem (Takens, 1981), the time delay coordinates vector reconstructs a manifold topologically equivalent to the unknown original manifold in the state space. Under some regularity conditions, for almost all $\tau_i$ and for some $m_i \geq 2\lfloor D \rfloor + 1$ (where D is the dimension of the original manifold in S), there exists a predictive function $f_i : \mathbb{R}^{m_i} \rightarrow \mathbb{R}^{m_i}$ such that

$X_i(k+1) = f_i\left(X_i(k)\right)$
(3)
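
The construction in Eq. (2) can be sketched in a few lines. The function below is a hypothetical helper (not from the paper), assuming NumPy and 0-based indexing:

```python
import numpy as np

def time_delay_embedding(x, m, tau):
    """Build the time delay coordinates vectors of Eq. (2).

    Each row is (x(k), x(k - tau), ..., x(k - (m - 1) * tau))
    for k = (m - 1) * tau, ..., N - 1 (0-based indexing).
    """
    x = np.asarray(x)
    start = (m - 1) * tau
    return np.column_stack(
        [x[start - j * tau : len(x) - j * tau] for j in range(m)]
    )

# Toy example: x(k) = k for k = 0..9, with m = 3 and tau = 2
X = time_delay_embedding(np.arange(10), m=3, tau=2)
print(X[0])  # [4 2 0]
```

Each row of `X` is one delay vector; the predictive function of Eq. (3) then maps row $k$ to row $k+1$.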

The extension of the theorem to a multivariate time series $\{x_1(k), x_2(k), \ldots, x_M(k)\}_{k=1}^{N}$ is similar (Cao et al., 1998); we make the time delay coordinates vector from the multivariate time series as follows:

$V(k) = \big( x_1(k), x_1(k-\tau_1), \ldots, x_1(k-(m_1-1)\tau_1);\; x_2(k), x_2(k-\tau_2), \ldots, x_2(k-(m_2-1)\tau_2);\; \ldots;\; x_M(k), x_M(k-\tau_M), \ldots, x_M(k-(m_M-1)\tau_M) \big)$
(4)

$k = J_0, J_0+1, \ldots, N; \quad J_0 = \max_{1 \leq i \leq M} (m_i - 1)\tau_i + 1$

where the $m_i$ are the embedding dimensions and the $\tau_i$ are the time delays. If each $m_i$ or the sum $\sum m_i$ is sufficiently large, there exists a predictive function $g : \mathbb{R}^m \rightarrow \mathbb{R}^m$ ($m = \sum_{i=1}^{M} m_i$) such that

$V(k+1) = g\left(V(k)\right)$
(5)

Equivalently, there exists a predictive function $g_i : \mathbb{R}^m \rightarrow \mathbb{R}$ such that

$x_i(k+1) = g_i\left(V(k)\right)$
(6)

The state space reconstruction from a multivariate series yields better prediction performance than that from a univariate series (Barnard et al., 2001; Cao et al., 1998).

### 2.2.Parameter Selection

As we build the time delay coordinates vector, an essential task is selecting the time delay $\tau_i$ and the embedding dimension $m_i$. The time delay $\tau_i$ is calculated separately for each univariate time series with the mutual information method (Fraser and Swinney, 1986). Mutual information measures the dependency between $x_i(k)$ and $x_i(k+\tau)$ through a histogram. We select the $\tau_i$ at which the mutual information attains its first minimum.
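
The delay selection described above can be sketched as follows. The helper names `mutual_information` and `first_minimum_delay` are hypothetical, and the histogram estimator with a fixed number of bins is one common choice rather than the paper's exact implementation:

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of the mutual information between x(k) and x(k + tau)."""
    a, b = x[:-tau], x[tau:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()                      # joint probabilities
    px = pxy.sum(axis=1)                       # marginal of x(k)
    py = pxy.sum(axis=0)                       # marginal of x(k + tau)
    nz = pxy > 0                               # avoid log(0) terms
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

def first_minimum_delay(x, max_tau=50, bins=16):
    """Return the smallest tau at which the mutual information has a local minimum."""
    mi = [mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] < mi[t + 1]:
            return t + 1                       # taus start at 1
    return int(np.argmin(mi)) + 1              # fall back to the global minimum

x = np.sin(0.3 * np.arange(2000))
tau = first_minimum_delay(x)
```

For a smooth oscillatory series the first minimum typically falls near a quarter of the dominant period.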

We apply the false nearest neighbor (FNN) method to compute the embedding dimension $m_i$. FNN for univariate time series arises from the topological equivalence between the state space and the embedding space, which is the space of the time delay coordinates vector (Kennel et al., 1992). For sufficiently large $m_i$, the nearest neighbor of a point in the embedding space with dimension $m_i$ is also close to that point in the space with dimension $m_i + 1$. If the distance between these points becomes large in the higher-dimensional embedding space, the nearest neighbor is called a false nearest neighbor; the time delay coordinates vector with embedding dimension $m_i$ then fails to preserve the topological property. FNN tries to find the $m_i$ that minimizes the ratio of false nearest neighbors. The multivariate version of FNN is similar, except that it minimizes the average ratio of false nearest neighbors by increasing each $m_i$ by one in turn (Su, 2010). We start from $m_1 = m_2 = \cdots = m_M = 1$ and increase some $m_i$ until this average goes to zero.
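
A minimal sketch of the univariate FNN criterion, assuming SciPy's k-d tree for the neighbor search. The tolerance `r_tol` and the forward-lag convention (equivalent to the usual backward lags for this test) are illustrative choices, not taken from the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def fnn_ratio(x, m, tau, r_tol=15.0):
    """Fraction of false nearest neighbors at embedding dimension m
    (in the spirit of Kennel et al., 1992).

    A neighbor in dimension m is 'false' if the extra coordinate added in
    dimension m + 1 moves it away by more than r_tol times the distance in m.
    """
    n = len(x) - m * tau                       # points with an (m + 1)-th coordinate
    X = np.column_stack([x[j * tau : j * tau + n] for j in range(m)])
    extra = x[m * tau : m * tau + n]           # the (m + 1)-th coordinate
    dist, idx = cKDTree(X).query(X, k=2)       # nearest neighbor other than self
    d_m, nn = dist[:, 1], idx[:, 1]
    valid = d_m > 0                            # skip exactly coincident points
    ratio = np.abs(extra[valid] - extra[nn[valid]]) / d_m[valid]
    return float(np.mean(ratio > r_tol))

x = np.sin(0.5 * np.arange(500))
r1 = fnn_ratio(x, m=1, tau=1)
r2 = fnn_ratio(x, m=2, tau=1)   # a sine wave unfolds fully in two dimensions
```

One would increase `m` until the ratio drops to (near) zero, as described above.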

## 3.DIMENSION REDUCTION

### 3.1.Dimension Reduction Techniques

Suppose that an $(N - J_0 + 1) \times m$ matrix V represents a dataset. Assume that this dataset has an intrinsic dimension $d$ ($d < m$); the vectors $V(k) \in \mathbb{R}^m$, $k = J_0, J_0 + 1, \ldots, N$ lie near a manifold of dimension $d$ embedded in the m-dimensional space. Dimension reduction techniques find a mapping from the matrix V into a new $(N - J_0 + 1) \times d$ matrix Z while preserving some properties of the dataset (Van der Maaten, 2007). They are divided into four categories, as shown in Table 1: linear techniques, global nonlinear techniques, local nonlinear techniques, and variants of local nonlinear techniques. Table 1 lists the dimension reduction techniques which we applied in our experiment.

### 3.2.Linear Techniques

Linear techniques are based on a linear mapping. Principal component analysis (PCA) constructs the low-dimensional representation of the dataset that describes as much of its variance as possible (Hotelling, 1933). Multidimensional scaling (MDS) retains the pairwise distances as well as possible (Torgerson, 1952).
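
As an illustration, PCA can be computed from the singular value decomposition of the centered data matrix; this sketch assumes NumPy and is not tied to the paper's implementation:

```python
import numpy as np

def pca_reduce(V, d):
    """Project the rows of V onto the d principal components with largest variance."""
    Vc = V - V.mean(axis=0)                        # center each column
    U, s, Wt = np.linalg.svd(Vc, full_matrices=False)
    return Vc @ Wt[:d].T                           # scores on the top d components

# 200 points in R^3 that lie near a 2-D plane (hypothetical data)
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3))
Z = pca_reduce(A + 1e-3 * rng.normal(size=A.shape), d=2)
print(Z.shape)  # (200, 2)
```

The columns of `Z` are uncorrelated by construction, which is exactly the redundancy-removal property motivating its use here.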

### 3.3.Global Nonlinear Techniques

Global nonlinear techniques construct the low-dimensional space which retains a global nonlinear property in the dataset. Isomap preserves the pairwise geodesic distances between the data points; the geodesic distance between two points is measured over the manifold (Tenenbaum et al., 2000).

### 3.4.Local Nonlinear Techniques

Local nonlinear techniques aim to build a mapping which preserves a local nonlinear property. Laplacian Eigenmap retains the distances between a high-dimensional point and its nearest neighbors; in the low-dimensional space, they are weighted by a Gaussian kernel function (Belkin and Niyogi, 2002). Local linear embedding (LLE) expresses a high-dimensional point as a linear combination of its nearest neighbors (Roweis and Saul, 2000); it tries to preserve the weights of this linear combination. Local tangent space analysis (LTSA) exploits the tangent space in the neighborhood of a high-dimensional point; it aligns these local tangent spaces in the low-dimensional space mapped from the original space (Zhang and Zha, 2004).
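
A minimal sketch of the Laplacian Eigenmap idea, assuming a connected k-nearest-neighbor graph; the helper name and the simple dense-matrix formulation are illustrative only:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.linalg import eigh

def laplacian_eigenmap(X, d, n_neighbors=12, sigma=1.0):
    """Minimal Laplacian Eigenmap (after Belkin and Niyogi, 2002).

    Builds a k-nearest-neighbor graph with Gaussian weights, then maps the
    points to the generalized eigenvectors of the graph Laplacian with the
    smallest nonzero eigenvalues.
    """
    n = len(X)
    dist, idx = cKDTree(X).query(X, k=n_neighbors + 1)
    W = np.zeros((n, n))
    for i in range(n):
        for j, dij in zip(idx[i, 1:], dist[i, 1:]):
            w = np.exp(-dij**2 / (2 * sigma**2))   # Gaussian kernel weight
            W[i, j] = W[j, i] = max(W[i, j], w)    # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Generalized eigenproblem L y = lambda D y; skip the constant eigenvector
    vals, vecs = eigh(L, D)
    return vecs[:, 1 : d + 1]

rng = np.random.default_rng(1)
Z = laplacian_eigenmap(rng.normal(size=(100, 2)), d=2)
```

Unlike PCA, this mapping is defined only on the training points, which is the out-of-sample problem that the linear variants in Section 3.5 address.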

### 3.5.Variants of Local Nonlinear Techniques

Variants of local nonlinear techniques solve the same problem as local nonlinear techniques, but they are based on a linear transformation in order to solve the out-of-sample problem. Locality preserving projection (LPP) finds the linear mapping that minimizes the cost function of Laplacian Eigenmap (He and Niyogi, 2004). Neighborhood preserving embedding (NPE) is based on LLE (He et al., 2005). Linear local tangent space alignment (LLTSA) constructs the linear transformation that minimizes the objective function of LTSA (Zhang et al., 2007).

### 3.6.Correlation Dimension Estimation

The correlation dimension is applied to estimate the intrinsic dimension $d$. It is based on the fact that the number of points in a hypersphere with radius $r$ is proportional to $r^d$ (Grassberger and Procaccia, 2004).
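
A sketch of the Grassberger-Procaccia estimate: compute the correlation sum C(r) over a range of radii where C(r) > 0 and fit the slope of log C(r) against log r. The radius range is an assumed input here:

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(X, r_min, r_max, n_r=10):
    """Grassberger-Procaccia estimate of the correlation dimension.

    Fits the slope of log C(r) versus log r, where C(r) is the fraction of
    point pairs closer than r. The radii must be chosen so that C(r) > 0.
    """
    d = pdist(X)                                   # all pairwise distances
    rs = np.logspace(np.log10(r_min), np.log10(r_max), n_r)
    C = np.array([np.mean(d < r) for r in rs])     # correlation sum
    slope, _ = np.polyfit(np.log(rs), np.log(C), 1)
    return slope

# Sanity check on points along a line in R^3: intrinsic dimension is 1
t = np.linspace(0.0, 1.0, 500)
X_line = np.column_stack([t, 2 * t, 3 * t])
dim_est = correlation_dimension(X_line, r_min=0.05, r_max=0.5)
```

The estimate is approximate near the edges of the point set, so it typically comes out slightly below the true dimension.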

## 4.EXPERIMENT WITH DELAYED LORENZ SERIES

To analyze the effect of dimension reduction on predicting multivariate nonlinear time series, we conducted an experiment with a generated series. Figure 1 shows a brief procedure. Details for the procedure are as follows.

### 4.1.Series Generation

We generated a delayed Lorenz series of length 4,000 from the delayed differential equations in Eq. (7) (Zhi-Yong et al., 2011). The parameters in Eq. (7) were chosen as a = 16, b = 45.92, c = 4, e1 = 1.2, e2 = 0.75, e3 = 1, and γ = 3. The series started with s1(t) = s2(t) = s3(t) = 1 for t < γ. The differential equations were solved by the explicit Runge-Kutta (2, 3) method with integration step size h = 0.01. The observed series were $x_i(k) = h_i(\mathbf{s}(k t_s)) = s_i(k t_s)$, i = 1, 2, 3; k = 1, ..., 4,000, where the sampling rate $t_s$ is 0.3. The first 1,000 observations were discarded to reduce the effect of the initial points. Of the remaining 3,000 observations, we used the first 2,000 as a training set and the last 1,000 as a test set for evaluation. Figure 2 shows the three series for the training and test sets.

$\frac{ds_1}{dt} = -a \left( s_1(t) - s_2(t) \right) - e_1 s_3(t-\gamma)$

$\frac{ds_2}{dt} = \left( b - s_3(t) \right) s_1(t) - s_2(t) - e_2 s_2(t-\gamma)$

$\frac{ds_3}{dt} = s_1(t) s_2(t) - c\, s_3(t) - e_3 s_3(t-\gamma)$
(7)
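
As an illustration of Eq. (7), the delayed system can be integrated with a constant-history delay buffer; the paper uses an explicit Runge-Kutta (2, 3) solver, while this sketch substitutes a plain fixed-step Euler scheme for brevity:

```python
import numpy as np

# Parameters from Section 4.1
a, b, c = 16.0, 45.92, 4.0
e1, e2, e3 = 1.2, 0.75, 1.0
gamma = 3.0

def delayed_lorenz(n_steps, h=0.002):
    """Integrate Eq. (7) with a fixed-step Euler scheme and a delay buffer.

    The history is the constant state (1, 1, 1), matching the initial
    condition in Section 4.1; the first `lag` rows of `s` store it.
    """
    lag = int(round(gamma / h))                 # delay expressed in steps
    s = np.ones((n_steps + lag, 3))
    for t in range(lag, n_steps + lag - 1):
        s1, s2, s3 = s[t]
        s2_d = s[t - lag, 1]                    # s2(t - gamma)
        s3_d = s[t - lag, 2]                    # s3(t - gamma)
        ds1 = -a * (s1 - s2) - e1 * s3_d
        ds2 = (b - s3) * s1 - s2 - e2 * s2_d
        ds3 = s1 * s2 - c * s3 - e3 * s3_d
        s[t + 1] = s[t] + h * np.array([ds1, ds2, ds3])
    return s[lag:]

series = delayed_lorenz(10000)          # 20 time units at h = 0.002
x1 = series[::150, 0]                   # sample every t_s = 0.3 time units
```

A proper (2, 3)-order solver with interpolation of the delayed state would reproduce the paper's setup more faithfully; the buffer idea is the same.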

### 4.2.State Space Reconstruction

To build the time delay coordinates vector from the multivariate time series, we used the mutual information method and the multivariate version of FNN to estimate the time delays $\tau_i$ and embedding dimensions $m_i$, respectively. For $\{x_1(k), x_2(k), x_3(k)\}_{k=1001}^{3000}$, we obtained the time delays $\tau_1 = \tau_2 = \tau_3 = 1$ and embedding dimensions $m_1 = 6$, $m_2 = m_3 = 1$.

For comparison purposes, we also built the time delay coordinates vector from each univariate time series; the only difference from the above is that we used the univariate version of FNN.

$X_1(k) = \left( x_1(k), x_1(k-1), \ldots, x_1(k-4) \right)$

$X_2(k) = \left( x_2(k), x_2(k-1), \ldots, x_2(k-5) \right)$

$X_3(k) = \left( x_3(k), x_3(k-1), \ldots, x_3(k-4) \right)$

### 4.3.Dimension Reduction

After the correlation dimension of V was estimated, we transformed V into Z through the dimension reduction techniques in Section 3. For the local nonlinear techniques and their variants, the number of nearest neighbors was set to 12 by trial and error.

$V(k) \in \mathbb{R}^{7} \rightarrow Z(k) \in \mathbb{R}^{4}$

### 4.4.Prediction

For each series, we learned a multivariate model with Z(k) for one-step-ahead prediction by LSSVR.

The simplex method, with initial points obtained from simulated annealing, optimized the parameters of LSSVR. For model comparison, the univariate model with $X_i(k)$ and the multivariate model with $V(k)$ without dimension reduction were also learned similarly for one-step-ahead prediction. The performance of each model was estimated through the root mean square error (RMSE) on the test set.

$\mathrm{RMSE} = \sqrt{ \frac{ \sum_{k=3001}^{4000} \left( \hat{x}_i(k+1) - x_i(k+1) \right)^2 }{1000} }$
(8)
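
The LSSVR model above reduces to a single linear system in the dual variables (Suykens et al., 2002). The following is a minimal sketch with an RBF kernel, with illustrative hyperparameter values rather than the optimized ones used in the paper:

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def lssvr_fit(X, y, gamma_reg=10.0, sigma=1.0):
    """Minimal least squares SVR: solve the dual system
        [ 0     1^T              ] [  b  ]   [ 0 ]
        [ 1  K + I / gamma_reg   ] [alpha] = [ y ]
    """
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma_reg
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                      # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy regression: fit a sine curve (hypothetical data, not the Lorenz series)
X = np.linspace(0.0, 6.0, 80).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvr_fit(X, y, gamma_reg=100.0)
pred = lssvr_predict(X, b, alpha, X)
```

In the experiment, the rows of `X` would be the vectors $Z(k)$ (or $X_i(k)$, $V(k)$) and `y` the next observation $x_i(k+1)$.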

### 4.5.Results

Table 2 summarizes the results of the experiment. The models with different inputs, $X_i(k)$, $V(k)$, or $Z(k)$, yielded different results. The column named 'Univariate' represents the results from the univariate model with $X_i(k)$, and the column named 'Without dimension reduction' represents the results from the multivariate model with $V(k)$ without dimension reduction. Similarly, the column named 'PCA' represents the results from the multivariate model with $Z(k)$ transformed from $V(k)$ through PCA.

Consistent with previous studies (Barnard et al., 2001; Cao et al., 1998), the multivariate model with $V(k)$ without dimension reduction showed smaller RMSE than the univariate model with $X_i(k)$. The decrease in RMSE between them was largest for $x_2(k)$, at $\frac{15.14 - 12.84}{15.14} \times 100 = 15.19\%$. For $x_1(k)$ and $x_3(k)$, RMSE decreased by 9.82% and 1.84%, respectively.

Most of the multivariate models with $Z(k)$, transformed from $V(k)$ through dimension reduction, failed to improve on the prediction performance of the multivariate model with $V(k)$ without dimension reduction; among them, the models using nonlinear techniques showed the worst prediction performance. LPP marginally reduced RMSE for all series, though by less than 1%.

## 5.CONCLUSION

This paper analyzed the effect of dimension reduction on the predictability of multivariate nonlinear time series. The time delay coordinates vector from a multivariate time series can exhibit statistical redundancy, which degrades the performance of a machine learning model; we therefore applied various dimension reduction techniques to address it. In the experiment with a delayed Lorenz series, a variant of local nonlinear techniques (LPP) improved the prediction performance of the multivariate model.

Future work will extend this study to higher-dimensional series, in which dimension reduction could show a greater benefit.

## Figure

Figure 1. Procedure of the experiment.

Figure 2. Training and test sets.

## Table

Table 1. Dimension reduction techniques.

Table 2. Results of experiments.

## REFERENCES

1. Adeli H , Ghosh-Dastidar S , Dadmehr N (2008) A spatio-temporal wavelet-chaos methodology for EEG-based diagnosis of Alzheimer’s disease , Neuroscience Letters, Vol.444 (2) ; pp.190-194
2. Barnard J P , Aldrich C , Gerber M (2001) Embedding of multidimensional time-dependent observations , Physical Review E, Vol.64 (4) ; pp.046201
3. Belkin M , Niyogi P Dietterich T G , Dietterich T G , Becker S , Ghahramani Z (2002) Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , Advance in Neural Information Processing Systems, MIT Press, pp.585-591
4. Cao L , Mees A , Judd K (1998) Dynamics from multivariate time series , Physica D: Nonlinear Phenomena, Vol.121 (1) ; pp.75-88
5. Chen D , Han W (2013) Prediction of multivariate chaotic time series via radial basis function neural network , Complexity, Vol.18 (4) ; pp.55-66
6. Das A , Das P (2007) Chaotic analysis of the foreign exchange rates , Applied Mathematics and Computation, Vol.185 (1) ; pp.388-396
7. Dhanya C , Kumar D N (2011) Multivariate nonlinear ensemble prediction of daily chaotic rainfall with climate inputs , Journal of Hydrology, Vol.403 (3) ; pp.292-306
8. Dudul S V (2005) Prediction of a Lorenz chaotic attractor using two-layer perceptron neural network , Applied Soft Computing, Vol.5 (4) ; pp.333-355
9. Fraser A M , Swinney H L (1986) Independent coordinates for strange attractors from mutual information , Physical Review A, Vol.33 (2) ; pp.1134
10. Gholipour A , Araabi B N , Lucas C (2006) Predicting chaotic time series using neural and neurofuzzy models: a comparative study , Neural Processing Letters, Vol.24 (3) ; pp.217-239
11. Grassberger P , Procaccia I Hunt B R , Kennedy J , Li T-Y , Nusse H E (2004) Measuring the strangeness of strange attractors , The Theory of Chaotic Attractors, Springer, pp.170-189
12. Han M , Wang Y (2009) Analysis and modeling of multivariate chaotic time series based on neural network , Expert Systems with Applications, Vol.36 (2) ; pp.1280-1290
13. Harding A K , Shinbrot T , Cordes J M (1990) A chaotic attractor in timing noise from the VELA pulsar? , The Astrophysical Journal, Vol.353; pp.588-596
14. He X , Niyogi P Thrun S , Saul L K , Schölkopf B (2004) Locality preserving projections , Advances in Neural Information Processing Systems 16, The MIT Press, pp.153-160
15. He X , Cai D , Yan S , Zhang H-J (2005) Neighborhood preserving embedding , Proceedings of the Tenth IEEE International Conference on the Computer Vision (ICCV), Vol.2; pp.1208-1213
16. Hotelling H (1933) Analysis of a complex of statistical variables into principal components , Journal of Educational Psychology , Vol.24 (6) ; pp.417- 441
17. Kennel M B , Brown R , Abarbanel H D (1992) Determining embedding dimension for phase-space reconstruction using a geometrical construction , Physical Review A, Vol.45 (6) ; pp.3403-3411
18. Lorenz E N (1963) Deterministic nonperiodic flow , Journal of the Atmospheric Sciences, Vol.20 (2) ; pp.130-141
19. Lorenz E N (1995) The essence of chaos , University of Washington Press,
20. Mackey M C , Glass L (1977) Oscillation and chaos in physiological control systems , Science, Vol.197 (4300) ; pp.287-289
21. Mei-Ying Y , Xiao-Dong W (2004) Chaotic time series prediction using least squares support vector machines , Chinese Physics, Vol.13 (4) ; pp.454-458
22. Monahan A H (2000) Nonlinear principal component analysis by neural networks: Theory and application to the Lorenz system , Journal of Climate, Vol.13 (4) ; pp.821-835
23. Mukherjee S , Osuna E , Girosi F (1997) Nonlinear prediction of chaotic time series using support vector machines , Proceedings of the 1997 IEEE Workshop of the Neural Networks for Signal Processing, pp.511-520
24. Roweis S T , Saul L K (2000) Nonlinear dimensionality reduction by locally linear embedding , Science, Vol.290 (5500) ; pp.2323-2326
25. Shang P , Li X , Kamae S (2005) Chaotic analysis of traffic time series, Chaos , Solitons and Fractals, Vol.25 (1) ; pp.121-128
26. Su L-y (2010) Prediction of multivariate chaotic time series with local polynomial fitting , Computers and Mathematics with Applications, Vol.59 (2) ; pp.737-744
27. Suykens J A , De Brabanter J , Lukas L , Vandewalle J (2002) Weighted least squares support vector machines: robustness and sparse approximation , Neurocomputing, Vol.48 (1) ; pp.85-105
28. Takens F Rand D A , Young L S (1981) Detecting strange attractors in turbulence , Dynamical Systems and Turbulence Warwick 1980 , Springer, pp.366-381
29. Tenenbaum J B , De Silva V , Langford J C (2000) A global geometric framework for nonlinear dimensionality reduction , Science, Vol.290 (5500) ; pp.2319-2323
30. Torgerson W S (1952) Multidimensional scaling: I Theory and method , Psychometrika, Vol.17 (4) ; pp.401-419
31. Van der Maaten L (2007) An introduction to dimensionality reduction using matlab , Report, Vol.1201 (07-07) ; pp.62
32. Vapnik V (2013) The nature of statistical learning theory , Springer Science and Business Media,
33. Zhang T , Yang J , Zhao D , Ge X (2007) Linear local tangent space alignment and application to face recognition , Neurocomputing, Vol.70 (7) ; pp.1547-1553
34. Zhang Z-Y , Zha H-Y (2004) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment , Journal of Shanghai University (English Edition), Vol.8 (4) ; pp.406-424
35. Zhi-Yong Y , Guang Y , Cun-Bing D (2011) Time-delay feedback control in a delayed dynamical chaos system and its applications , Chinese Physics B, Vol.20 (1) ; pp.010207