1. INTRODUCTION
In diverse applications, advances in sensor technology provide opportunities to collect large amounts of functional data for detecting out-of-control conditions when a process changes. Moreover, in precision manufacturing, new automatic control rules based on the detection of changes in means or variances are being developed for complicated functional data. Unlike the linear functional data studied in several applications, this paper focuses on complicated functional data, which exhibit many sharp drops, spikes, and fluctuations, as shown in Figure 1. For example, Ko et al. (2010) and Jeong et al. (2013) used optical emission spectroscopy (OES) signals for fault detection and classification in plasma etchers. Park et al. (2012) utilized high-dimensional near-infrared (NIR) spectral data to predict biomass chemical composition using data-mining techniques. It is very important in the bio-based industry to predict the contents of carbohydrates, lignin, and ash in biomass. However, direct approaches such as wet-chemistry methods have several disadvantages that make them impractical, or even impossible, to apply to online monitoring in industry. Therefore, in the bio-based industry, the analysis of NIR spectral data saves considerable time and expense, and it is much faster and simpler for predicting chemical information than direct analysis methods. Jin and Shi (1999) used functional data to detect types of stamping faults in an automobile sheet-metal stamping process. Their tonnage signals show that a sharp drop in a signal at a specific location indicates an "excessive snap," and a larger signal variation in one region indicates a "worn gib" problem. Examples of high-dimensional functional data used in applications other than engineering studies include Morris et al.'s (2003) investigation of a biomarker in early colon carcinogenesis and De Castro et al.'s (2005) modeling of functional sulfur dioxide samples for environmental monitoring.
In addition, in Nortel’s wireless antenna manufacturing process, the shape of the lobes as shown in Figure 1 needs to follow certain structures. There are specifications for peak height, lobe size, and the difference between the peaks and valleys. See Jeong et al. (2007) for a more detailed description of the process.
However, nonlinear high-dimensional functional data have too many variables to monitor, which leads to weak power in detecting out-of-control processes. Therefore, the development of data reduction methods has been a significant research topic for many years, and the reduced-size data should represent the characteristics of the functional data when the process is out of control. To overcome this problem, wavelet techniques have been widely used in the literature. The popularity of wavelets in recent engineering applications stems from the availability of a fast algorithm for the discrete wavelet transform (DWT). The computational efficiency of the DWT is better than that of other transforms. For example, principal component analysis (PCA) requires solving an eigenvalue system, an expensive O(N^{3}) operation, where N is the data size. The fast Fourier transform (FFT) requires O(N log N) operations, but a fast wavelet transform requires only O(N) operations. Wavelets also support several data compression and data reduction procedures. For example, Ko et al. (2010) used engineering knowledge to segment data in order to isolate fault types in semiconductor processes, and the wavelet coefficients were selected using a vertical-energy-thresholding (VET) rule. Both segmentation and thresholding reduce the amount of data for the efficiency of further decision making. In addition, in monitoring and controlling a nano-machining process, Ganesan et al. (2003) pointed out that the speed of analyzing data and making process control decisions must be faster than the material removal speed (100-800 nanometers per minute); otherwise, useful material on an ultra-thin wafer will be removed. In that example, the DWT was used as a filter to reduce the dimension of the data and to handle complicated signals. Then, fault monitoring and variance change-point detection procedures (Das et al., 2005) were developed to meet the manufacturing requirements.
In addition, wavelet-based models can effectively extract a few wavelet coefficients with small data reconstruction error to deal with irregularities such as sharp jumps and drops. Recently, several data reduction procedures have been proposed based on wavelets, PCA, and independent component analysis (ICA). Artoni et al. (2018) developed a dimension reduction method to analyze EEG data by combining PCA with ICA.
In addition, for multiple sets of complicated functional data, Lada et al. (2002) developed a model to detect fault patterns in semiconductor manufacturing with multiple functional data collected by sensors. Paynabar and Jin (2011) developed a wavelet-based mixed-effect model to characterize within- and between-profile (nonlinear) variations; profiles of pressing-force signals obtained from a valve-seat assembly procedure were used to illustrate the work. In cancer research, Morris and Carroll (2006) used Bayesian wavelet-based functional mixed models to characterize complicated nonlinear functional data.
The main goal of this paper is to employ a data reduction method to extract representative wavelet variables for multiple sets of complicated functional data. Unlike the existing literature, this paper deals with a multiple-curve wavelet thresholding and data reduction procedure. The case study shows that the proposed wavelet model can reduce data to a smaller scale than standard procedures, and it also demonstrates that process fault detection tools developed from the reduced-size data can efficiently identify process problems.
This paper is organized as follows: Section 2 briefly reviews the wavelet background and its properties, and Section 3 presents a data reduction model based on wavelet transforms for multiple curves. Performance comparisons with real-life examples are given in Section 4, and conclusions and suggestions for possible future work are offered in Section 5.
2. WAVELET TRANSFORMS
Denote by ${y}_{i}={\left[{y}_{i1},{y}_{i2},\cdots ,{y}_{iN}\right]}^{T}$ a vector of N equally spaced data points (a curve) for data curve i obtained from a manufacturing process, where N = 2^{J} for some positive integer J and i = 1, 2, ⋯, M indexes independently replicated curves. The superscript T denotes the transpose operator. Let $\mathbf{Y}={[{y}_{1}^{T},{y}_{2}^{T},\cdots ,{y}_{M}^{T}]}^{T}$. When a DWT W is applied to the data Y, the vector of wavelet coefficients obtained from this transformation is D = YW, where $D={[{d}_{1}^{T},{d}_{2}^{T},\cdots ,{d}_{M}^{T}]}^{T}$, ${d}_{i}={[{d}_{i1},{d}_{i2},\cdots ,{d}_{iN}]}^{T}$, d_{im} is the wavelet coefficient at the m th wavelet position for the i th data curve, and W = [h_{ij}], i, j = 1, 2, ⋯, N, is the orthonormal N × N wavelet-transform matrix.
The original observations Y can be reconstructed using the inverse DWT. That is, through Y = DW^{T}, the original data can be expressed as a linear sum of products of wavelet coefficients (c_{L,k} and d_{j,k}) and their corresponding wavelet-basis functions (ϕ_{L,k}(t) and ψ_{j,k}(t)) as follows:
where the resolution level J is greater than the coarsest level L. To keep the notation simple, all wavelet coefficients, $[{c}_{i,L,0},\cdots ,{c}_{i,L,{2}^{L}-1},{d}_{i,L,0},\cdots ,{d}_{i,J,{2}^{J}-1}]$, are represented by a single vector ${d}_{i}={[{d}_{i1},{d}_{i2},\cdots ,{d}_{iN}]}^{T}$. Note that if there are N data points, there will be N wavelet coefficients. With these N wavelet coefficients, the original data curve can be “reconstructed” using Eq. (1) without any deviation.
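The perfect-reconstruction property noted above (N data points in, N coefficients out, reconstruction without any deviation) can be illustrated with a minimal Haar transform, the same family used for Figure 2. The helper names `haar_dwt` and `haar_idwt` below are our own illustrative sketch, not code from the paper; any orthonormal wavelet family behaves the same way.

```python
import numpy as np

def haar_dwt(y):
    """Full Haar DWT of a length-2^J signal.
    Returns N coefficients ordered coarsest-first: [c_0 | d_0 | d_1 | ... | d_{J-1}]."""
    c = np.asarray(y, dtype=float)
    details = []
    while len(c) > 1:
        s = (c[0::2] + c[1::2]) / np.sqrt(2)   # scaling (smooth) coefficients
        d = (c[0::2] - c[1::2]) / np.sqrt(2)   # wavelet (detail) coefficients
        details.append(d)
        c = s
    return np.concatenate([c] + details[::-1])

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuilds the original signal exactly from its N coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    c = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        d = coeffs[pos:pos + len(c)]
        y = np.empty(2 * len(c))
        y[0::2] = (c + d) / np.sqrt(2)
        y[1::2] = (c - d) / np.sqrt(2)
        c = y
        pos += len(d)
    return c
```

Because the transform matrix is orthonormal, the N coefficients also preserve the signal energy, which is the property the VET procedure in Section 4 exploits.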
To capture the between-curve variations, this paper presents a wavelet-based local random-effect model for the characteristics of nonlinear functional data. To elaborate, the three subfigures in Figure 2 are generated from our model based on Haar wavelets using different sizes of supports covering local between-curve variations. In each subfigure, 20 curves of 256 data points in the time domain are generated from a wavelet random-effect model in the Haar family with variance equal to four. In Figure 2(a), one wavelet coefficient (c_{4,7}) at a coarser level, the fourth resolution level, is assumed to be a random effect. The support of this coefficient covers t_{97} to t_{112}. Figure 2(b) shows a wider support area [t_{65}, t_{96}] of a coarser-level wavelet coefficient (c_{3,3}) in the third resolution level. Figure 2(c) shows a much wider support area [t_{65}, t_{128}] of a coarser wavelet coefficient (c_{2,2}) in the second resolution level. Section 3 shows that the proposed model allows one to zoom into a certain local area to understand why the process has more variation at that specific location and to investigate the potential causes. Thus, motivated by the data compression and data reduction literature above, our goal is to present a data reduction procedure for extracting representative variance parameters for multiple functional data.
3. WAVELET-BASED DATA REDUCTION METHOD
To extract significant variables in the wavelet domain for multiple curves, we adopt the wavelet mean-variance thresholding (WMVT) procedure proposed by Jeong et al. (2018). Denote by θ_{ij} the j th true wavelet coefficient for the i th curve and by d_{ij} the sample version of θ_{ij}. The d_{ij}'s are independent and normally distributed as $N({\theta}_{ij},{\sigma}^{2})$, where the θ_{ij}'s and σ^{2} are unknown parameters to be estimated; to keep the model simple, assume that ${\theta}_{1j}=\dots ={\theta}_{Mj}\equiv {\theta}_{\cdot j}$. The θ_{ij}'s are modeled as random effects, ${\theta}_{ij}\sim N({\theta}_{\cdot j},{\tau}_{j}^{2})$, where θ_{⋅j} measures the average value of the wavelet coefficients at the j th position and ${\tau}_{j}^{2}$ is the wavelet-position-dependent variance, with the convention that ${\tau}_{j}^{2}=0$ implies a fixed-effect model for θ_{⋅j}. Thus, the local random-effect model can be expressed as follows:
where D = [d_{ij}] is the M × N matrix of all DWT-transformed wavelet coefficients, $\Theta ={[{\theta}_{1}^{T},\dots ,{\theta}_{M}^{T}]}^{T}$, ${\theta}_{i}={[{\theta}_{i1},{\theta}_{i2},\dots ,{\theta}_{iN}]}^{T}$, $Z={[{z}_{1}^{T},\dots ,{z}_{M}^{T}]}^{T}$, and z_{i} is a 1 × N vector of random errors whose j th entry follows the normal distribution $N(0,{\sigma}^{2}+{\tau}_{j}^{2})$.
Assume that all coefficients are random and that the random-effect coefficients are independent (Guo, 2002; Morris and Carroll, 2006). Then d_{ij} follows a normal distribution with mean θ_{⋅j} and variance ${\sigma}^{2}+{\tau}_{j}^{2}$, and the likelihood function is given as follows: $L(D)={\left(2\pi \right)}^{-\frac{NM}{2}}{\displaystyle \prod}_{j=1}^{N}{\left({\sigma}^{2}+{\tau}_{j}^{2}\right)}^{-\frac{M}{2}}\text{exp}\left(-\frac{1}{2}{\displaystyle \sum}_{i=1}^{M}{\displaystyle \sum}_{j=1}^{N}\frac{{\left({d}_{ij}-{\theta}_{\cdot j}\right)}^{2}}{{\sigma}^{2}+{\tau}_{j}^{2}}\right)$. Maximizing this likelihood is equivalent to minimizing the negative log-likelihood function:
where K is a constant independent of mean and variance parameters. Thus, estimation of the mean and variance parameters can be achieved by minimizing the above negative loglikelihood, i.e.,
Note that the variance parameters are constrained so that ${\tau}_{j}^{2}\ge 0$ and ${\sigma}^{2}>0$. To encourage sparsity among the θ_{⋅j}'s and ${\tau}_{j}^{2}$'s, so that the number of coefficients remains small for the purpose of data reduction, we impose two penalties (with tuning parameters λ_{1} and λ_{2}) at the end of the log-likelihood function:
By using an iterative procedure, the estimate of each parameter is obtained as follows (see Jeong et al., 2018 for details):
3.1 Initialize an Estimate of σ^{2} :
The initial estimate of σ^{2} can be obtained from the following pooled estimate. A robust estimate is computed for each curve from its finest-level coefficients (Donoho and Johnstone, 1994), and the common σ for the M curves is estimated by averaging these robust estimates: $\widehat{\sigma}={M}^{-1}{\displaystyle \sum}_{i=1}^{M}{0.6745}^{-1}\,\mathrm{median}\left(|{d}_{im}|:N/2+1\le m\le N\right)$, where the index m indicates wavelet coefficients at the finest level.
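Step 3.1 can be sketched directly from the formula above: a median-absolute-deviation estimate per curve over the finest-level coefficients, then an average over the M curves. The function name `initial_sigma` and the (M, N) array layout are our own conventions.

```python
import numpy as np

def initial_sigma(D):
    """Robust initial estimate of sigma averaged over M curves.
    D is an (M, N) array of wavelet coefficients with N a power of two;
    columns N/2..N-1 (0-based) hold each curve's finest-level coefficients."""
    D = np.asarray(D, dtype=float)
    M, N = D.shape
    finest = D[:, N // 2:]                                # finest-level block of each curve
    per_curve = np.median(np.abs(finest), axis=1) / 0.6745  # MAD-based sigma per curve
    return per_curve.mean()                               # pooled over the M curves
```

The constant 0.6745 is the 0.75 quantile of the standard normal, which makes the median absolute deviation a consistent estimator of σ under Gaussian noise.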
3.2 Initialize an Estimate of τ_{j} ’s:

(i) If the sample variance of d_{⋅j} is larger than the current estimate of σ^{2}, estimate ${\tau}_{j}^{2}$ by the difference between the two. That is, this position of wavelet coefficients has a random effect.

(ii) Otherwise, estimate ${\tau}_{j}^{2}$ by zero.
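Steps (i) and (ii) above amount to a single vectorized rule: the excess of each position's sample variance over the current σ² estimate, floored at zero. In this sketch we use the unbiased sample variance (ddof = 1); the paper does not specify which variance definition is used, so that choice is an assumption.

```python
import numpy as np

def initial_tau2(D, sigma2):
    """Initialize tau_j^2 for each wavelet position j.
    D is an (M, N) array of coefficients; sigma2 is the current noise-variance
    estimate. Positions whose between-curve sample variance exceeds sigma2 get
    a positive random-effect variance (step i); all others get zero (step ii)."""
    s2 = np.var(np.asarray(D, dtype=float), axis=0, ddof=1)  # sample variance of d_{.j}
    return np.maximum(s2 - sigma2, 0.0)
```

A position with `initial_tau2(...)[j] > 0` is exactly one flagged as having a wavelet random effect.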
3.3 Update θ_{⋅j}'s by Minimizing Eq. (1) with Respect to θ_{⋅j}'s:
By minimizing the penalized log-likelihood function with respect to the θ_{⋅j}'s, we obtain the following closed-form solution for the estimates of the θ_{⋅j}'s.
where ${(x)}_{+}=\mathrm{max}(x,\text{\hspace{0.17em}}0)$ and ${\overline{d}}_{\cdot j}=({d}_{1j}+\dots +{d}_{Mj})/M$.
3.4 Update ${\tau}_{j}^{2}$ by Minimizing Eq. (1) with Respect to τ_{j} ’s:
Similarly, by minimizing the penalized log-likelihood function with respect to the ${\tau}_{j}^{2}$'s and defining ${s}_{j}^{2}={\displaystyle \sum}_{i=1}^{M}{({d}_{ij}-{\theta}_{\cdot j})}^{2}/M$, we can also obtain a closed-form solution for the estimate of ${\tau}_{j}^{2}$ as follows (see the Appendix for its derivation):
3.5 Update σ^{2} by Minimizing Eq. (1) with Respect to σ^{2} :
We can solve the following equation to obtain the updated estimate of σ^{2} :
3.6 Repeat Steps 3.3-3.5 Until Convergence
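The objective driven downward by Steps 3.3-3.5 can be evaluated numerically to monitor convergence. The sketch below assumes L1-type penalties λ₁Σ|θ_{⋅j}| + λ₂Στ_j, consistent with the soft-thresholding form of the updates; the exact penalty form is given in Jeong et al. (2018), so treat this as an illustration rather than the paper's implementation.

```python
import numpy as np

def penalized_nll(D, theta, tau2, sigma2, lam1, lam2):
    """Penalized negative log-likelihood (up to the constant K).
    D: (M, N) coefficient matrix; theta, tau2: length-N current estimates.
    Assumed penalty: lam1 * sum_j |theta_j| + lam2 * sum_j tau_j."""
    D = np.asarray(D, dtype=float)
    M, N = D.shape
    var = sigma2 + np.asarray(tau2, dtype=float)              # sigma^2 + tau_j^2 per position
    nll = 0.5 * M * np.sum(np.log(var))                       # from prod_j (sigma^2+tau_j^2)^(-M/2)
    nll += 0.5 * np.sum((D - np.asarray(theta)[None, :]) ** 2 / var[None, :])
    return nll + lam1 * np.sum(np.abs(theta)) + lam2 * np.sum(np.sqrt(tau2))
```

In practice one would recompute this after each pass of Steps 3.3-3.5 and stop once the decrease falls below a tolerance.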
4. COMPARISON STUDIES USING REAL-LIFE EXAMPLES
Intuitively, wavelet thresholding and data reduction procedures based on single curves (Donoho and Johnstone, 1994) can be extended to multiple curves. However, unless all of the curves are considered together, the wavelet positions selected from individual curves usually differ, which complicates the decision of which positions represent all M curves. One approach is the union method, which includes the positions selected from any of the curves (Mörchen, 2003). However, this method can be expected to retain many positions and thus does not function well for data reduction. An alternative is to take only the intersection of positions, i.e., include a position only when it is selected in all curves (Weng and Young, 2017). Although this approach might yield good data reduction, its modeling performance is often poor. Other procedures, such as majority voting (e.g., a position is included if three or more coefficients from six curves are selected at that position), can also be used. However, all of these approaches are ad hoc and lack a solid justification based on models for M curves. More importantly, unlike our wavelet local-random-effect model, these procedures do not model local variations explicitly.
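The three ad hoc combination rules above reduce to simple set operations on a per-curve selection mask. A minimal sketch, with an (M, N) boolean array marking which positions each curve's single-curve thresholding retained (the function names are ours):

```python
import numpy as np

def union_positions(selected):
    """Keep position j if ANY curve selected it (VisuUnion-style)."""
    return np.any(selected, axis=0)

def intersection_positions(selected):
    """Keep position j only if ALL curves selected it (VisuIntersection-style)."""
    return np.all(selected, axis=0)

def majority_positions(selected):
    """Keep position j if at least half the curves selected it,
    e.g. three or more of six curves, as in the text's example."""
    M = selected.shape[0]
    return selected.sum(axis=0) >= (M + 1) // 2
```

The union mask is always a superset of the majority mask, which in turn contains the intersection mask, mirroring the trade-off described above between modeling accuracy and data reduction.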
In an attempt to use information from all curves to decide on “representative” wavelet positions, Jung et al. (2006) developed the vertical-energy-thresholding (VET) procedure. The VET considers the following energy measure based on the squared L_{2} norm of the coefficients from all curves at one wavelet position j:
Then, extending the data reduction idea studied in Jeong et al. (2006), the VET uses the objective function below to balance the modeling accuracy and data reduction goals and chooses λ as a threshold value for the energy measures at all positions:
When the energy of the j th position, $\parallel {d}_{\cdot j}{\parallel}^{2}$, is larger than the threshold value, all estimates of the coefficients $({\widehat{\theta}}_{1j},\dots ,{\widehat{\theta}}_{Mj})=({d}_{1j},{d}_{2j},\dots ,{d}_{Mj})$ at that wavelet position j (for j = 1, 2, …, N) are retained. The inverse DWT can then be applied to the retained wavelet coefficients to reconstruct the original multiple curves. Chang and Vidakovic (2002) assumed that the true model in the wavelet domain is ${d}_{ij}={\theta}_{\cdot j}+{z}_{ij}$, where the z_{ij} are random variates from the N(0, σ^{2}) distribution, and used a Bayesian formulation to develop a “Stein-type” shrinkage method called “VertiShrink” that estimates the wavelet coefficients θ_{⋅j} by maximizing a predictive density of the wavelet coefficients across all curves at a particular position. The following block-vertical estimate was proposed:
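The VET keep-or-kill rule described above can be sketched in a few lines: compute the vertical energy ‖d_{⋅j}‖² of each position across all M curves, then retain every curve's coefficient at positions whose energy exceeds the threshold. The helper name `vet_select` and the array layout are our own; choosing λ via the VET objective function is a separate step not shown here.

```python
import numpy as np

def vet_select(D, lam):
    """Vertical-energy thresholding on an (M, N) coefficient matrix D.
    Returns (keep, D_kept): a length-N boolean mask of retained positions and
    a copy of D with all coefficients at discarded positions set to zero."""
    D = np.asarray(D, dtype=float)
    energy = np.sum(D**2, axis=0)            # ||d_{.j}||^2 across the M curves
    keep = energy > lam                      # keep-or-kill decision per position
    D_kept = np.where(keep[None, :], D, 0.0)
    return keep, D_kept
```

Applying the inverse DWT row by row to `D_kept` then reconstructs the multiple curves, using M coefficients for every retained position, which is why K_{3} = M × K_{VET} in criterion (3) below.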
In this section, two real-life data sets are used to compare these procedures with our WMVT method. Their performance is evaluated using the following criteria, which are commonly seen in the wavelet thresholding and signal compression literature. Note that there are a total of M × N data points from M curves with N wavelet positions.

(1) K_{1} : number of nonzero mean wavelet coefficients;

(2) K_{2} : number of positions with wavelet randomeffects;

(3) K_{3} : number of wavelet coefficients used to reconstruct multiple curves; for example, when VET selects K_{VET} positions, it uses M × K_{VET} coefficients to reconstruct the data curves;

(4) Data reduction ratio (DRR): DRR = (K_{1} + K_{2} + 1)/(MN) for the WMVT procedure or (K_{3} + 1) /(MN) for others;

(5) Relative error: RelError $={\displaystyle \sum _{i=1}^{M}\Vert {y}_{i}-{\widehat{f}}_{i}\Vert}/{\displaystyle \sum _{i=1}^{M}\Vert {y}_{i}\Vert}$, which is the L_{2} error normalized by the signal energy;

(6) Root mean square error (RMSE): RMSE = $\sqrt{\frac{1}{M}{\displaystyle \sum}_{i=1}^{M}{\Vert {y}_{i}-{\widehat{f}}_{i}\Vert}^{2}}$
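Criteria (4)-(6) can be computed directly from the original and reconstructed curves. In this sketch the RMSE averages squared errors over all M × N points, which is one plausible reading of criterion (6); the function name and signature are our own conventions.

```python
import numpy as np

def reduction_metrics(Y, F_hat, n_kept):
    """DRR, relative error, and RMSE for M reconstructed curves.
    Y, F_hat: (M, N) arrays of original and reconstructed curves;
    n_kept: retained parameter count (K1 + K2 for WMVT, K3 for others)."""
    Y, F_hat = np.asarray(Y, dtype=float), np.asarray(F_hat, dtype=float)
    M, N = Y.shape
    drr = (n_kept + 1) / (M * N)                          # +1 for the sigma^2 estimate
    rel = (np.sum(np.linalg.norm(Y - F_hat, axis=1))
           / np.sum(np.linalg.norm(Y, axis=1)))           # criterion (5)
    rmse = np.sqrt(np.mean((Y - F_hat) ** 2))             # per-point reading of criterion (6)
    return drr, rel, rmse
```

Smaller values are better on all three criteria, so a procedure dominates when it lowers DRR without inflating RelError or RMSE, which is the comparison made in Tables 1 and 2.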
Experiment 1 - Antenna Signals: The popularity of wireless communications has increased the need for high-quality, technically sophisticated antennae. We collected data sets to develop procedures for monitoring antenna manufacturing quality and detecting process problems. Equipment used in such testing receives antenna signals at different degrees of elevation and azimuth (Jeong et al., 2007). This study focuses on the zero-azimuth cut data curves generated from 20 antenna data sets under normal conditions. The typical pattern of antenna data under normal conditions is shown in Figure 1. The antenna quality is evaluated according to various regulations regarding the signal patterns.
Figure 3 shows the reconstructed multiple curves based on the different procedures. In particular, Figure 3(a) shows the reconstructed curves from the WMVT procedure with λ_{1} = 150 and λ_{2} = 600. Figure 3(b) indicates that VertiShrink (designed for estimating the baseline curve) does not capture the local variations. Figures 3(c)-(e) show that the VET, VisuUnion, and VisuIntersection have larger modeling errors around the center locations and fail to capture the local variations along the two sides. Table 1 summarizes comparisons of the five procedures for the antenna signals. The WMVT procedure uses a total of 86 coefficients (K_{1} + K_{2} = 86, plus one parameter for σ^{2}) to model the curves. Except for VertiShrink, the other procedures (VET, VisuUnion, and VisuIntersection) use a much larger number of coefficients. The VET and VisuIntersection have larger relative errors, and the VisuUnion has a relative error similar to that of the WMVT. However, because VisuUnion uses too many coefficients, its DRR is larger than those of the others. The WMVT has both the smallest DRR and the smallest RMSE.
Experiment 2 - Tonnage Signals: Tonnage signals were used for monitoring and diagnosing a stamping process (Jin and Shi, 1999). They contain process information relating to the deformation stage. Figure 4(a) shows 24 sets of tonnage signals under normal working conditions; the data size of each curve is 256. Figure 4(b) shows only the center area and indicates that all tonnage signals have similar characteristics, but the center area has larger local between-curve variation. These local variations are attributable to the randomness of the lubricant distribution and of the material uniformity.
Figure 5 shows the reconstructed multiple tonnage curves from the different procedures. In particular, Figure 5(a) shows the reconstructed tonnage curves from the WMVT procedure with λ_{1} = 400 and λ_{2} = 4000. The WMVT procedure uses 22 nonzero wavelet coefficients for mean modeling and three wavelet random effects for variance modeling. All other procedures except VertiShrink perform well in capturing the local variations around the center, but they use too many wavelet coefficients. Table 2 summarizes the numerical comparison results. As in the antenna example, VisuUnion has the smallest relative error, but its DRR is the largest because it uses too many coefficients. VET has a smaller DRR, but it is still much larger than that of the WMVT. Compared with the data reduction ratios and relative errors for the antenna curves, the WMVT, VertiShrink, and VET procedures perform better for the tonnage curves. Neither the VisuUnion nor the VisuIntersection procedure works well with the tonnage curves, even though these curves are much smoother and the local variations are smaller.
5. CONCLUSION AND FUTURE RESEARCH
This paper presented a wavelet-based data reduction model that reduces a large volume of functional data into a representative data set by considering the characteristics of the data. Unlike existing data reduction methods for single curves, the presented WMVT procedure considers all of the curves together when deciding which positions represent all M curves. By using the WMVT method, the original functional data can be reconstructed with limited reconstruction error. This capability is significant for dealing with the huge amounts of high-dimensional data coming from various data-collection instruments in diverse applications. Based on the real-life data analyses, we found that the WMVT model adequately describes local variations and uses fewer model parameters to reduce the data dimension. Our future research will concentrate on root-cause diagnosis with the selected wavelet coefficients, where the decision rules derived from the reduced-size data are satisfactory. In addition, the proposed method can be extended to more sophisticated functional data from diverse applications.