ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.21 No.1 pp.137-150
DOI : https://doi.org/10.7232/iems.2022.21.1.137

Sequential Design of Computer Experiments: Current Status and Future Directions

Mai Thi Hong Phan, Jai-Hyun Byun*
Department of Industrial and Systems Engineering, Gyeongsang National University, Republic of Korea
*Corresponding Author, E-mail: jbyun@gnu.ac.kr
Received September 16, 2021; Revised November 20, 2021; Accepted December 7, 2021

ABSTRACT


A brief literature review addresses the sequential design of computer experiments. Algorithms, initial designs, surrogate modeling, and stopping criteria are presented. Sequential designs are classified in terms of deterministic/stochastic, granularity, and model-free/model-dependent. Criteria for exploitation and exploration are presented. Some hybrid algorithms that balance exploitation and exploration are introduced and compared with an illustrative example. Practitioners working on computer simulators can benefit from the review presented in this paper by implementing the sequential design of computer experiments.



    1. INTRODUCTION

    In today’s competitive world, scientists and engineers want to develop quality products quickly while maintaining low cost. They use computer simulations for design and testing instead of time-consuming and expensive physical experiments, or in circumstances where such experiments are very hard to perform. For example, computer simulations can model complex physical phenomena, from the airflow around an airplane’s wings to the impacts of climate change. They aim to give insight into input-output relationships and to predict the future behavior of those systems. In engineering, computer simulations help in solving partial differential equations (PDEs) in computational modeling, such as the finite element method (FEM) or computational fluid dynamics (CFD). However, these sophisticated computer codes can also be very time-consuming to run. For instance, a crash simulation for a full passenger car takes 36 to 160 hours to compute (Gorissen et al., 2006); a CFD simulation of a cooling system can take several days to complete (Ponomarev et al., 2012). Besides high computing costs, there is also a lack of data for complicated simulations, such as developing crashworthiness for a new vehicle or modeling earthquake propagation. These problems raise a demand to replace the original simulator with a simpler mathematical model, a surrogate. A single evaluation of this statistical model responds much faster than the original simulation. Therefore, we can quickly observe the behavior of the simulation model and then make inferences about the real-world system. Figure 1 illustrates the relationship between computer experiments, surrogate models, and physical experiments in inferring and predicting a real-world system.

    Surrogate modeling can be classified as a data-driven or statistical approach. It considers the simulator as a black-box function in which only the input-output behavior is of interest. It obtains the simulation outputs at several selected locations in the design space. Then the surrogate is constructed as a statistical model by assembling these inputs and outputs into a training dataset, as shown in Figure 2. Surrogate modeling can be viewed as a special case of supervised learning. Several popular machine learning techniques have been adopted to build surrogates, including the Gaussian process (or Kriging), artificial neural networks (ANNs), and radial basis functions (RBFs). Among these techniques, the Gaussian process, or Kriging, is the most widely used for constructing computer experiment models. The accuracy of a surrogate model depends on the number of design points and their locations. A further goal is to reduce the number of samples as much as possible while still producing an accurate surrogate model. The groundbreaking work of Sacks et al. (1989) introduced a framework for the design and analysis of computer experiments (DACE) with two basic statistical questions:

    • How many points should be sampled, and at which locations in the design space? (Design problem)

    • How should the data be used to fulfill the objectives? (Analysis problem)

    Generally, we cannot know in advance the number of points required for building an accurate surrogate. Sampling techniques discussed so far in the literature are often one-shot designs in which all the samples are generated at once. The samples are evenly distributed over the entire design space, leading to a significant waste of resources, since the input-output relations are more complicated in some design regions than in others. This observation led to the concept of sequential design of computer experiments (SDCE), in which we enrich the data set during the sampling process. SDCE identifies regions of interest in which the surrogate may be uncertain or which may contain the optimum values of the design parameters. SDCE avoids selecting too many samples (over-sampling) or too few samples (under-sampling), saving computational resources.

    The sequential design of computer experiments has received greater attention in the past two decades. Figure 3 shows the number of articles published on DACE and SDCE from 1995 through 2019. The data was retrieved from Google Scholar on 15 August 2020 with the search keywords “design of computer experiments,” “experimental design,” “computer simulation,” “sequential design of computer experiments,” and “adaptive sampling.” The number of articles on SDCE has continued to increase, even though the number on DACE has decreased dramatically since 2012.

    In this article, we present a brief systematic literature review of the sequential design of computer experiments. The paper is organized as follows. Section 2 gives an overview of algorithms, initial designs, surrogate modeling, and stopping criteria for sequential designs. Section 3 classifies sequential designs in terms of deterministic/stochastic, granularity, and model-free/model-dependent. The importance of exploitation and exploration is explained in Section 4, with a numerical example to demonstrate some sequential strategies for balancing exploitation and exploration. The last section presents a summary and a discussion of future directions.

    2. SEQUENTIAL DESIGN ALGORITHM

    Traditional DACE is based on a one-shot design approach. In the DACE framework, a system is first analyzed to construct a simulator. An initial design is chosen to spread samples evenly across the experimental space. Responses are observed when the design points are fed to the simulator. Next, a surrogate model is built using this data. The one-shot design approach is shown in the blue dashed box in Figure 4. All data points are chosen at once in a one-shot design, and the modeling algorithm proceeds without evaluating any additional samples. Finally, one can predict the response at untried points, optimize a function of the response, or tune the computer code to physical data (Sacks et al., 1989). However, the initial design will inevitably miss certain design space features. Sequential design improves on this one-shot approach by transforming the algorithm into an iterative process. The sequential design approach (orange dashed box in Figure 4) uses infill criteria to identify additional samples with the highest information value. Infill criteria play a critical role in determining the locations at which to add point(s) to the existing data set. The best criterion may depend on several factors, including the type of surrogate model under consideration, the number of samples required, and whether the goal is global fit or optimization. This sequential sampling procedure terminates when some stopping criterion (model accuracy or computational budget) is satisfied. Finally, the approved surrogate is used for prediction, optimization, or calibration. The sequential design approach is more flexible than the one-shot approach.
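    As an illustration of this loop (the blue and orange boxes of Figure 4), the following Python sketch wires the pieces together; `simulator`, `fit_surrogate`, and `infill_criterion` are hypothetical placeholders for the components discussed in the rest of this section, not the interface of any published implementation.

```python
import numpy as np

def sequential_design(simulator, fit_surrogate, infill_criterion,
                      X_init, budget, tolerance):
    """Generic sequential-design loop. `simulator`, `fit_surrogate`
    (returning a model with a .predict method), and `infill_criterion`
    (returning the next input as a NumPy array) are hypothetical
    user-supplied components."""
    X = [np.asarray(x, dtype=float) for x in X_init]
    y = [simulator(x) for x in X]                 # evaluate initial design
    model = fit_surrogate(np.array(X), np.array(y))
    while len(X) < budget:                        # resource constraint
        x_new = infill_criterion(model, np.array(X))
        pred = model.predict(x_new.reshape(1, -1))[0]
        y_new = simulator(x_new)                  # one expensive run
        X.append(x_new)
        y.append(y_new)
        model = fit_surrogate(np.array(X), np.array(y))
        if abs(pred - y_new) < tolerance:         # accuracy-based stop
            break
    return np.array(X), np.array(y), model
```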

    2.1 Initial Design

    The objective of the initial design is to obtain a good representation of the response surface. We try to evenly distribute the sample points over the design region to obtain maximum information about the response. In physical experiments, factorial or fractional factorial designs are commonly used to explore the response surface in the initial stage of experimentation: one chooses a few levels for each factor and combines them to create a given number of runs. For computer experiments, however, space-filling designs are preferred, since the entire design space is of interest and often a highly nonlinear response surface is to be estimated.

    McKay et al. (1979) set a key milestone by proposing the Latin hypercube design (LHD), also known as Latin hypercube sampling (LHS). LHD guarantees uniform coverage in every one-dimensional projection while maintaining the properties of random uniform sampling. Consider a d-dimensional design space. Partition each dimension into n equal segments of length 1/n, dividing the design region into cells. Then arrange n sample points so that each segment of each dimension contains exactly one point. An illustration of an LHD (d = 2, n = 10) is shown in Figure 5. Each dimension has 10 bins, and the samples are placed so that no row or column has more than one point. However, there are many possible choices of LHD, and not all of them are good.
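    As a minimal sketch of this construction, assuming the design region is scaled to the unit hypercube [0, 1]^d, the following Python function draws one random Latin hypercube:

```python
import numpy as np

def latin_hypercube(n, d, seed=None):
    """Basic Latin hypercube sample: n points in [0, 1]^d with exactly
    one point in each of the n bins of every one-dimensional projection."""
    rng = np.random.default_rng(seed)
    # one uniformly random point inside each of the n bins, per dimension
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # independently shuffle the bin order in each dimension
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]
    return u

X = latin_hypercube(10, 2, seed=0)   # the d = 2, n = 10 case of Figure 5
```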

    Another popular space-filling scheme is the maximin design, based on a geometric criterion. First, we define a notion of distance to measure the spread between two points; a simple choice is the Euclidean distance:

    $d(\mathbf{x}, \mathbf{x}') = \lVert \mathbf{x} - \mathbf{x}' \rVert_2 = \sqrt{\sum_{j=1}^{d} (x_j - x_j')^2}$
    (1)

    A maximin design $X_n$ maximizes the minimum distance between any pair of points:

    $X_n^* = \arg\max_{X_n} \min \{ d(x_i, x_k) : i \neq k = 1, \ldots, n \}$
    (2)

    Figure 6 illustrates a maximin design in two dimensions.

    Minimax distance designs (Johnson et al., 1990) achieve maximum coverage by minimizing the worst-case distance from any point in the design space to its nearest sample. One disadvantage of minimax and maximin designs is that they do not have good projections onto subspaces. Morris and Mitchell (1995) proposed maximin Latin hypercube designs (MmLHD), combining the pros of the two design methods. An MmLHD is an LHD that maximizes the minimum distance among the samples. It achieves space-filling in both the full dimension and the single dimensions of the input space. In the same spirit, maximum projection (MaxPro) designs (Joseph et al., 2015b, 2016) were recently introduced to maximize the minimum distance in all subspaces of the experimental region.
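    A crude but common way to approximate a maximin LHD is random search: generate many random Latin hypercubes and keep the one that scores best under criterion (2). The sketch below is illustrative only; the constructions cited above are far more efficient.

```python
import numpy as np
from scipy.spatial.distance import pdist

def maximin_lhd(n, d, n_candidates=1000, seed=0):
    """Crude random search for an approximate maximin LHD: generate many
    random Latin hypercubes and keep the best under criterion (2)."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        bins = np.argsort(rng.random((n, d)), axis=0)  # column permutations
        X = (bins + rng.random((n, d))) / n            # jitter inside bins
        score = pdist(X).min()                         # min pairwise distance
        if score > best_score:
            best, best_score = X, score
    return best
```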

    The sample size of the initial design has a substantial impact. It must be large enough to guarantee minimal coverage of the design space and as small as possible to save computational expense. Chapman et al. (1994) and Jones et al. (1998) suggested using a sample size 10 times the input dimension, that is, 10d runs when d is the number of input variables. Loeppky et al. (2009) investigated this 10d rule of thumb and concluded that a 10d sample size is suitable for an initial design when d ≤ 5. Ranjan et al. (2008) found that initial surface estimates were adequate for the performance of their sequential design algorithm if the initial design consumed roughly 25–35% of a fixed budget of runs.

    2.2 Surrogate Modeling

    A variety of modeling techniques are available for constructing surrogates. The choice of technique depends on one’s expectation of how complex the underlying response is. Response surface methodology (Box et al., 2005) and artificial neural networks (Haykin, 1994) are appropriate for building fast and straightforward approximations. Other models in the literature are multivariate adaptive regression splines (Friedman, 1991) and radial basis function approximations (Dyn et al., 1986). Among these techniques, Gaussian process (Kriging) models are the most widely used for computer experiments.

    The Kriging model originated in geostatistics, in the master’s thesis of the South African geologist D. G. Krige (Krige, 1951). It was proposed for modeling computer experiments by Sacks et al. (1989). The data set consists of n input vectors $X = (x_1, \ldots, x_n)$ and the corresponding outputs $y = (y_1, \ldots, y_n)$. The response of the computer code is modeled as a function of the input vector x:

    $y(\mathbf{x}) = f(\mathbf{x})^T \boldsymbol{\beta} + Z(\mathbf{x})$
    (3)

    where f(x) is a set of pre-specified basis functions and β is a set of unknown coefficients. Z(x) is a stationary Gaussian process with zero mean and variance σ², with covariance

    $\mathrm{Cov}_{ij} = \mathrm{cov}[Z(x_i), Z(x_j)] = \sigma^2 R(x_i, x_j)$
    (4)

    where cov is the covariance operator and σ is the standard deviation of the stochastic process. $R(x_i, x_j)$ is the correlation between the outputs at the two samples $x_i$ and $x_j$, and is the (i, j) entry of the correlation matrix R.

    A common choice is the power-exponential correlation function for k input variables:

    $R(x_i, x_j) = \exp\left\{ -\sum_{l=1}^{k} \theta_l \lvert x_{il} - x_{jl} \rvert^{p_l} \right\}$
    (5)

    where the $\theta_l \ge 0$ are unknown correlation parameters: $\theta = (\theta_1, \ldots, \theta_k)$ are scale parameters and $p = (p_1, \ldots, p_k)$ are power parameters.

    The best linear unbiased predictor (BLUP) (Santner et al., 2018) is used to predict the response at an untried point $x_0$:

    $\hat{y}(x_0) = f_0^T \hat{\beta} + r^T R^{-1} (y - F\hat{\beta})$
    (6)

    where $r = [R(x_0, x_1), \ldots, R(x_0, x_n)]^T$, $f_0 = f(x_0)$, $\hat{\beta} = (F^T R^{-1} F)^{-1} F^T R^{-1} y$, R is the n × n matrix with entries $R(x_i, x_j)$, and $F = [f(x_1), \ldots, f(x_n)]^T$ is the regressor matrix. The unknown correlation parameters θ can be estimated by maximizing

    $-\frac{1}{2}\left( n \ln \hat{\sigma}^2 + \ln \lvert R \rvert \right)$
    (7)

    where $\hat{\sigma}^2 = (y - F\hat{\beta})^T R^{-1} (y - F\hat{\beta}) / n$.
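    As a concrete illustration of equations (5) and (6), the following NumPy sketch implements the ordinary-Kriging predictor with a constant mean f(x) = 1; the correlation parameters θ are fixed by hand rather than estimated by maximizing (7), so it demonstrates the formulas rather than a complete Kriging code.

```python
import numpy as np

def kriging_blup(X, y, x0, theta, p=2.0):
    """Ordinary-Kriging BLUP of equation (6) with constant mean f(x) = 1
    and the power-exponential correlation of equation (5); theta is a
    fixed vector of scale parameters (an assumption for illustration)."""
    def corr(A, B):
        diff = np.abs(A[:, None, :] - B[None, :, :])      # |a_il - b_jl|
        return np.exp(-(theta * diff ** p).sum(axis=2))   # eq. (5)

    n = len(y)
    R = corr(X, X) + 1e-10 * np.eye(n)     # small jitter for conditioning
    F = np.ones(n)                         # constant regressor
    Ri_y, Ri_F = np.linalg.solve(R, y), np.linalg.solve(R, F)
    beta = (F @ Ri_y) / (F @ Ri_F)         # (F'R^-1F)^-1 F'R^-1 y
    r = corr(np.atleast_2d(x0), X).ravel() # r = [R(x0, x_i)]
    return beta + r @ np.linalg.solve(R, y - F * beta)    # eq. (6)

# toy usage on a 2-D test function
rng = np.random.default_rng(0)
X = rng.random((10, 2))
y = np.sin(3 * X.sum(axis=1))
print(kriging_blup(X, y, np.array([0.5, 0.5]), theta=np.array([5.0, 5.0])))
```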

    2.3 Stopping Criteria

    Commonly used stopping criteria are as follows:

    • Time constraints, which might depend on the project deadline or time budget.

    • Computational resource constraints, for example, a maximum number of runs preset by the user.

    • An accuracy goal, evaluated by comparing the actual responses with the model’s predicted values: the larger the error, the more inaccurate the surrogate. Table 1 (Fuhg et al., 2021) introduces some error estimation methods proposed to evaluate a model’s accuracy.

    3. CLASSIFICATION OF SEQUENTIAL DESIGN OF COMPUTER EXPERIMENTS

    3.1 Deterministic or Stochastic

    Deterministic simulations are widely used in various engineering applications. For the same set of inputs, the simulator reproduces the same outputs. Therefore, traditional blocking, randomization, and replication methods used in physical experiments are irrelevant when conducting computer experiments.

    This is not the case for stochastic simulation, where we intentionally introduce variance to describe the natural stochasticity of the process. Replicates may be collected at every sample point to estimate the noise. Stochastic simulation is encountered in various applications, such as optimizing call center staffing (Aksin et al., 2009) and biology (Johnson, 2008). These stochastic models are often run in two stages. In the first stage, a small number of design points are sampled. In the second stage, the simulations are replicated at points that have large sample variances. Kim et al. (2017) extended the minimum energy design to the batch sequential situation for a stochastic response. Schneider et al. (2017) adopted the expected improvement criterion based on a Gaussian process model for finding the maximum likelihood estimate of a stochastic differential equation.

    3.2 Granularity

    The sequential design of computer experiments can also be classified by granularity, the number of sample points generated at each iteration. There are two popular strategies, sampling single or multiple points per iteration, as shown in Figure 7.

    The single-point strategy adds only one point per iteration, while the sequential batch strategy generates two or more points at each iteration. For instance, Loeppky et al. (2010) proposed a sequential batch design that adds design points using orthogonal array-based Latin hypercubes. Duan et al. (2017) proposed a similar method in which design points can be added indefinitely.

    3.3 Model-free or Model-Dependent

    Sequential design can be used in two ways: model-free design and model-dependent design. Model-free sequential design aims at space-filling in the whole experimental region without resorting to a fitted model. In model-free design, no assumption is made about which type of model is used or how the model will behave. The idea is to evenly distribute the sample points in the input design space without using the responses from previous simulations. On the other hand, in model-dependent sequential design, a model to be fitted is assumed in advance to provide criteria for answering specific questions. Model-dependent sequential design can be divided into sub-categories based on the design objective: global approximation and optimization. Sequential design for global fit emphasizes sequentially improving the accuracy of the surrogate over the entire domain so that the surrogate model can be a good replacement for the original simulator. A common goal of global fitting is to minimize the average squared error over the surface. Sequential design for optimization focuses on finding an optimum region to fulfill the optimization objective. The classifications are shown in Figure 8.

    The purpose of the model-free sequential design is to spread out points over the design space to obtain a maximum understanding of response behavior once the responses are observed. Some criteria used to construct one-shot designs can also be implemented in a sequential version. For example, a sequential version of the maximin distance design (MmDist) can start with an MmDist with a small number of samples, and each subsequent point is placed to maximize the minimum interpoint Euclidean distance. The advantage of the model-free sequential design is that it does not need to specify the initial sample size; subsequent samples are added sequentially until the stopping rules are satisfied. Loeppky et al. (2010) developed a sequential batch design with a distance-based strategy. Duan et al. (2017) developed a new batch sequential design called the sliced full factorial-based LHD. The design has good orthogonality and one-dimensional projection at certain stages. Qian (2009) proposed nested LHDs and showed that they could outperform independent and identically distributed (i.i.d.) sampling. A nested LHD with n runs has multiple layers that are also LHDs, so it can generate a series of smaller LHDs with fewer runs. Nested LHDs are suitable for computer experiments involving codes with higher and lower levels of accuracy. Other modifications of the nested LHD are the nested orthogonal LHD (Yang et al., 2014), the sequentially refined LHD (Xu et al., 2015), and the flexible nested LHD (Chen et al., 2017). Kong et al. (2018) extended the nested Latin hypercube design to the situation without a sample size limitation.

    Model-free designs can be further classified according to different criteria. First, they can be categorized by the type of design used as the initial design; various construction methods are considered, such as distance-based, discrepancy-based, and matrix-based methods. Second, they differ in granularity: how many samples are selected at a time, one point or a batch. The designs can also be compared regarding the input types involved, such as qualitative, quantitative, or mixed qualitative and quantitative. Lastly, sequential designs can be evaluated and compared with respect to space-filling, projective, and orthogonal properties. Some recent papers on model-free designs of computer experiments are introduced in Table 2. In short, the model-free approach is very flexible for nonlinear, nonparametric generic regression. It has plenty of variations based on different geometric criteria and highly customized solving algorithms.

    Model-dependent sequential designs have been extensively studied for computer experiments (Christen et al., 2011; Gramacy, 2020). Lam (2008) adapted a sequential design strategy for fitting response surface models of computer experiments. Gramacy and Lee (2009) introduced flexible sequential designs of supercomputer experiments based on Gaussian process models. Joseph et al. (2015a) proposed a new deterministic space-filling sampling method for the exploration of complex response surfaces, namely the minimum energy design (MinED). The idea derives from the physical analogy of visualizing design points as charged particles in a box and minimizing the total potential energy inside the box. This method employs posterior evaluation to adapt to different types of response surfaces by choosing the charge function inversely proportional to the function of interest. Kim et al. (2017) extended this work to the batch sequential situation when the number of batches to be run is known in advance. Some recent model-dependent designs are listed in Table 3.

    4. EXPLORATION AND EXPLOITATION

    Exploration searches the whole design space; it helps to find key regions that have not been identified before, such as discontinuities, steep slopes, optima, or stable regions. Exploitation selects samples in regions that have been identified as interesting or potentially in need of investigation, using the outputs from previous iterations to guide the sequential sampling process. In this section, several exploitation and exploration criteria are introduced, and the tradeoff between exploration and exploitation is emphasized with an illustrative numerical example.

    4.1 Classification of Exploration Criteria

    Exploration aims to look evenly over the whole input domain to gain a general knowledge of the input-output mapping by generating sample points that fill the domain. Thus, a pure exploration strategy performs adaptive sampling while ignoring previously evaluated outcomes.

    4.1.1 Distance-based Criteria

    Maximin and minimax distance

    Johnson et al. (1990) proposed two distance-based criteria, namely maximin (Mm) and minimax (mM), to spread sample points within the design space, as illustrated in Figure 9. The maximin criterion maximizes the minimum distance between any two sample points:

    $Mm(X_n) = \max_{X_n \subset D} \left[ \min_{j \neq k} d(x_j, x_k) \right]$
    (8)

    On the other hand, the minimax criterion minimizes the worst-case distance from any point x in the design region D to its nearest design point:

    $mM(X_n) = \min_{X_n} \max_{x \in D} \min_{1 \le j \le n} d(x, x_j)$
    (9)

    Here the distance is the Euclidean distance between two points x and x′:

    $d(\mathbf{x}, \mathbf{x}') = \lVert \mathbf{x} - \mathbf{x}' \rVert_2 = \sqrt{\sum_{j=1}^{d} (x_j - x_j')^2}$
    (10)

    The framework can also be extended to other choices of distance, such as Mahalanobis distance.
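    As a small sketch of how these criteria can be scored for a candidate design on the unit hypercube, the maximin value follows directly from the pairwise distances, while the minimax quantity is estimated here by Monte Carlo probing of the domain (a practical approximation, not the method of Johnson et al., 1990):

```python
import numpy as np
from scipy.spatial.distance import pdist

def maximin_score(X):
    """Criterion (8): the minimum pairwise distance (larger is better)."""
    return pdist(X).min()

def minimax_score(X, n_test=20000, seed=0):
    """Monte Carlo estimate of the quantity a minimax design minimizes:
    the worst distance from a domain point to its nearest sample."""
    rng = np.random.default_rng(seed)
    T = rng.random((n_test, X.shape[1]))        # probe points in [0, 1]^d
    d2 = ((T[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1)).max()
```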

    Mahalanobis distance

    The Mahalanobis distance of an observation $x = (x_1, \ldots, x_N)^T$ from a set of observations with mean $\mu = (\mu_1, \ldots, \mu_N)^T$ and covariance matrix S is defined as

    $\Delta = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$
    (11)

    Voronoi tessellation

    Consider a set of distinct points $X_n = \{x_1, x_2, \ldots, x_n\}$ in an open bounded domain $\Omega \subset \mathbb{R}^d$. Every sample $x_i$, $i = 1, 2, \ldots, n$, has a corresponding Voronoi cell $V_i$ given by

    $V_i = \{ x \in \Omega \mid d(x, x_i) \le d(x, x_j) \;\; \forall j \neq i \}$
    (12)

    where d denotes the Euclidean distance in $\mathbb{R}^d$.

    Crombecq et al. (2011) proposed LOLA-Voronoi to combine Voronoi tessellations and local linear approximation (LOLA). While Voronoi tessellations target domain exploration, LOLA guides local exploitation. In their experiments, this algorithm outperformed static techniques for all surrogate model types considered. Singh et al. (2013) proposed three tradeoff schemes to balance exploration and exploitation for the LOLA-Voronoi strategy. Van der Herten et al. (2015) introduced a fuzzy variation of LOLA (FLOLA), which provides the benefits of the original algorithm while significantly reducing the computational burden.
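    Exact Voronoi cell volumes are expensive to compute in higher dimensions, so in practice they are often estimated by Monte Carlo, in the spirit of Crombecq et al. (2011). A minimal sketch on the unit hypercube:

```python
import numpy as np

def voronoi_volumes(X, n_mc=20000, seed=0):
    """Monte Carlo estimate of the relative volume of each Voronoi cell
    V_i of equation (12) in [0, 1]^d; a large cell flags a sparsely
    sampled region that is a candidate for exploration."""
    rng = np.random.default_rng(seed)
    M = rng.random((n_mc, X.shape[1]))                     # random probes
    d2 = ((M[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)                              # nearest sample
    return np.bincount(owner, minlength=len(X)) / n_mc
```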

    Delaunay triangulation

    Delaunay triangulation is the dual of the Voronoi diagram. Assume that P is a given set of discrete points in the plane and E is a set of closed line segments with endpoints in P. A triangulation T = (P, E) of the point set P is a plane graph satisfying two conditions: 1) all faces made by the edges are triangular, and 2) no edge contains any point of P other than its endpoints. A Delaunay triangulation DT(P) for a given set P is a triangulation such that no point of P lies inside the circumcircle of any triangle. Delaunay triangulation maximizes the minimum angle over all the angles of all the triangles in the triangulation. Voronoi tessellation and Delaunay triangulation are illustrated in Figure 10 (Fuhg et al., 2021).
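    Both structures are available off the shelf. For example, SciPy computes the triangulation and its dual diagram for a ten-point set like that of Figure 10 (random coordinates are used here, since the figure’s points are not given):

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

pts = np.random.default_rng(0).random((10, 2))  # 10 random points in 2-D
tri = Delaunay(pts)    # triangles obeying the empty-circumcircle property
vor = Voronoi(pts)     # the dual diagram of the same point set
print(tri.simplices)   # vertex indices of each Delaunay triangle
print(vor.vertices)    # Voronoi vertices (triangle circumcenters)
```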

    4.1.2 Error-based Criteria

    The key advantage of Gaussian process-based models is that they permit the calculation of an estimated prediction error. The search algorithm uses this estimated error to position infill points where the model prediction is most uncertain. This idea underlies the integrated mean-squared error criterion (Sacks et al., 1989). Other variations of this exploration technique can be found in Jones et al. (1998), Lam (2008), and Liu et al. (2017).
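    A minimal version of this idea, sketched with scikit-learn’s Gaussian process regressor (the RBF kernel and the random candidate search are illustrative assumptions, not the choices of the cited papers), selects the candidate with the largest predictive standard deviation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_point_by_variance(X, y, n_cand=5000, seed=0):
    """Refit a GP and return the candidate in [0, 1]^d where the
    predictive standard deviation (estimated error) is largest."""
    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
    gp.fit(X, y)
    cand = np.random.default_rng(seed).random((n_cand, X.shape[1]))
    _, std = gp.predict(cand, return_std=True)
    return cand[std.argmax()]
```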

    4.2 Classification of Exploitation Criteria

    4.2.1 Cross-validation Exploitation

    The merit of the cross-validation (CV) approach is that there is no need to add new samples to assess metamodel accuracy. Leave-one-out cross-validation (LOOCV) is a particular case of k-fold cross-validation. We leave out one sample point from an initial design set $D_n$ and build a surrogate with the remaining points. The surrogate is used to predict the response at the left-out sample point. Then we calculate the cross-validation measure between the prediction $\hat{y}(x_i)$ of the surrogate built on all points and the prediction $\hat{y}_{-i}(x_i)$ of the surrogate built without the left-out sample:

    $e_{LOO}(x_i) = \lvert \hat{y}(x_i) - \hat{y}_{-i}(x_i) \rvert$
    (13)

    where $\hat{y}(x_i)$ is the prediction of the metamodel based on all the sample points in $D_n$, and $\hat{y}_{-i}(x_i)$ is the prediction of the metamodel based on the remaining n − 1 sample points.
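    A short scikit-learn sketch follows; since an interpolating surrogate reproduces the training data exactly, comparing the held-out response with the leave-one-out prediction yields the same quantity as (13). The kernel choice is an illustrative assumption.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def loo_errors(X, y):
    """Leave-one-out errors of equation (13): refit the surrogate n
    times, each time predicting the held-out sample point."""
    e = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
        gp.fit(X[mask], y[mask])
        e[i] = abs(gp.predict(X[i:i + 1])[0] - y[i])
    return e  # a large e[i] flags a region worth exploiting
```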

    4.2.2 Gradient-based Exploitation

    Crombecq et al. (2011) partitioned the input space with a Voronoi tessellation and then approximated the gradient in each cell from neighborhood information. Lovison and Rigoni (2010) evaluate local nonlinearities from an approximation of the Lipschitz constant using neighboring points.

    4.3 Balance between Exploration and Exploitation

    In sequential design, a tradeoff must be made between exploration and exploitation. Without proper design space exploration, one might miss some interesting regions of the design space. Some hybrid algorithms that balance exploitation and exploration have been introduced in the literature, such as the sequential minimum energy design (Joseph et al., 2015a), which uses a charge function. This criterion depends on the objective of the experiment, since it can handle both global fitting and optimization by selecting a suitable energy function. Crombecq et al. (2011) present an adaptive design algorithm that puts more samples in areas with higher nonlinearity. The decision on which point to select is based on 1) an exploration criterion based on a Voronoi tessellation of the input space and 2) an exploitation criterion based on a nonlinearity measure calculated using the local linear approximation algorithm. A summary of sequential design approaches for computer experiments and their algorithms is given in Tables 4 and 5.
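    A classical closed-form criterion that strikes this balance is the expected improvement of Jones et al. (1998), used by EGO: it rewards both a low predicted mean (exploitation) and a large predictive standard deviation (exploration). A minimal sketch for minimization:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, std, y_best):
    """Expected improvement (Jones et al., 1998) for minimization:
    mu and std are the GP predictive mean and standard deviation at the
    candidate points, y_best the best response observed so far."""
    std = np.maximum(std, 1e-12)          # guard against zero variance
    z = (y_best - mu) / std
    return (y_best - mu) * norm.cdf(z) + std * norm.pdf(z)
```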

    4.4 An Illustrative Numerical Example

    This numerical example compares several sequential design algorithms in balancing exploration and exploitation. The test function is adapted from Farhang-Mehr and Azarm (2005):

    [Equation (14): the test function, adapted from Farhang-Mehr and Azarm (2005), appears only as an image in the original.]

    The initial dataset comprises seven sample points, $D = \{0, 0.25, 0.375, 0.5, 0.625, 0.75, 1\}$, with one hole intentionally left in the critical region to see whether the sequential methods can mimic the shape of the function there. The surrogate model is constructed using a Gaussian process. Twelve infill criteria are used to select new sample points, and each sequential algorithm is stopped when a design budget of 30 points is consumed. The objective function and the corresponding surrogate outputs are presented in Figure 11. The blue dotted lines show the target function, while the red lines show the surrogate outputs. The black dots mark the locations of the 30 sample points, and the red dot indicates the last point in the sequential sampling process. Some techniques, such as WAE (Figure 11e) and EGO (Figure 11k), did not detect the critical region, which shows that finding the optimum is quite a challenging task; in these cases, the points are sampled near the lower bound. Most of the other techniques, including SFCVT (Figure 11c), CVV (Figure 11d), AME (Figure 11f), MEPE (Figure 11g), TEAD (Figure 11j), and EIGF (Figure 11l), can find the critical area but do not sufficiently exploit information in locally nonlinear regions. LIP (Figure 11i) can find the critical region; however, with a budget of 30 runs, its sequential sampling process gets stuck in the critical region and fails to capture the overall response. ACE (Figure 11a), SSA (Figure 11b), and LOLA-Voronoi (Figure 11h) capture the behavior of the objective function exceptionally well by balancing exploration and exploitation.

    5. SUMMARY AND FUTURE DIRECTIONS

    A brief literature review of the sequential design of computer experiments is performed. Algorithms, initial designs, surrogate modeling, and stopping criteria are presented. Sequential designs are classified in terms of deterministic/stochastic, granularity, and model-free/model-dependent. Criteria for exploitation and exploration are presented. Some hybrid algorithms that balance exploitation and exploration are introduced and compared with an illustrative example.

    Engineers working on computer experiments can benefit from implementing sequential design. There are a couple of issues to consider for its practical application. First, in the initial design stage, how many points should be tried, and at which locations? Since the number of initial design points needed to cover the design space increases exponentially with the number of input variables, screening designs with fewer runs are necessary to identify active variables when many input variables are considered. Latin hypercube, maximin, and minimax distance designs do not perform well with respect to their projections onto subspaces, so there is a need for sequential design methods that ensure maximum projectivity and space-filling for the vital few variables identified by the screening design analysis. Second, model accuracy needs to be enhanced by sequentially integrating computer simulations with real-world field data: the computer models should be calibrated using observations from real systems, usually from physical experiments.

    ACKNOWLEDGEMENT

    This work was supported by a grant from the National Research Foundation of Korea (2018R1D1A1B07049764), which is gratefully acknowledged.

    Figure

    Figure 1. Inference of a real-world phenomenon by physical experiments, computer experiments, and surrogates.

    Figure 2. Surrogate model as an approximation of the simulator.

    Figure 3. Number of publications on DACE and SDCE over the years 1995 to 2019.

    Figure 4. DACE with one-shot and sequential design flow-chart.

    Figure 5. Latin hypercube design in two dimensions.

    Figure 6. Maximin design in two dimensions.

    Figure 7. Classification of SDCE based on granularity.

    Figure 8. Classification of SDCE based on model orientation.

    Figure 9. Minimax and maximin Euclidean distance design for n = 7.

    Figure 10. Voronoi tessellation (black solid lines) and Delaunay triangulation (red dashed lines) of 10 sample points (blue dots) on a two-dimensional parametric domain.

    Figure 11. Surrogate model performance after reaching 20 samples for several sequential design strategies (M is the target function, M̂ is the surrogate output).

    Table

    Table 1. Various error estimations to evaluate metamodels.

    Table 2. Recent model-free designs of computer experiments.

    Table 3. Recent model-dependent designs of computer experiments.

    Table 4. Summary of sequential design approaches for computer experiments.

    Table 5. Algorithms for exploration and exploitation.

    REFERENCES

    1. Aksin, Z., Armony, M., and Mehrotra, V. (2009), The modern call center: A multi-disciplinary perspective on operations management research, Production and Operations Management, 16(6), 665-688.
    2. Ajdari, A. and Mahlooji, H. (2014), An adaptive exploration-exploitation algorithm for constructing metamodels in random simulation using a novel sequential experimental design, Communications in Statistics - Simulation and Computation, 43(5), 947-968.
    3. Aute, V., Saleh, K., Abdelaziz, O., Azarm, S., and Radermacher, R. (2013), Cross-validation based single response adaptive design of experiments for Kriging metamodeling of deterministic computer simulations, Structural and Multidisciplinary Optimization, 48(3), 581-605.
    4. Binois, M., Huang, J., Gramacy, R., and Ludkovski, M. (2019), Replication or exploration? Sequential design for stochastic simulation experiments, Technometrics, 61(1), 7-23.
    5. Box, G. E. P., Hunter, W. H., and Hunter, J. S. (2005), Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.), John Wiley and Sons, New York.
    6. Busby, D., Farmer, C. L., and Iske, A. (2007), Hierarchical nonlinear approximation for experimental design and statistical data fitting, SIAM Journal on Scientific Computing, 29(1), 49-69.
    7. Chen, R. B., Wang, W., and Wu, C. F. J. (2017), Sequential designs based on Bayesian uncertainty quantification in sparse representation surrogate modeling, Technometrics, 59(2), 139-152.
    8. Chen, X. and Zhou, Q. (2017), Sequential design strategies for mean response surface metamodeling via stochastic kriging with adaptive exploration and exploitation, European Journal of Operational Research, 262(2), 575-585.
    9. Christen, J. A. and Sansó, B. (2011), Advances in the sequential design of computer experiments based on active learning, Communications in Statistics - Theory and Methods, 40(24), 4467-4483.
    10. Crombecq, K., Gorissen, D., Deschrijver, D., and Dhaene, T. (2011), A novel hybrid sequential design strategy for global surrogate modeling of computer experiments, SIAM Journal on Scientific Computing, 33(4), 1948-1974.
    11. Dyn, N., Levin, D., and Rippa, S. (1986), Numerical procedures for surface fitting of scattered data by radial basis functions, SIAM Journal of Scientific and Statistical Computing, 7(2), 639-659.
    12. Duan, W., Ankenman, B. E., Sanchez, S. M., and Sanchez, P. J. (2017), Sliced full factorial-based Latin hypercube designs as a framework for a batch sequential design algorithm, Technometrics, 59(1), 11-22.
    13. Eason, J. and Cremaschi, S. (2014), Adaptive sequential sampling for surrogate model generation with artificial neural networks, Computers & Chemical Engineering, 68, 220-232.
    14. Ezzat, A. A., Pourhabib, A., and Ding, Y. (2018), Sequential design for functional calibration of computer models, Technometrics, 60(3), 286-296.
    15. Farhang-Mehr, A. and Azarm, S. (2005), Bayesian meta-modelling of engineering design simulations: A sequential approach with adaptation to irregularities in the response behavior, International Journal for Numerical Methods in Engineering, 62(15), 2104-2126.
    16. Friedman, J. H. (1991), Multivariate adaptive regression splines, The Annals of Statistics, 19(1), 1-67.
    17. Fuhg, J. N., Fau, A., and Nackenhorst, U. (2021), State-of-the-art and comparative review of adaptive sampling methods for kriging, Archives of Computational Methods in Engineering, 28(4), 2689-2747.
    18. Garud, S. S., Karimi, I. A., and Kraft, M. (2017), Smart sampling algorithm for surrogate model development, Computers & Chemical Engineering, 96, 103-114.
    19. Gorissen, D., Crombecq, K., Hendrickx, W., and Dhaene, T. (2006), Adaptive distributed metamodeling, In M. Daydé, J. M. L. M. Palma, Á. L. G. A. Coutinho, E. Pacitti, and J. C. Lopes (eds), High Performance Computing for Computational Science - VECPAR 2006, Springer, 579-588.
    20. Gramacy, R. (2020), Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences, Chapman & Hall/CRC, Boca Raton, Florida.
    21. Gramacy, R. B. and Lee, H. K. H. (2009), Adaptive design and analysis of supercomputer experiments, Technometrics, 51(2), 130-145.
    22. Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, Macmillan Publishing, New York.
    23. Jiang, P., Shu, L., Zhou, Q., Zhou, H., Shao, X., and Xu, J. (2015), A novel sequential exploration-exploitation sampling strategy for global metamodeling, IFAC-PapersOnLine, 48(28), 532-537.
    24. Jin, R., Chen, W., and Sudjianto, A. (2002), On sequential sampling for global metamodeling in engineering design, Proceedings of ASME 2002 Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Montreal, Canada, 539-548.
    25. Johnson, M. E., Moore, L. M., and Ylvisaker, D. (1990), Minimax and maximin distance designs, Journal of Statistical Planning and Inference, 26(2), 131-148.
    26. Johnson, L. R. (2008), Microcolony and biofilm formation as a survival strategy for bacteria, Journal of Theoretical Biology, 251(1), 24-34.
    27. Jones, D. R., Schonlau, M., and Welch, W. J. (1998), Efficient global optimization of expensive black-box functions, Journal of Global Optimization, 13(4), 455-492.
    28. Joseph, V. R. (2016), Space-filling designs for computer experiments: A review, Quality Engineering, 28(1), 28-35.
    29. Joseph, V. R., Dasgupta, T., Tuo, R., and Wu, C. F. J. (2015a), Sequential exploration of complex surfaces using minimum energy designs, Technometrics, 57(1), 64-74.
    30. Joseph, V. R., Gul, E., and Ba, S. (2015b), Maximum projection designs for computer experiments, Biometrika, 102(2), 371-380.
    31. Joseph, V. R., Gu, L., Ba, S., and Myers, W. R. (2019), Space-filling designs for robustness experiments, Technometrics, 61(1), 24-37.
    32. Kim, H., Vastola, J. T., Kim, S., Lu, J. C., and Grover, M. A. (2017), Batch sequential minimum energy design with design-region adaptation, Journal of Quality Technology, 49(1), 11-26.
    33. Kong, X., Ai, M., and Tsui, K. L. (2018), Design for sequential follow-up experiments in computer emulations, Technometrics, 60(1), 61-69.
    34. Krige, D. G. (1951), A statistical approach to some basic mine valuation problems on the Witwatersrand, Journal of the Southern African Institute of Mining and Metallurgy, 52(6), 119-139.
    35. Lam, C. Q. (2008), Sequential adaptive designs in computer experiments for response surface model fit, Ph.D. thesis, The Ohio State University.
    36. Li, G., Aute, V., and Azarm, S. (2010), An accumulative error based adaptive design of experiments for offline metamodeling, Structural and Multidisciplinary Optimization, 40(1-6), 137-155.
    37. Liu, H., Cai, J., and Ong, Y. S. (2017), An adaptive sampling approach for kriging metamodeling by maximizing expected prediction error, Computers & Chemical Engineering, 106, 171-182.
    38. Liu, H., Xu, S., Ma, Y., Chen, X., and Wang, X. (2016), An adaptive Bayesian sequential sampling approach for global metamodeling, Journal of Mechanical Design, 138(1), 011404.
    39. Loeppky, J. L., Moore, L. M., and Williams, B. J. (2010), Batch sequential designs for computer experiments, Journal of Statistical Planning and Inference, 140(6), 1452-1464.
    40. Loeppky, J. L., Sacks, J., and Welch, W. J. (2009), Choosing the sample size of a computer experiment: A practical guide, Technometrics, 51(4), 366-376.
    41. Lovison, A. and Rigoni, E. (2010), Adaptive sampling with a Lipschitz criterion for accurate metamodeling, Communications in Applied and Industrial Mathematics, 1(2), 110-126.
    42. McKay, M. D., Beckman, R. J., and Conover, W. J. (1979), A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics, 21(2), 239-245.
    43. Mo, S., Lu, D., Shi, X., Zhang, G., Ye, M., Wu, J., and Wu, J. (2017), A Taylor expansion-based adaptive design strategy for global surrogate modeling with applications in groundwater modeling, Water Resources Research, 53(12), 10802-10823.
    44. Morris, M. D. and Mitchell, T. J. (1995), Exploratory designs for computational experiments, Journal of Statistical Planning and Inference, 43(3), 381-402.
    45. Ponomarev, P., Polikarpova, M., and Pyrhönen, J. (2012), Thermal modeling of directly-oil-cooled permanent magnet synchronous machine, In 2012 XXth International Conference on Electrical Machines, IEEE, 1882-1887.
    46. Qian, P. Z. G. (2009), Nested Latin hypercube designs, Biometrika, 96(4), 957-970.
    47. Ranjan, P., Bingham, D., and Michailidis, G. (2008), Sequential experiment design for contour estimation from complex computer codes, Technometrics, 50(4), 527-541.
    48. Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989), Design and analysis of computer experiments, Statistical Science, 4(4), 409-423.
    49. Santner, T. J., Williams, B. J., and Notz, W. I. (2018), The Design and Analysis of Computer Experiments (2nd ed.), Springer, New York, NY.
    50. Schneider, G., Craigmile, P. F., and Herbei, R. (2017), Maximum likelihood estimation for stochastic differential equations using sequential Gaussian-process-based optimization, Technometrics, 59(2), 178-188.
    51. Shang, B. and Apley, D. W. (2021), Fully-sequential space-filling design algorithms for computer experiments, Journal of Quality Technology, 53(2), 173-196.
    52. Singh, P., Deschrijver, D., and Dhaene, T. (2013), A balanced sequential design strategy for global surrogate modeling, Proceedings of the 2013 Winter Simulation Conference, IEEE, 2172-2179.
    53. van der Herten, J., Couckuyt, I., Deschrijver, D., and Dhaene, T. (2015), A fuzzy hybrid sequential design strategy for global surrogate modeling of high-dimensional computer experiments, SIAM Journal on Scientific Computing, 37(2), A1020-A1039.
    54. Xu, S., Liu, H., Wang, X., and Jiang, X. (2014), A robust error-pursuing sequential sampling approach for global metamodeling based on Voronoi diagram and cross validation, Journal of Mechanical Design, 136(7), 071009.
    55. Xu, J., Chen, J., and Qian, P. Z. (2015), Sequentially refined Latin hypercube designs: Reusing every point, Journal of the American Statistical Association, 110(512), 1696-1706.
    56. Yang, J., Liu, M. Q., and Lin, D. K. (2014), Construction of nested orthogonal Latin hypercube designs, Statistica Sinica, 24(1), 211-219.