1. INTRODUCTION
In today’s competitive world, scientists and engineers want to develop quality products quickly while keeping costs low. They use computer simulations in design and testing instead of time-consuming and expensive experiments, or in circumstances where such experiments are very hard to perform. For example, computer simulations can model complex physical phenomena, from the airflow around an airplane’s wings to the impacts of climate change. They aim to provide insight into input-output relationships and to predict the future behavior of those systems. In engineering, computer simulations help solve partial differential equations (PDEs) in computational modeling, such as the finite element method (FEM) or computational fluid dynamics (CFD). However, these sophisticated computer codes can also be very time-consuming to run. For instance, a crash simulation for a full passenger car takes 36 to 160 hours to compute (Gorissen et al., 2006); a CFD simulation of a cooling system can take several days to complete (Ponomarev et al., 2012). Besides high computing costs, there is also a lack of data for complicated simulations, such as developing crashworthiness for a new vehicle or modeling earthquake propagation. These problems raise a demand to replace the original simulator with a simpler mathematical model, a surrogate. A single evaluation of this statistical model responds much faster than an original simulation. Therefore, we can quickly observe the behavior of the simulation model and then make inferences about the real-world system. Figure 1 illustrates the relationship between a computer experiment, a surrogate model, and a physical experiment in inferring and predicting a real-world system.
Surrogate modeling can be classified as a data-driven or statistical approach. It treats the simulator as a black-box function in which only the input-output behavior matters. It obtains the simulation outputs at several selected locations in the design space; the surrogate is then constructed as a statistical model by assembling these inputs and outputs into a training dataset, as shown in Figure 2. Surrogate modeling can be viewed as a special case of supervised learning. Several popular machine learning techniques have been adopted to build surrogates, including the Gaussian process (or Kriging), artificial neural networks (ANNs), and radial basis functions (RBFs). Among these techniques, the Gaussian process or Kriging is the most popular for constructing computer experiment models. The accuracy of a surrogate model depends on the number of design points and their locations. A further goal is to reduce the number of samples as much as possible while still generating a proficient surrogate model. The groundbreaking work of Sacks et al. (1989) introduced a framework for the design and analysis of computer experiments (DACE) with two basic statistical questions:

• How many points should be sampled in which locations in our design spaces? (Design problem)

• How should the data be used to fulfill the objectives? (Analysis problem)
Generally, we cannot know in advance the number of points required for building an accurate surrogate. Sampling techniques discussed in the literature are often one-shot designs in which all the samples are generated at once. The samples are evenly distributed over the entire design space, which can waste resources, since the input-output relations are more complicated in some design regions than in others. This observation led to the concept of the sequential design of computer experiments (SDCE), in which we enrich the data set during the sampling process. The practice of SDCE determines the regions of interest in which the surrogate might be uncertain or which contain the optimum values of the design parameters. SDCE avoids selecting too many samples (oversampling) or too few samples (undersampling), saving computational resources.
The sequential design of computer experiments has received growing attention in the past two decades. Figure 3 shows the number of articles published on DACE and SDCE from 1995 through 2019. The data was retrieved from Google Scholar on 15 August 2020 with the search keywords “design of computer experiments,” “experimental design,” “computer simulation,” “sequential design of computer experiments,” and “adaptive sampling.” The number of SDCE articles has continued to increase even though DACE numbers have decreased dramatically since 2012.
In this article, we present a brief systematic literature review of the sequential design of computer experiments. The paper is organized as follows. Section 2 gives an overview of algorithms, initial designs, surrogate modeling, and stopping criteria for sequential designs. Section 3 classifies sequential designs in terms of deterministic/stochastic simulation, granularity, and model-free/model-dependent approaches. The importance of exploitation and exploration is explained in Section 4, with a numerical example demonstrating how several sequential strategies balance exploitation and exploration. The last section presents a summary and a discussion of future directions.
2. SEQUENTIAL DESIGN ALGORITHM
Traditional DACE is based on a one-shot design approach. In the DACE framework, a system is first analyzed to construct a simulator. An initial design is chosen to spread samples evenly across the experimental space, and the responses are observed by feeding the design points to the simulator. Next, a surrogate model is built using these data. The one-shot design approach is shown in the blue-dashed box in Figure 4: all data points are chosen at once, and the modeling algorithm proceeds without evaluating any additional samples. Finally, one can predict the response at untried points, optimize a function of the response, or tune the computer code to physical data (Sacks et al., 1989). However, the initial design will inevitably miss certain design space features. Sequential design improves on this one-shot approach by transforming the algorithm into an iterative process. The sequential design approach (orange-dashed box in Figure 4) uses infill criteria to identify additional samples with the highest information value. Infill criteria play a critical role in determining the locations at which to add point(s) to the existing data set. The optimal criterion may depend on several factors, including the type of surrogate model under consideration, the number of samples required, and whether the goal is global fit or optimization. The sequential sampling procedure terminates when a stopping criterion (model accuracy or computational budget) is satisfied. Finally, the approved surrogate is used for prediction, optimization, or calibration purposes. The sequential design approach is thus more flexible than the one-shot approach.
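The iterative process described above can be sketched in a few lines. This is a generic sketch, not any cited author's implementation; `fit` and `infill` are placeholders for whatever surrogate and infill criterion one chooses.

```python
import numpy as np

def sequential_design(simulator, initial_X, fit, infill, budget):
    """Generic sequential-design loop: evaluate the simulator on an initial
    design, then repeatedly fit a surrogate, let an infill criterion choose
    the next point, and evaluate it, until the run budget is exhausted."""
    X = [np.asarray(x, dtype=float) for x in initial_X]
    y = [simulator(x) for x in X]
    while len(X) < budget:
        model = fit(np.array(X), np.array(y))   # surrogate from current data
        x_new = infill(model, np.array(X))      # infill criterion picks a point
        X.append(np.asarray(x_new, dtype=float))
        y.append(simulator(x_new))              # one more expensive run
    return np.array(X), np.array(y)
```

The stopping rule here is a simple run budget; the accuracy-based criteria of Section 2.3 would replace the `while` condition.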
2.1 Initial Design
The objective of the initial design is to obtain a good representation of the response surface. We try to evenly distribute the sample points over the design region to obtain maximum information about the response. In physical experiments, factorial or fractional factorial designs are commonly used to explore the response surface in the initial stage of experimentation: one chooses a few levels for each factor and uses such designs to create a given number of runs. For computer experiments, however, space-filling designs are preferred, since the entire design space is of interest and often a highly nonlinear response surface must be estimated.
McKay et al. (1979) set a key milestone by proposing the Latin hypercube design (LHD), also known as Latin hypercube sampling (LHS). An LHD guarantees uniform coverage in every one-dimensional projection while maintaining the properties of random uniform sampling. Consider a d-dimensional design space. Partition each dimension into n equal segments of length $\frac{1}{n}$, dividing the design region evenly into cells. Then arrange n sample points so that each segment of each dimension contains exactly one point. An illustration of an LHD (d = 2, n = 10) is shown in Figure 5: each dimension has 10 bins, and the samples are placed so that no row or column contains more than one point. However, there are many possible choices for LHDs, and not all of them are good.
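The construction just described, one random permutation of the n bins per dimension with one point per bin, takes only a few lines of code. This is a minimal sketch, not McKay et al.'s original implementation:

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """Draw an n-point Latin hypercube design in [0, 1]^d: each dimension
    is split into n equal bins and each bin receives exactly one point."""
    rng = np.random.default_rng(rng)
    # one independent random permutation of the n bins per dimension
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    # place each point uniformly at random inside its assigned bin
    return (perms + rng.random((n, d))) / n
```

By construction, projecting the design onto any single coordinate recovers one point per bin, which is the property LHDs guarantee.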
Another popular space-filling scheme is the maximin design, based on geometric criteria. First, we define a notion of distance to measure the spread between two points, using the simple choice of Euclidean distance:

$d(x,{x}^{\prime}) = \sqrt{\sum_{l=1}^{d}{({x}_{l}-{x}_{l}^{\prime})}^{2}}.$

A maximin design $X_{n}$ maximizes the minimum distance between any pair of points:

$X_{n} = \arg\max_{X}\min_{i\ne j}d({x}_{i},{x}_{j}).$

Figure 6 illustrates a maximin design in two dimensions.
Minimax distance designs (Johnson et al., 1990) achieve a maximum spread of samples by minimizing the largest distance from any point in the design space to its nearest design point. One disadvantage of minimax and maximin designs is that they do not have good projections onto subspaces. Morris and Mitchell (1995) proposed maximin Latin hypercube designs (MnLHD), combining the advantages of the two design methods. An MnLHD is an LHD that maximizes the minimum distance among the samples, achieving space-filling in both the full input space and its single dimensions. In the same spirit, maximum projection (MaxPro) designs (Joseph et al., 2015b, 2016) were recently introduced to maximize the minimum distance in all subspaces of the experimental region.
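A crude way to approximate a maximin LHD is to draw many random Latin hypercubes and keep the one with the largest minimum pairwise distance. This random restart search is our simple stand-in for the optimization used by Morris and Mitchell (1995), and it assumes SciPy is available:

```python
import numpy as np
from scipy.spatial.distance import pdist

def maximin_score(X):
    """Minimum pairwise Euclidean distance of a design (larger is better)."""
    return pdist(X).min()

def best_maximin_lhd(n, d, n_restarts=200, rng=0):
    """Pick the best of many random bin-centred LHDs under the maximin
    criterion -- a cheap surrogate for a proper maximin-LHD optimization."""
    rng = np.random.default_rng(rng)
    best, best_score = None, -np.inf
    for _ in range(n_restarts):
        perms = np.column_stack([rng.permutation(n) for _ in range(d)])
        X = (perms + 0.5) / n          # bin-centred Latin hypercube
        s = maximin_score(X)
        if s > best_score:
            best, best_score = X, s
    return best, best_score
```

The result is still an LHD (good one-dimensional projections) but with better spread than a typical random LHD.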
The sample size has a substantial impact on the initial design. It must be large enough to guarantee minimal coverage of the design space yet as small as possible to save computational expense. Chapman et al. (1994) and Jones et al. (1998) suggested using a sample size ten times the input dimension, that is, 10d runs when d is the number of input variables. Loeppky et al. (2009) investigated this 10d rule of thumb and concluded that a 10d sample size is suitable for an initial design when d ≤ 5. Ranjan et al. (2008) found that initial surface estimates were adequate for the performance of their sequential design algorithm when the initial design consumed roughly 25–35% of a fixed budget of runs.
2.2 Surrogate Modeling
A variety of modeling techniques are available for constructing surrogates. The choice of technique depends on one’s expectation of how complex the underlying response is. Response surface methodology (Box et al., 2005) and artificial neural networks (Haykin, 1994) are appropriate for building fast and straightforward approximations. Other models in the literature are multivariate adaptive regression splines (Friedman, 1991) and radial basis function approximations (Dyn et al., 1986). Among these techniques, Gaussian Kriging models are the most widely used for computer experiments.
The Kriging model was derived from the geostatistics field by a South African geologist in his master’s thesis (Krige, 1951). It was proposed for modeling computer experiments by Sacks et al. (1989). The data set consists of n input vectors $X=({x}_{1},\dots,{x}_{n})$ and the corresponding outputs $y=({y}_{1},\dots,{y}_{n})$. The response of the computer code is modeled as a function of the input vector x:

$y(x) = f{(x)}^{T}\beta + Z(x),$

where f(x) is a vector of prespecified (chosen) basis functions and β is a vector of unknown coefficients. Z(x) is a stationary Gaussian process with mean zero and variance $\sigma^{2}$, with covariance

$\mathrm{Cov}\left(Z({x}_{i}),Z({x}_{j})\right) = {\sigma}^{2}{R}_{ij}({x}_{i},{x}_{j}),$

where Cov is the covariance operator and σ is the standard deviation of the stochastic process. ${R}_{ij}({x}_{i},{x}_{j})$ is the correlation between the outputs at the two samples $x_{i}$ and $x_{j}$, defined as the (i, j) entry of the correlation matrix R. A standard choice is the power exponential correlation function for k covariates:

${R}_{ij}({x}_{i},{x}_{j}) = \exp\left(-\sum_{l=1}^{k}{\theta}_{l}{\left|{x}_{il}-{x}_{jl}\right|}^{{p}_{l}}\right),$

where ${\theta}_{l}\ge 0$ are unknown correlation parameters. The correlation parameters $\theta=({\theta}_{1},\dots,{\theta}_{k})$ are scale parameters and $p=({p}_{1},\dots,{p}_{k})$ are power parameters.
The best linear unbiased predictor (BLUP) (Santner et al., 2018) is used to predict the response at an untried point $x_{0}$:

$\widehat{y}({x}_{0}) = {f}_{0}^{T}\widehat{\beta} + {r}^{T}{R}^{-1}(y-F\widehat{\beta}),$

where $r={[R({x}_{0},{x}_{1}),\dots,R({x}_{0},{x}_{n})]}^{T}$, ${f}_{0}=f({x}_{0})$, $\widehat{\beta}={({F}^{T}{R}^{-1}F)}^{-1}{F}^{T}{R}^{-1}y$, R is the n×n matrix with entries ${R}_{ij}({x}_{i},{x}_{j})$, and $F={[f({x}_{1}),\dots,f({x}_{n})]}^{T}$ is the regressor matrix. The unknown correlation parameters θ can be estimated by maximizing the profile log-likelihood

$-\frac{1}{2}\left(n\log{\widehat{\sigma}}^{2}+\log\det R\right),$

where ${\widehat{\sigma}}^{2}={(y-F\widehat{\beta})}^{T}{R}^{-1}(y-F\widehat{\beta})/n$.
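As a concrete illustration, the BLUP can be implemented directly with NumPy. This sketch makes simplifying assumptions not made in the text: the correlation parameters are fixed rather than estimated, the power parameters are set to 2 (Gaussian correlation), and the regressor is the constant f(x) = 1 (ordinary Kriging).

```python
import numpy as np

def kriging_predict(X, y, X0, theta):
    """Ordinary-Kriging BLUP with Gaussian correlation
    R_ij = exp(-sum_l theta_l (x_il - x_jl)^2) and constant regressor f(x) = 1.
    theta is taken as known; estimating it would maximize the likelihood."""
    X, X0 = np.atleast_2d(X), np.atleast_2d(X0)
    y = np.asarray(y, dtype=float)
    theta = np.atleast_1d(theta)

    def corr(A, B):
        d2 = (A[:, None, :] - B[None, :, :]) ** 2        # pairwise squared diffs
        return np.exp(-(d2 * theta).sum(axis=2))

    R = corr(X, X) + 1e-10 * np.eye(len(X))              # jitter for stability
    F = np.ones((len(X), 1))                             # constant mean regressor
    Ri = np.linalg.inv(R)
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y)   # generalized LS estimate
    resid = y - (F @ beta)                               # y - F beta_hat
    return beta[0] + corr(X0, X) @ Ri @ resid            # BLUP at the new points
```

Note that the predictor interpolates the training data (up to the jitter), which is the defining behavior of Kriging for deterministic simulators.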
2.3 Stopping Criteria
Commonly used stopping criteria are as follows:

• Time constraints, which might depend on the project deadline or time budget.

• Computational resource constraints, for example, a maximum number of runs preset by the user.

• An accuracy goal, which is evaluated by comparing the actual responses with the model’s predicted values. The larger the error, the more inaccurate the surrogate. Table 1 (Fuhg et al., 2021) introduces some error estimation methods proposed to evaluate the model’s accuracy.
3. CLASSIFICATION OF SEQUENTIAL DESIGN OF COMPUTER EXPERIMENTS
3.1 Deterministic or Stochastic
Deterministic simulations are widely used in various engineering applications. For the same set of inputs, the simulator reproduces the same outputs. Therefore, traditional blocking, randomization, and replication methods used in physical experiments are irrelevant when conducting computer experiments.
This is not the case for stochastic simulation, where we intentionally introduce variance to describe the natural stochasticity of the process. Replicates may be collected at every sample point to estimate the noise. Stochastic simulation is encountered in various applications, such as optimizing call center staffing (Aksin et al., 2009) and biology (Johnson, 2008). These stochastic models are often run in two stages: in the first stage, a small number of design points are sampled; in the second stage, the simulations are replicated at points that have large sample variances. Kim et al. (2017) extended the minimum energy design to the batch sequential situation for a stochastic response. Schneider et al. (2017) adopted expected improvement criteria based on a Gaussian process model for finding the maximum likelihood estimate of a stochastic differential equation.
3.2 Granularity
The sequential design of computer experiments can be classified regarding granularity, the number of sample points generated at each iteration. There are two popular sampling modes, single-point and batch sampling, as shown in Figure 7.

The single-point strategy adds only one point per iteration, while the sequential batch strategy generates two or more points at each iteration. For instance, Loeppky et al. (2010) proposed a sequential batch design that adds design points using orthogonal-array-based Latin hypercubes. Duan et al. (2017) proposed a similar method in which design points can be added indefinitely.
3.3 Model-free or Model-dependent
Sequential design can be used in two ways: model-free design and model-dependent design. Model-free sequential design aims at space-filling over the whole experimental region without fitting a model. In model-free design, no assumption is made about which type of model is used or how the model will behave. The idea is to evenly distribute the sample points in the input design space without using the responses from previous simulations. On the other hand, in model-dependent sequential design, a model to be fitted is assumed in advance to provide criteria for answering specific questions. Model-dependent sequential design can be divided into subcategories based on the design objectives: global approximation and optimization. Sequential design for global fit emphasizes sequentially improving the accuracy of the surrogate over the entire domain so that the surrogate model can be a good replacement for the original simulator. A common goal of global fitting is to minimize the average squared error over the surface. Sequential design for optimization focuses on finding an optimum region to fulfill the optimization objective. The classifications are shown in Figure 8.
The purpose of the model-free sequential design is to spread points over the design space to obtain maximum understanding of the response behavior once the responses are observed. Some criteria used to construct one-shot designs can also be implemented in a sequential version. For example, a sequential version of the maximin distance design (MmDist) can start with an MmDist with a small number of samples, and each subsequent point is placed to maximize the minimum interpoint Euclidean distance. The advantage of the model-free sequential design is that it does not need to specify the initial sample size; subsequent samples are added sequentially until the stopping rules are satisfied. Loeppky et al. (2010) developed a sequential batch design with a distance-based strategy. Duan et al. (2017) developed a new batch sequential design called the sliced full-factorial-based LHD. The design has good orthogonality and one-dimensional projection at certain stages. Qian (2009) proposed nested LHDs and showed that they can outperform independent and identically distributed (i.i.d.) sampling. A nested LHD with n runs has multiple layers that are also LHDs, so it can generate a series of smaller LHDs with fewer runs. Nested LHDs are suitable for computer experiments involving codes with higher and lower levels of accuracy. Other modifications of the nested LHD are the nested orthogonal LHD (Yang et al., 2014), the sequential refined LHD (Xu et al., 2015), and the flexible nested LHD (Chen et al., 2017). Kong and Tsui (2018) extended the nested Latin hypercube design to the situation without a sample size limitation.
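The sequential MmDist strategy described above reduces to a one-line criterion when the next point is chosen from a finite candidate pool (the pool-based formulation is our simplification):

```python
import numpy as np

def mmdist_step(X, candidates):
    """One step of sequential maximin-distance (MmDist) sampling: among the
    candidates, return the point whose distance to its nearest existing
    design point is largest."""
    d = np.linalg.norm(candidates[:, None, :] - X[None, :, :], axis=2)
    return candidates[d.min(axis=1).argmax()]
```

Calling this in a loop, appending each chosen point to the design, yields a model-free sequential design that needs no initial sample size beyond a small seed design.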
Model-free designs can be further classified according to different criteria. First, they can be categorized by the type of design used as the initial design; various selection methods are considered, such as distance-based, discrepancy-based, and matrix-based construction methods. Second, they differ in granularity: how many samples are selected at a time, one point or a batch? The designs can also be compared regarding the input types involved, such as qualitative, quantitative, or mixed qualitative and quantitative. Lastly, sequential designs can be evaluated and compared with respect to space-filling, projective, and orthogonal properties. Some recent papers on model-free designs of computer experiments are introduced in Table 2. In short, the model-free approach is very flexible for nonlinear, nonparametric generic regression. It has plenty of variations based on different geometric criteria and highly customized solving algorithms.
Model-dependent sequential designs have been extensively studied for computer experiments (Christen et al., 2011; Gramacy, 2020). Lam (2008) adapted a sequential design strategy for fitting response surface models of computer experiments. Gramacy and Lee (2009) introduced flexible sequential designs of supercomputer experiments based on Gaussian process models. Joseph et al. (2015a) proposed a new deterministic space-filling sampling method for the exploration of complex response surfaces, namely the minimum energy design (MinED). The creative idea was derived from the physical analogy of visualizing design points as charged particles in a box and minimizing the total potential energy inside the box. This method employs posterior evaluation to adapt to different types of response surfaces by choosing the charge function inversely proportional to the function of interest. Kim et al. (2017) extended this work to a sequential batch situation where the number of batches to be run is known in advance. Some recent model-dependent designs are listed in Table 3.
4. EXPLORATION AND EXPLOITATION
Exploration searches the whole design space; it helps to find key regions that have not been identified before, such as discontinuities, steep slopes, optima, or stable regions. Exploitation selects samples in regions that have been identified as interesting or potentially in need of investigation, using the outputs from previous iterations to guide the sequential sampling process. In this section, several exploitation and exploration criteria are introduced, and the importance of the trade-off between exploration and exploitation is illustrated with a numerical example.
4.1 Classification of Exploration Criteria
Exploration aims to look evenly over the whole input domain to gain a general knowledge of the input-output mapping. Thus, a pure exploration strategy performs adaptive sampling while ignoring previously evaluated outcomes, generating sample points simply to fill the domain.
4.1.1 Distance-based Criteria
Maximin and minimax distance
Johnson et al. (1990) proposed two distance-based criteria, namely maximin (Mm) and minimax (mM), to spread sample points within the design space, as described in Figure 9. The maximin criterion maximizes the minimum distance between any two sample points:

$\max_{{X}_{n}}\min_{i\ne j}d({x}_{i},{x}_{j}).$

On the other hand, the minimax criterion minimizes the maximum distance from any point in the design region to its nearest sample point:

$\min_{{X}_{n}}\max_{x\in \Omega}\min_{i}d(x,{x}_{i}).$

The distance here takes the simple choice of the Euclidean distance between two points x, x′:

$d(x,{x}^{\prime}) = \sqrt{\sum_{l=1}^{d}{({x}_{l}-{x}_{l}^{\prime})}^{2}}.$
The framework can also be extended to other choices of distance, such as Mahalanobis distance.
Mahalanobis distance
The Mahalanobis distance of an observation $x=({x}_{1},\dots,{x}_{N})$ from a set of observations with mean $\mu=({\mu}_{1},\dots,{\mu}_{N})$ and covariance matrix S is defined as

${D}_{M}(x) = \sqrt{{(x-\mu)}^{T}{S}^{-1}(x-\mu)}.$
Voronoi tessellation
Consider a set of distinct points ${X}_{n}=\{{x}_{1},{x}_{2},\dots,{x}_{n}\}$ in an open bounded domain $\Omega \subseteq {\mathbb{R}}^{d}$. Every sample ${x}_{i},\ i=1,2,\dots,n$ has a corresponding Voronoi cell ${V}_{i}$ given as

${V}_{i} = \{x\in \Omega : d(x,{x}_{i})\le d(x,{x}_{j})\ \text{for all}\ j\ne i\},$

where d denotes the Euclidean distance in ${\mathbb{R}}^{d}$.
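Exact Voronoi cells are expensive to compute in high dimension, so exploration schemes often approximate cell volumes by Monte Carlo (Crombecq et al., 2011 use such an approximation). A minimal sketch on the unit hypercube, with an arbitrarily chosen sample size:

```python
import numpy as np

def voronoi_volumes(X, n_mc=20000, rng=0):
    """Monte Carlo estimate of Voronoi cell volumes in [0, 1]^d: each random
    test point is assigned to its nearest design point, and the fraction of
    test points per cell approximates that cell's volume."""
    rng = np.random.default_rng(rng)
    U = rng.random((n_mc, X.shape[1]))                      # uniform test points
    nearest = np.linalg.norm(U[:, None, :] - X[None, :, :],
                             axis=2).argmin(axis=1)         # nearest design point
    return np.bincount(nearest, minlength=len(X)) / n_mc    # volume fractions
```

A design point with a large estimated cell volume sits in a sparsely sampled region, which makes the cell volume a natural exploration score.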
Crombecq et al. (2011) proposed LOLA-Voronoi to combine Voronoi tessellations with local linear approximation (LOLA). While Voronoi tessellations target domain exploration, LOLA guides local exploitation. This algorithm outperformed static techniques for all surrogate model types. Singh et al. (2013) proposed three trade-off schemes to balance exploration and exploitation for the LOLA-Voronoi strategy. Van der Herten et al. (2015) introduced a fuzzy variation of LOLA (FLOLA), which provides the benefits of the original algorithm while significantly reducing the computational burden.
Delaunay triangulation
Delaunay triangulation is the dual of the Voronoi diagram. Assume that P is a given set of discrete points in a plane and E is a set of closed line segments with endpoints in P. A triangulation T = (P, E) of the point set P is a plane graph that satisfies the following conditions: 1) all faces made by the edges are triangular, and 2) no edge contains any point of P other than its endpoints. A Delaunay triangulation DT(P) for a given set P is a triangulation such that no point of P lies inside the circumcircle of any triangle. Delaunay triangulations maximize the minimum angle over all the angles of the triangles in the triangulation. Voronoi tessellation and Delaunay triangulation are illustrated in Figure 10 (Fuhg et al., 2021).
4.1.2 Error-based Criteria
The key advantage of Gaussian-process-based models is that they permit the calculation of an estimated error of the model’s predictions. The search algorithm uses the estimated error to position infill points where the model prediction is most uncertain. This idea underlies the integrated mean-squared error criterion (Sacks et al., 1989). Other variations of this exploration technique can be found in Jones et al. (1998), Lam (2008), and Liu et al. (2017).
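For simple Kriging (known mean), the predictive variance at a candidate x is proportional to $1 - r(x)^{T}R^{-1}r(x)$, so an error-based infill point can be found by maximizing that quantity over a candidate grid. This NumPy sketch uses a Gaussian correlation with an arbitrarily fixed parameter, and the known-mean simplification is ours:

```python
import numpy as np

def mse_infill(X, candidates, theta=25.0):
    """Error-based infill: with Gaussian correlation r(h) = exp(-theta h^2),
    return the candidate with the largest (simple-Kriging) predictive
    variance, proportional to 1 - r^T R^{-1} r."""
    def corr(A, B):
        d2 = (A[:, None, :] - B[None, :, :]) ** 2
        return np.exp(-theta * d2.sum(axis=2))
    R = corr(X, X) + 1e-10 * np.eye(len(X))       # correlation matrix + jitter
    r = corr(candidates, X)                       # (m, n) candidate correlations
    var = 1.0 - np.einsum('mn,nk,mk->m', r, np.linalg.inv(R), r)
    return candidates[var.argmax()]
```

Because the variance grows with distance from the observed points, this criterion naturally targets the largest gap in the current design, which is why it is listed here as an exploration device.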
4.2 Classification of Exploitation Criteria
4.2.1 Cross-validation Exploitation
The merit of the cross-validation (CV) approach is that there is no need to add new samples to assess metamodel accuracy. Leave-one-out cross-validation (LOOCV) is a particular case of k-fold cross-validation. We leave out one sample point from an initial design set ${D}_{n}$ and build a surrogate with the rest of the points. The surrogate is used to predict the response at the left-out point. Then we calculate the cross-validation measure between the value $\widehat{y}({x}_{i})$ predicted by the full surrogate and the leave-one-out prediction ${\widehat{y}}_{-i}({x}_{i})$:

${e}_{CV}({x}_{i}) = \left|\widehat{y}({x}_{i})-{\widehat{y}}_{-i}({x}_{i})\right|,$

where $\widehat{y}({x}_{i})$ is the prediction of the metamodel based on all the sample points in ${D}_{n}$, and ${\widehat{y}}_{-i}({x}_{i})$ is the prediction of the metamodel based on the remaining n − 1 sample points.
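The LOOCV measure can be computed with any surrogate that exposes a fit-and-predict interface. This is a model-agnostic sketch; the `fit` argument is a placeholder, not a specific method from the literature:

```python
import numpy as np

def loocv_scores(X, y, fit):
    """LOOCV exploitation measure: for each design point x_i, compare the
    full-data surrogate's prediction at x_i with the prediction of a
    surrogate refit on the remaining n-1 points. `fit(X, y)` must return a
    callable surrogate mapping query points to predictions."""
    full = fit(X, y)
    errs = np.empty(len(X))
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model_i = fit(X[keep], y[keep])        # surrogate without point i
        errs[i] = abs(full(X[i:i+1])[0] - model_i(X[i:i+1])[0])
    return errs                                # large error -> region needs points
```

Regions around points with large LOOCV error are candidates for exploitation, since the surrogate there is sensitive to individual observations.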
4.2.2 Gradient-based Exploitation
Crombecq et al. (2011) partitioned the input space with a Voronoi tessellation and then approximated the gradient in each cell from neighborhood information. In Lovison and Rigoni (2010), local nonlinearities are evaluated from an approximation of the Lipschitz constant using neighboring points.
4.3 Balance between Exploration and Exploitation
In sequential design, a trade-off must be made between exploration and exploitation. Without proper design space exploration, one might miss some interesting regions of the design space. Several hybrid algorithms that balance exploitation and exploration have been introduced in the literature, such as the sequential minimum energy design (Joseph et al., 2015a), which uses a charge function. This criterion depends on the objective of the experiment, since it can handle both the global fitting problem and the optimization problem by selecting a suitable energy function. Crombecq et al. (2011) present an adaptive design algorithm that puts more samples in areas with higher nonlinearity. The decision on which point to select is based on 1) an exploration criterion based on a Voronoi tessellation of the input space and 2) an exploitation criterion based on a nonlinearity measure calculated using the local linear approximation algorithm. A summary of sequential design approaches for computer experiments and their algorithms is listed in Tables 4 and 5.
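A minimal hybrid score can simply multiply an exploration term by an exploitation term. The particular weighting below is our own illustrative choice, in the spirit of (but far simpler than) LOLA-Voronoi:

```python
import numpy as np

def hybrid_infill(X, errs, candidates):
    """Toy hybrid criterion: score each candidate by its distance to the
    nearest design point (exploration), weighted by that point's local
    error estimate such as a LOOCV score (exploitation), so sampling
    concentrates in poorly modelled yet under-explored regions."""
    d = np.linalg.norm(candidates[:, None, :] - X[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                       # closest design point index
    score = d.min(axis=1) * (1.0 + errs[nearest])    # exploration x exploitation
    return candidates[score.argmax()]
```

With all error estimates equal, this reduces to pure maximin-distance exploration; with strongly unequal errors, it biases sampling toward the poorly modelled region.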
4.4 An Illustrative Numerical Example
This numerical example compares several sequential design algorithms in balancing exploration and exploitation. The test function is adapted from Farhang-Mehr and Azarm (2005):
The initial dataset comprises seven sample points $D=\{0, 0.25, 0.375, 0.5, 0.625, 0.75, 1\}$, with one hole intentionally left in the critical region to see whether the sequential methods can mimic the shape of this function. The surrogate model is constructed using the Gaussian process. Twelve infill criteria are used to select new sample points. The sequential algorithm stops when the design budget of 30 points is consumed. The objective function and the corresponding surrogate outputs are presented in Figure 11. The blue dotted lines show the target function, while the red lines illustrate the surrogate model outputs. The black dots show the locations of all 30 sample points, and the red dots indicate the last point in the sequential sampling process. Some techniques, such as WAE (Figure 11e) and EGO (Figure 11k), did not detect the critical region, which shows that finding the optimum is quite a challenging task; in these cases, the points are sampled near the lower bound. Most of the other techniques, including SFCVT (Figure 11c), CVV (Figure 11d), AME (Figure 11f), MEPE (Figure 11g), TEAD (Figure 11j), and EIGF (Figure 11l), can find the critical area but do not sufficiently exploit information in locally nonlinear regions. LIP (Figure 11i) can find the critical region; however, within the budget of 30 runs, its sequential sampling process gets stuck in the critical region and fails to capture the overall response. ACE (Figure 11a), SSA (Figure 11b), and LOLA-Voronoi (Figure 11h) capture the behavior of the objective function exceptionally well by balancing exploration and exploitation.
5. SUMMARY AND FUTURE DIRECTIONS
A brief literature review of the sequential design of computer experiments is performed. Algorithms, initial designs, surrogate modeling, and stopping criteria are presented. Sequential designs are classified in terms of deterministic/stochastic, granularity, and modelfree/modeldependent. Criteria for exploitation and exploration are presented. Some hybrid algorithms that balance exploitation and exploration are introduced and compared with an illustrative example.
Engineers working on computer experiments can benefit from implementing sequential design. There are a couple of issues to consider for its practical application. First, in the initial design stage, how many points should be tried and in which locations? Since the number of initial design points required grows quickly with the number of input variables, screening designs with fewer runs to identify active variables are necessary when many input variables are considered. Latin hypercube, maximin, and minimax distance designs do not perform well with respect to their projections onto subspaces. There is a need for sequential design methods that ensure maximum projectivity and space-filling for the vital few variables identified after screening design data analysis. Second, model accuracy needs to be enhanced by sequentially integrating computer simulations with real-world field data. The computer models should be calibrated using observations from real systems, usually from physical experiments.