## 1. INTRODUCTION

Data Envelopment Analysis (DEA) has been widely discussed in both methodological and practical areas since the groundbreaking work of Charnes *et al*. (1978). DEA is a powerful mathematical method to measure the relative efficiencies of a set of similar decision-making units (DMUs) which utilize the same inputs to produce the same outputs (Seiford, 1996). DEA is now a standard technique in performance measurement for many areas (e.g., Banker *et al*., 2013; Cooper, 2013; Ogawa and Ishii, 2002; Okudan and Lin, 2009; Panta *et al*., 2013; Trinh *et al*., 2012).

Despite its name, DEA is based on observed data and implicitly requires that the input and output data be known exactly. In real environments, however, there are various sources of uncertainty in data (e.g., human error, technical malfunction, behavioural bias), which may give the data a stochastic nature (Dyson and Shale, 2010). Moreover, if the collected data are not representative or are missing, the efficiency analysis may not be carried out completely, and the resulting efficiency scores will be erroneous and misleading. Hence, the quality and characteristics of the data are vital to the confidence one can place in the results of an efficiency analysis. To address this issue, DEA has been enhanced to handle data uncertainties, in what is termed stochastic DEA. Various forms of stochastic DEA have been discussed, e.g., imprecise DEA (IDEA), chance-constrained DEA (CC DEA), bootstrapping, and Monte Carlo simulation.

IDEA can be applied to bounded and ordinal data; this approach is effective only when the extreme values over the ranges are selected, whereas the extents of the ranges reflect different degrees of uncertainty in the data. Despotis and Smirlis (2002) developed an interval efficiency by assuming that the efficiency of a DMU is an interval between a pessimistic and an optimistic measure. Although this method can determine the lower and upper bounds of the efficiency score, the bounds are too wide to allow a reasonable evaluation of each DMU; hence this approach is more suitable for sensitivity analysis than for measuring stochastic efficiency (Zhu, 2003).

In CC DEA, the input data are deterministic while the outputs are stochastic with a given distribution; uncertainty is handled by allowing a DMU to lie outside the efficient frontier with a certain probability. This approach is limited because it can only incorporate uncertainty in the outputs rather than in both inputs and outputs (Cooper *et al*., 1996; Olesen, 2006; Olesen and Petersen, 1995).

Bootstrapping measures the uncertainty in DEA efficiency by generating confidence intervals; it assumes that the current data set is one sample from a population with infinitely many units produced by a data-generating process. This approach can give a specific estimate of the uncertainty of efficiency in DEA. However, it cannot carry any specific information about the data uncertainty of the examined inputs and outputs. Additionally, due to its homogeneity assumption on efficiency, it cannot handle the very real case where some observed values are more uncertain than others (Dyson and Shale, 2010; Fried *et al.*, 2008; Simar and Wilson, 1998).

Monte Carlo simulation has been applied in DEA to handle uncertainty in data by many researchers (e.g., Wong *et al*., 2008; Kao and Liu, 2009). These studies describe the uncertainty of the data with specified distributions and employ simulation to compare techniques or to examine the accuracy of the efficiency scores. The challenge lies in constructing the distributions, which can be computationally greedy: applying plain Monte Carlo simulation wastes much computation time on noncritical alternatives, because the conventional simulation process assigns the same number of replications to every data collection process. Wong *et al*. (2011) suggested the optimal computing budget allocation (OCBA) algorithm, which can effectively reduce the total computation time for collecting the simulation data. Their work has shown the great potential of soft computing approaches in improving the accuracy of DEA efficiency.

Apart from considering which technique to use to handle the uncertainties in data, it is also important to consider what activities need to be done in carrying out an efficiency measurement study. Logically, before one employs a technique to measure efficiency, one needs a set of data for that purpose; that is, one needs to 'collect the data.' Let us use the layman's term 'data collection' to denote this activity. In the realm of efficiency measurement, *data collection* is the most important, yet most time-consuming, step (e.g., various kinds of *input* and *output* data must be collected for each DMU). Here, we need to address the question: "*How should we allocate the computation budget for simulation data collection?*"

Therefore, with this paper we aim to springboard an alternative approach to efficiency measurement. It can be viewed as an upcoming generation of stochastic DEA which, instead of concentrating on the downstream of the technique (i.e., how to calculate the end result, the efficiency), looks at the upstream modus operandi, namely the data collection part. The data collection part obtains the data that eventually feed into generating the end output, i.e., the efficiency. This is an innovative approach, as it stimulates 'out of the box' thinking. In addition, the final result of the proposed technique is an improvement in the accuracy of the efficiency. The idea utilizes a simulation optimization technique, and the method is known as Data Collection Budget Allocation Data Envelopment Analysis (DCBA-DEA). The paper proceeds in the following manner. First, we explain DCBA-DEA and its technical aspects. Then, we show a simple illustration of the application of DCBA-DEA. Finally, the paper culminates with the important insights that can be observed through this idea.

## 2. THE PROPOSED IDEA

Originating from the basic DEA model, DCBA-DEA aims to provide an accurate measurement of efficiency when the data are uncertain, through the allocation of data collection effort. It is a 2-in-1 approach: the first step is to obtain the efficiency scores, and the second step is to improve the accuracy of those scores. This approach can be used to handle uncertainties in data. First, we provide a short description of DEA (Cooper *et al*., 2000; Thanassoulis, 2001).

### 2.1. Step 1: DEA-Based Efficiency Measurement

Assume that there are data on *S* inputs and *R* outputs for each of *J* DMUs. Let *K* denote the set of combined inputs/outputs, i.e., *K* = *S* ∪ *R*, and *D* the total number of inputs/outputs, so that *K* = {1, …, *D*}. The data are represented by the matrix **X**_{D} = (*x*_{kj}), *k* ∈ *K*, *j* ∈ *J*, where *x*_{kj} represents the *k*th input/output for DMU *j*. If *k* ∈ *S*, then *x*_{kj} is an input; otherwise, if *k* ∈ *R*, then *x*_{kj} is an output. To measure the efficiency of each DMU, we calculate the ratio of weighted outputs to weighted inputs, $\sum_{k \in R} u_k x_{kj} / \sum_{k \in S} u_k x_{kj}$, where **u** is a *K*×1 vector of weights; if *k* ∈ *R*, then *u*_{k} is an output weight, and if *k* ∈ *S*, then *u*_{k} is an input weight. To select optimal weights for the DMU under evaluation, *j*_{o}, we specify the following mathematical programming problem:

$$\max_{\mathbf{u}} \; \frac{\sum_{k \in R} u_k x_{kj_o}}{\sum_{k \in S} u_k x_{kj_o}} \quad \text{s.t.} \quad \frac{\sum_{k \in R} u_k x_{kj}}{\sum_{k \in S} u_k x_{kj}} \leq 1, \; j \in J; \quad u_k \geq 0, \; k \in K. \tag{1}$$

The above formulation has infinitely many solutions. By imposing the constraint $\sum_{k \in S} u_k x_{kj_o} = 1$ and transforming via linear programming duality, we obtain the equivalent envelopment form:

$$\theta(\mathbf{X}_D) = \min_{\theta, \boldsymbol{\lambda}} \; \theta \quad \text{s.t.} \quad \sum_{j \in J} \lambda_j x_{kj} \leq \theta \, x_{kj_o}, \; k \in S; \quad \sum_{j \in J} \lambda_j x_{kj} \geq x_{kj_o}, \; k \in R; \quad \lambda_j \geq 0, \; j \in J. \tag{2}$$

Here *θ*(**X**_{D}) is a scalar representing the efficiency of the *j*_{o}th DMU, which ranges between 0 and 1, and **λ** is a *J*×1 vector of transformed weights (decision variables) of the inputs/outputs that optimize the efficiency score of DMU *j*_{o}.

_{o}The above DEA model (2), also known as the constant-return-to-scale model, has to be solved J times, once for each DMU in the sample. In order to calculate efficiency for a group of comparable DMUs (e.g., inefficient DMU is only compared against DMU of similar size), and therefore provides the basis for measuring economies of scale within the DEA concept, a convexity constraint $\sum _{J\in J}{\lambda}_{j}$ is added. This is called the variablereturn-to-scale model. The convexity constraint determines how closely the production frontier envelops the observed input/output combinations.

The issue with this step is that it does not cater for uncertainties in the data. For example, if one does not have the full set of data, then the respective DMU or the related data category has to be omitted from the analysis (which defeats the overall purpose of the study). Alternatively, one may resort to guessing the data, which renders the resulting efficiency scores unreliable. Using averages (means) of the data may also compromise accuracy if the sample size is too small, since small-sample means may lie far from the true mean values.

In view of this, we aim to address the uncertainties of data in efficiency measurement by using the simulation optimization concept, i.e., in our step 2.

### 2.2. Step 2: Enhancement for Accuracy – DCBA

*"How to allocate the budget effectively in order to obtain an accurate efficiency score"* is the main principle of this step. The distribution of the efficiency score, which ultimately identifies where the true efficiency value tends to lie, is determined by the amount of data collected on the inputs/outputs. In reality, data collection is expensive and subject to budget constraints. If the data collection effort is done naively, the entire efficiency study may be jeopardized; for example, if the collected data are not representative, the resulting efficiencies will be erroneous and misleading. Hence, allocating the budget effectively, in terms of which types of data to collect and how many data points to collect, is of prime importance in an efficiency study.

#### 2.2.1. Conceptualization based on simulation optimization

The question "*How to allocate the budget effectively?*" has been addressed in the field of simulation optimization, where OCBA algorithms have been developed to determine the number of simulation replications allocated to each simulation model so as to identify the best design using the least computing budget (Chen and Lee, 2010; Lee *et al*., 2013). Here we employ the concept of this technique to develop our DCBA model. A lesson learned from OCBA is that if we naively allocate the data collection effort equally, we may waste effort collecting data to which the efficiency is not sensitive, and the accuracy of the estimated efficiency may not improve. Therefore, we want to find the best data collection plan (*design*) in terms of the best use (*allocation*) of resources (budget) that gives the most accurate efficiency score.

*The DCBA model*

$$\min_{\mathbf{n}} \; F(\mathbf{n}) = E\big[(\theta(\mathbf{X}) - \theta(\mathbf{X}'))^2\big] \quad \text{s.t.} \quad \sum_{k \in K} n_k = N, \; n_k \in \{0, 1, 2, \ldots\}. \tag{3}$$

The above is our DCBA model, derived from the OCBA concept. The accuracy of the efficiency is measured through the mean square error (MSE). The objective function *F*(**n**) is the MSE of the efficiency score for allocation design **n**, where **X**′ is the belief of the inputs/outputs after additional data are collected following allocation design **n**, and *N* is the total computing budget (data points) to be allocated among the *k* variables (the stochastic inputs and outputs). Note that *θ*(**X**) is the efficiency score computed using (2) (i.e., *θ*(**X**_{D}); for simplicity, we drop the subscript D from **X**_{D}), and *θ*(**X**′) represents the belief of the true efficiency.

Note that the model cannot be solved directly, as the solution has no closed-form formula. Inherently, the above problem is a non-convex discrete optimization problem with no good structure from which to develop a simple, efficient algorithm. One way to estimate *F*(**n**) is to derive **X**′ through a Bayesian framework. With **X**′ quantified in this way, *F* can be estimated for a given value of **n** as follows:

$$\hat{F}(\mathbf{n}) = \frac{1}{M} \sum_{i=1}^{M} \big(\theta(\mathbf{X}) - \theta(\mathbf{X}'_i)\big)^2, \tag{4}$$

where **X**′_{i} is the belief of the inputs/outputs in replication *i* of the Monte Carlo run for allocation design **n**, and *M* is the cardinality of the random data set. This method is thus a *simulation-based* technique. Recall that an allocation design is given as **n** = [*n*_{k}]_{k∈K}, where *K* is the combined set of inputs and outputs, *D* is the total number of inputs/outputs, *N* is the total number of data collections, and *K* = {1, …, *D*}. The search space is discrete and very large: for *D* = 15 and *N* = 150, the number of possible allocation plans **n** can be as high as 6.60×10^{19}. If we evaluated every design in the search space, assuming each design needs 2 seconds to be searched and 3 seconds to be evaluated, the full enumeration would take 3.3×10^{20} seconds, that is, about 1.04×10^{13} years.
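The Monte Carlo estimate of *F*(**n**) can be sketched as follows. The normal 'belief' whose standard error shrinks as 1/√(n_prior + n_k) is our illustrative stand-in for the paper's (unspecified) Bayesian update, and all names and parameters are hypothetical:

```python
import numpy as np

def estimate_F(theta, x_mean, x_std, n_alloc, n_prior=5, M=1000, seed=0):
    """Monte Carlo estimate of F(n) = E[(theta(X) - theta(X'))^2].
    theta   : callable mapping a data vector to an efficiency score
    x_mean  : current point estimates of the D inputs/outputs
    x_std   : current standard deviations of the D inputs/outputs
    n_alloc : allocation design n = [n_1, ..., n_D] (extra data per variable)
    n_prior : assumed number of observations behind the current estimates
    M       : number of Monte Carlo replications
    """
    rng = np.random.default_rng(seed)
    x_mean = np.asarray(x_mean, float)
    x_std = np.asarray(x_std, float)
    n_alloc = np.asarray(n_alloc, float)
    base = theta(x_mean)                      # theta(X): current efficiency
    # Belief after n_k extra points: sample-mean std ~ x_std / sqrt(n_prior + n_k)
    post_std = x_std / np.sqrt(n_prior + n_alloc)
    sq_err = np.empty(M)
    for i in range(M):
        x_prime = rng.normal(x_mean, post_std)   # one draw of the belief X'_i
        sq_err[i] = (base - theta(x_prime)) ** 2
    return sq_err.mean()
```

Under this sketch, allocating more budget to a variable tightens its belief, which lowers the estimated MSE.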

In order to reduce the search time, two search-based methods were presented by Wong *et al*. (2011). The first is a sequential approach to collecting data, whereby a hill-climbing technique gradually improves the solution and moves towards *N*, starting from an initial value of zero for all *n*_{k}. The second is a non-sequential data collection approach, whereby a metaheuristic technique (a genetic algorithm) is employed to find the optimal solution from a given pool of possible solutions, i.e., starting from the set of possible *n*_{k} values that sum to *N*. For more details on how to enhance the model, readers are referred to Wong *et al*. (2011).
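The sequential approach can be sketched as a greedy hill climb: starting from **n** = 0, add one data point at a time to whichever variable most reduces the estimated MSE, until the budget *N* is exhausted. This is an illustrative reading of the idea, not the authors' exact algorithm, and `mse_of` is a hypothetical callable standing in for the estimator of *F*(**n**):

```python
import numpy as np

def hill_climb_allocation(mse_of, D, N):
    """Greedy sequential data-collection plan (illustrative sketch).
    mse_of : callable estimating F(n) for an allocation vector n
    D      : number of stochastic inputs/outputs
    N      : total data-collection budget
    """
    n = np.zeros(D, dtype=int)
    for _ in range(N):
        best_k, best_val = None, None
        for k in range(D):                # try adding one point to each variable
            trial = n.copy()
            trial[k] += 1
            val = mse_of(trial)
            if best_val is None or val < best_val:
                best_k, best_val = k, val
        n[best_k] += 1                    # keep the most beneficial increment
    return n
```

With an MSE that decays faster in the noisier variable, the greedy plan naturally concentrates budget there.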

#### 2.2.2. Allocation of data collection plan using simulation optimization technique

In order to get the best design (data collection plan), we need to evaluate the designs found by the search process. In this design evaluation process, computing time is still wasted if we allocate the computing budget equally to each selected design. Hence, we apply OCBA (Chen and Lee, 2010), a simulation optimization technique, to evaluate the designs and choose the best one, i.e., that with the smallest MSE. Below we illustrate how OCBA reduces the simulation time in the design evaluation process: it intelligently spends more time and effort evaluating designs that have the potential to achieve a lower MSE. Here, we select the non-sequential data collection plan, as it is simpler for explanation purposes.

To run the OCBA model, the additional notations required are as follows:

*T*_{ni} = number of simulation replications allocated to design **n**_{i};

${\overline{B}}_{{n}_{i}}$ = sample mean of the MSE for design **n**_{i};

*E*_{n}_{i} = variance of the MSE for design **n**_{i};

*m* = number of top designs to be selected; and

*H* = total simulation runs (or computing budget, i.e., used to evaluate the designs).

We limit the number of designs **n** evaluated in each simulation run to *l*. From the OCBA literature (Chen and Lee, 2010), the computing budget allocation (simulation runs) for each design can be determined through relationship (5) below.

where $G_{n_i} = {\overline{B}}_{n_i} - \big({\overline{B}}_{n_{(m)}} + {\overline{B}}_{n_{(m+1)}}\big)/2$, and ${\overline{B}}_{n_{(m)}}$ and ${\overline{B}}_{n_{(m+1)}}$ are the sample means of the MSE for the designs ranked *m* and *m*+1 among the selected top designs, respectively.

We first simulate all *l* designs with *t*_{o} replications each. As the simulation proceeds, the sample mean and sample variance of each design are computed from the data collected so far. The simulation budget is then increased by Δ, and Eq. (5) is applied to determine the new allocation of simulation runs. Further simulation replications are then performed based on this allocation, and the procedure is repeated until the total budget *H* is exhausted. The following shows the OCBA-m allocation procedure.

Step 1: *Initialize*. Set *t* = 1 and perform *t*_{o} simulation replications for all designs.

Set ${T}_{{n}_{1}}^{t}={T}_{{n}_{2}}^{t}=\cdot \cdot \cdot ={T}_{{n}_{l}}^{t}={t}_{o}$

Step 2: *Update*. Calculate ${\overline{B}}_{{n}_{i}}$, *E*_{n_i}, and *G*_{n_i} for *i* = 1, …, *l*.

Step 3: *Allocate*. Increase the computing budget by Δ and calculate the new budget allocation ${T}_{{n}_{1}}^{t+1},{T}_{{n}_{2}}^{t+1},\cdot \cdot \cdot ,{T}_{{n}_{l}}^{t+1}$ according to (5).

Step 4: *Simulate*. Perform ${T}_{{n}_{i}}^{t+1}-{T}_{{n}_{i}}^{t}$ additional simulations for design **n**_{i}, for *i* = 1, …, *l*.

Step 5: *Termination*. If ${\sum}_{i=1}^{l}{T}_{{n}_{i}}^{t}$ < *H*, set *t* ← *t* + 1 and return to Step 2; otherwise, stop.
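The five steps above can be sketched as follows. We assume `simulate` returns one noisy MSE observation per replication, and we use a common OCBA-m allocation form (replications proportional to $E_{n_i}/G_{n_i}^2$); the exact shape of Eq. (5) in Chen and Lee (2010) may differ in detail:

```python
import numpy as np

def ocba_m(simulate, designs, m, t0=10, delta=20, H=200, seed=0):
    """Illustrative OCBA-m loop (Steps 1-5).
    simulate : callable(design, rng) -> one noisy observation of the MSE
    designs  : list of l candidate data collection designs
    m        : number of top (lowest-MSE) designs to identify
    Returns the indices of the m designs with the smallest mean MSE."""
    rng = np.random.default_rng(seed)
    l = len(designs)
    # Step 1: initialize with t0 replications per design
    obs = [[simulate(d, rng) for _ in range(t0)] for d in designs]
    while sum(len(o) for o in obs) < H:               # Step 5: budget check
        # Step 2: update sample means and variances
        means = np.array([np.mean(o) for o in obs])
        var = np.array([np.var(o, ddof=1) for o in obs]) + 1e-12
        order = np.argsort(means)
        c = (means[order[m - 1]] + means[order[m]]) / 2.0
        G = np.where(np.abs(means - c) < 1e-12, 1e-12, means - c)
        ratio = var / G ** 2                          # T_i proportional to E_i / G_i^2
        # Step 3: spread the increased budget according to the ratios
        counts = np.array([len(o) for o in obs])
        target = np.floor((counts.sum() + delta) * ratio / ratio.sum()).astype(int)
        extra = np.maximum(target - counts, 0)
        if extra.sum() == 0:
            extra[np.argmax(ratio)] = delta           # guard: always make progress
        # Step 4: perform the additional simulations
        for i in range(l):
            obs[i].extend(simulate(designs[i], rng) for _ in range(extra[i]))
    means = np.array([np.mean(o) for o in obs])
    return sorted(np.argsort(means)[:m].tolist())
```

Designs whose sample-mean MSE sits close to the top-*m* cutoff receive most of the extra replications, which is where the computing budget matters most.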

To summarize, the 2-in-1 DCBA-DEA approach addresses efficiency measurement in the presence of data uncertainties: the first step measures the efficiency, and the second step runs the budget allocation procedure for data collection to improve the accuracy of the efficiencies.

## 3. NUMERICAL EXAMPLE

Consider the example in Despotis and Smirlis (2002): five DMUs use two inputs to produce two outputs, and both inputs and outputs are interval data. The original data are shown in Table 1 (in this paper, we use [L, U] to indicate a range, where L is the lower bound and U the upper bound). To conform to the formulation of this paper, we add the average of each interval datum.

In order to run the simulations, we set the parameters in DCBA model as follows:

*D = *4, which represents the total number of inputs and outputs;

*N *= 30, which means the total number of data collections;

*t*_{0} = 10, which represents the initial simulation replications;

*l* = 20, which represents the number of designs to be evaluated in each simulation run;

*H* = 200, which means the total simulation replications or computing budget; and

Δ = 20, which is the computing budget increase per MSE evaluation.

Despotis and Smirlis (2002) only calculated the bounds of the efficiency scores for this example, and their efficiency ranges are too wide to distinguish one DMU from another. Our result shows narrower efficiency ranges; the average improvement in the accuracy of the efficiency ranges is above 85%. More importantly, our method not only provides statistical information on the efficiency, but also finds the best simulation design with reduced time and can direct the real data collection. As a result, the efficiency scores for DMUs A, B, and D lie in the ranges [0.515, 0.572], [0.705, 0.808], and [0.666, 0.714], with improvements in range narrowing of 92.65%, 86.68%, and 89.61%, respectively. Taking DMU A as an example, under the assumption of a normal distribution, Figure 1 shows the statistical information of its efficiency; the shaded part visually shows that the range of the efficiency score from DCBA-DEA is much narrower than that of Despotis and Smirlis (2002).

In this simple example, the number of possible designs is only 5,456, so we apply DCBA-DEA without the search-based method. We coded the procedure in MATLAB 7.0 and ran it on Windows 7 (CPU 1.6 GHz, RAM 2 GB). After the simulation, a feasible solution *β* = [n1, n2, n3, n4] represents the additional numbers of data points to be collected for the four variables (input 1, input 2, output 1, output 2), respectively. Meanwhile, to prevent computational randomness from affecting the simulation result, the entire process is executed 200 times to obtain the standard deviations and confidence intervals of the efficiency scores. The left part of Table 2 shows the results from Despotis and Smirlis (2002), and the right part shows the results of DCBA-DEA.
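The count of 5,456 possible designs follows from a stars-and-bars argument: the number of nonnegative integer allocations of *N* = 30 data points among *D* = 4 variables. A quick check (illustrative helper, not from the paper):

```python
from math import comb

def num_designs(D, N):
    """Number of feasible allocation designs: nonnegative integer vectors
    n = (n_1, ..., n_D) with n_1 + ... + n_D = N, i.e., C(N + D - 1, D - 1)."""
    return comb(N + D - 1, D - 1)

print(num_designs(4, 30))    # the 5,456 designs of this example
print(num_designs(15, 150))  # the huge search space quoted in Section 2.2.1
```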

Notably, the efficiency scores for DMUs C and E are unity even though the data are stochastic. This is because the whole ranges of the interval data of these two DMUs lie on the efficient frontier. Meanwhile, the best data collection designs and their MSEs are presented in Table 2; we find that the narrower the efficiency range, the smaller the MSE. This verifies that the mean square error, chosen as the numerical measure of the accuracy of the efficiency score in the DCBA-DEA model, is a meaningful reference for the level of accuracy. We also find that DMUs with narrower efficiency ranges have lower standard deviations; in statistics, the standard deviation represents the closeness of the simulated efficiencies to the mean efficiency and hence expresses the stability of the simulation process. That is to say, a lower standard deviation represents a more reliable result. In this example, even DMU B, which has the highest standard deviation at 0.0374, has a standard deviation of only 4% of its mean efficiency. Nevertheless, this example has only five DMUs with four variables, so all possible designs can be evaluated without the search-based method. Next, we illustrate a more complex, real-world problem.

## 4. AN ILLUSTRATION ON BANKING INSTITUTIONS

Efficiency measurement of banks is chosen as the scope of study. This field was chosen because efficiency measurement of banks often has to compile yearly data, and most of these data are not complete. Sources of this uncertainty are commonly human error in managing data and the cumulative effects of changes in government policy, e.g., the merging of banks, which leads to difficulties in data separation and integration. Conventionally, when there are observations of multiple years' data, the averages across the years are calculated to represent the central tendency of the data, and these averages are used to measure the efficiency. In this section, we use our method to analyze the data of 25 Taiwan commercial banks during 1997–2001 from Kao and Liu (2009). Table 3 shows the five-year data with the average, lower bound, and upper bound for each bank, where the monetary amounts are in billions of New Taiwan dollars (100 NTD ≅ 3 USD). This data set has 25 DMUs with 6 variables; therefore, the search space of all possible data collection designs is huge even for small *N*. In order to simulate the data sufficiently, we set *N* = 300 and add the genetic algorithm to help search for the best data collection design with the smallest MSE. Next, we compare our results with the results from the conventional method and the simulation approach of Kao and Liu (2009).

In conventional studies with multiple observations over time, the average data are normally used to measure the efficiency. The second column of Table 4 shows the efficiency scores of the 25 banks from the average data. Five banks (banks 4, 11, 13, 14, and 23) are efficient. Banks 7 and 8 have the smallest efficiency scores of 0.6291 and 0.6534, respectively. This simple method uses only average data, which makes the measurement of efficiency easier and gives a general view of each bank's performance, but it may produce erroneous results. The interval-efficiency method was discussed in the numerical example, and its efficiency ranges may be too wide to draw conclusions, so we do not list its results here; instead, we include the results of the simulation method of Kao and Liu (2009). The middle part of Table 4 shows the result of their approach, and the right part of Table 4 shows the result of DCBA-DEA. Note that the number of simulation replications is set to 2,000, because the analysis of Kao and Liu (2009) indicates that 2,000 simulation replications are sufficient to produce a result close to the true distribution.

From Table 4, we find that in the method using average data, banks 4, 11, 13, 14, and 23 are efficient, while in DCBA-DEA, banks 13 and 23 are no longer fully efficient, as the lower bounds of their efficiency scores are less than one. For banks with the same upper efficiency bound (e.g., banks 3, 19, and 25) that we are still unable to rank by their lower efficiency bounds, we should examine the distributions of their efficiency scores statistically. If two banks have almost the same efficiency ranges and means, we should check the standard deviations and rank the one with the smaller standard deviation as the better performer. The reason, as discussed in the numerical example, is that a smaller standard deviation indicates a more stable result.

Meanwhile, the numbers in Table 4 indicate that the average difference between the mean efficiency of each bank calculated from the simulations (column 7) and the corresponding efficiency calculated from the average data (column 2) is 0.019, which is approximately 2% of the average efficiency of the 25 banks. A Wilcoxon signed-ranks test shows that the efficiency scores obtained from the two approaches are significantly different at the 0.05 level. Still, it can be observed from Table 4 that the rankings of the 25 banks are quite similar across the approaches: 12 banks have exactly the same rank under all three approaches, and only three banks (19, 22, and 25) show larger differences.

From the perspective of efficiency ranges, our results differ somewhat from those of Kao and Liu (2009). For example, their efficiency range for bank 14 is [0.8429, 1.000] with a mean efficiency of 0.9996, while in DCBA-DEA, bank 14 is always efficient; the gap between their mean (0.9996) and lower bound (0.8429) indicates that their range is much wider than ours. Such cases also appear in the efficiency measurement of other banks. This is because, in our method, the efficiencies can be better estimated by collecting more relevant data through an intelligent data collection plan, which DCBA-DEA helps to materialize. Compared with Kao and Liu (2009), the data collection plans and the improvements from reduced standard deviations and narrowed efficiency ranges are shown in Table 5.

Table 5 compares the statistical information (ranges and standard deviations) from the method of Kao and Liu (2009) and from DCBA-DEA. The efficiency scores from the latter method are more accurate than those of the former, as reflected in the greatly narrowed efficiency ranges and reduced standard deviations. The 25 banks experience an average reduction of about 65.6% in efficiency ranges. Take bank 8 as an example: its efficiency interval is [0.5575, 0.7805] (range = 0.2230) under the approach of Kao and Liu (2009), and it becomes [0.6362, 0.6637] (range = 0.0275), an improvement of 87.67%. This implies that the true mean values of the efficiency scores are being estimated more accurately. Meanwhile, the standard deviation, a significant indicator of simulation stability, has been reduced by 66.2% on average. As we know from statistics, a smaller standard deviation localizes the efficiency scores within narrower ranges. For bank 8, for instance, the standard deviation is reduced from 0.0292 to 0.0046, a reduction of about 84% through applying DCBA-DEA. In addition, the MSE, which serves as the measure of the accuracy of the efficiency scores, appears only at the third digit after the decimal point for all banks.

The conventional model uses the average data to estimate efficiency, and each efficiency score is a single value. The stochastic models, in contrast, yield not only the mean efficiency scores but also the standard deviations and the efficiency ranges (minimum and maximum values). More importantly, the DCBA-DEA approach produces better estimates of the efficiency scores by collecting more relevant data intelligently, because the data collection designs can direct the Monte Carlo simulation more effectively than the plain simulation approach (e.g., Kao and Liu, 2009). In practice, users want to make future decisions based on estimates from the existing information (e.g., efficiency). A smaller standard deviation gives the user a better prediction of company operations, since the inputs (e.g., costs) can be confined to narrower ranges to avoid waste, helping the company save cost and improve business efficiency. In addition, the results from DCBA show managers how to allocate the data collection budget effectively in order to maximize the accuracy of the efficiency; they can then judge performance optimistically (maximum values), moderately (mean values), or conservatively (minimum values) based on their experience, expertise, and judgment. Moreover, the data collection design, as in Table 5, expresses the randomness of the stochastic variables: a larger number indicates that a variable is more stochastic than the others and hence needs more simulation runs to estimate a value close to its true value. Further, these data collection plans can even direct the real data collection process if more information (e.g., the objective, the total data collection plan, budget, and time) is added.

## 5. CONCLUDING REMARKS

This research developed a method known as the DCBA-DEA for measuring efficiency for DMUs in a more practically feasible way. The proposed method starts by measuring the efficiency scores, after which it improves the accuracy of the efficiency through data collection. The effort in data collection is allocated intelligently using the simulation optimization technique.

The proposed method was designed to tackle the limitations of the conventional efficiency measurement modus operandi. The first salient point is that efficiency scores are obtained with more confidence: even when decision makers are uncertain about the data, the efficiency scores can still be estimated accurately and performance analysis can proceed smoothly. The second point is that, with the proposed method, one can allocate the data collection effort intelligently, eliminating waste of resources on unnecessary data that might otherwise result in misleading efficiencies.

On limitations, we admit that the experiments in Sections 3 and 4 were not full-scale applications, but rather partial-scale studies used to illustrate the proposed method. The applications have been simplified, and various considerations need to be taken into account for a fully justifiable DCBA-DEA application. Nevertheless, it is hoped that this paper can be the starting point for a new direction in efficiency measurement research, i.e., employing 'thinking out of the box' for more innovative and creative ideas.