1. INTRODUCTION
Options are among the most liquidly traded derivatives in financial markets and have received considerable attention because they can be used for speculation and risk hedging with a relatively limited budget. Investors should know the fair price of the options that they want to purchase or write. Since the first option pricing method was proposed based on geometric Brownian motion (Black and Scholes, 1973), various studies have developed stochastic models to determine the fair price of options.
Stochastic processes that describe the behavior of underlying assets have been proposed because the geometric Brownian motion model could not reflect stylized facts, including fat-tailed returns and the volatility smile. A few researchers proposed stochastic volatility models (Heston, 1993), which assume that volatility follows its own stochastic process to reflect the stylized facts. Other studies considered jump diffusion processes (Merton, 1976; Madan et al., 1998; Carr et al., 2003) that allow jumps in return processes, so that sudden changes in returns due to the jumps account for the stylized facts observed in real markets. However, most of these stochastic processes cannot be directly employed for pricing various types of derivatives, including American type derivatives, which can be exercised at any time up to and including their maturity and are liquidly traded in real derivative markets. Thus, several methods to price American type options under stochastic processes have also been suggested. Several tree models have been developed for geometric Brownian motion (Cox et al., 1979; Hull and White, 1993; Boyle et al., 1989) and general diffusion processes (Nelson and Ramaswamy, 1990). Finite difference methods have also been proposed for geometric Brownian motion (Schwartz, 1977) and jump processes (Carr and Madan, 1999; Hirsa and Madan, 2004; Cont and Voltchkova, 2005). In addition, Monte Carlo simulation methods (Boyle, 1977; Broadie and Glasserman, 1997; Longstaff and Schwartz, 2001) have been employed for American option pricing.
For several decades, machine learning models have been extensively applied to estimate and forecast financial variables due to their ability to fit the models closely to given data. A few of those studies have focused on stock markets (Chen et al., 2006; Son et al., 2012; Ticknor, 2013; Liao and Chou, 2013; Hafezi et al., 2015; Laboissiere et al., 2015; Chandra and Chand, 2016; Park et al., 2016; Rosas-Romero et al., 2016; Hussain et al., 2016) with satisfying results. Machine learning methods have also been applied to fixed-income (Kim and Noh, 1997; Cao and Tay, 2003), foreign exchange (Osuna et al., 1997; Bhattacharyya et al., 2002; Gerlein et al., 2016), credit (Huang et al., 2004; Lee, 2007; Ravi et al., 2008; Gündüz and Uhrig-Homburg, 2011; Kim and Ahn, 2012; Son et al., 2016; Cardoso et al., 2016), and cryptocurrency (Jang and Lee, 2018) markets. Several studies have applied machine learning methods to determine option prices. Hutchinson et al. (1994) and Choudhury et al. (2014) applied neural network (NN) models to S&P 500 future index options and found results comparable to the Black-Scholes formula. Han and Lee (2008) employed Gaussian process regression to price equity-linked warrants, a type of option in which investors can hold only long positions, and compared the results with several NN models. Yang and Lee (2011) estimated the implied volatilities of options using Gaussian process regression and computed the prices from them. Park and Lee (2012) proposed a positive Gaussian process whose predictive distribution is supported only on the positive domain, ensuring that the predicted option prices are always nonnegative. In Park et al. (2014), several machine learning methods, including NN, Gaussian process, and support vector regression, were directly compared with several parametric models, such as the Black-Scholes, Heston, and Merton models.
Most studies that apply machine learning algorithms to the option pricing problem have focused on European style options, which can be exercised only at maturity. Pires and Marwala (2004) compared NNs and support vector machines for the American option pricing problem and determined that support vector machines performed better than NNs. However, these machine learning methods have the limitation that they do not provide arbitrage-free prices for options. Thus, solutions from machine learning methods do not necessarily agree with financial theory.
NN models, including deep learning models that use more than one hidden layer, have spread extensively with the recent development of computing power, owing to their superior performance in sophisticated problems compared with other machine learning algorithms, including support vector machines. Successful application areas of NN models include image classification (Krizhevsky et al., 2012), handwritten digit recognition (Ciregan et al., 2012), speech recognition (Graves et al., 2013), and text mining (Tang et al., 2015). NN models have also been applied to financial problems. The US stock market index has been predicted with a deep convolutional network using events extracted from news as input variables (Ding et al., 2015), and the next-day financial trend has been predicted by neural networks using Twitter moods (Huang et al., 2016). A coupled deep belief network was proposed to describe couplings in financial markets (Cao et al., 2015). Yeh et al. (2015) employed a deep belief network to predict corporate defaults and determined that the deep network model they used performed better than other machine learning methods. However, to the best of our knowledge, none of these NN models has considered arbitrage-free pricing for financial derivatives.
This study proposes an American type option pricing method that uses arbitrage-free pseudo inputs via multilayer NNs. The proposed method first calibrates the parameters of a parametric model, thereby explaining the behavior of the underlying assets in the market. Thereafter, the pseudo inputs generated by the calibrated parametric model are sampled at grid points of the input domain. Then, the NN model is trained with both the training inputs and the generated pseudo inputs. Consequently, the trained model determines arbitrage-free, or nearly arbitrage-free, prices of American options for all exercise prices and times to maturity.
The remainder of this paper is organized as follows. In the next section, we describe the proposed method, together with its algorithm, for determining arbitrage-free option prices with NN models by generating pseudo inputs from arbitrage-free parametric models. Then, we present empirical results for real American options on the S&P 100 index obtained by applying the proposed method to NNs. Finally, we conclude the study and discuss directions for future research.
2. PROPOSED METHOD
This section describes the proposed method to construct the NN model that determines no-arbitrage option prices using pseudo inputs. The proposed method comprises sampling pseudo inputs from arbitrage-free parametric models and training a multilayer NN with the sampled pseudo inputs as well as training inputs obtained from trades in real markets. The procedure is summarized in Algorithm 1, and the details of each step are explained in the following subsections.
A1. Initialization

1. With data points collected from real market trades, $D=\{{x}_{k},\hspace{0.17em}{y}_{k}\}_{k=1}^{N}$, select a parametric underlying process with parameters λ and the American type option pricing formula C(x,λ) corresponding to the selected parametric process.

2. Set the possible minimal and maximal time to maturity (TTM) and moneyness, m_{TTM}, M_{TTM}, m_{money}, and M_{money}, based on the collected dataset.
A2. Sampling pseudo inputs from the arbitrage-free parametric models

1. Calibrate the selected parametric model with the given data, i.e., find $\widehat{\lambda}=\operatorname{argmin}_{\lambda}\sum_{k}{\left|C\left({x}_{k},\lambda \right)-{y}_{k}\right|}^{2}$.

2. Generate two-dimensional grid points ${\tilde{x}}_{i},\hspace{0.17em}i=1,\hspace{0.17em}\ldots,\hspace{0.17em}\tilde{N}$ by varying TTM, from m_{TTM} to M_{TTM}, and moneyness, from m_{money} to M_{money}.

3. Find arbitrage-free prices of the generated grid points using the calibrated option pricing model, ${\tilde{y}}_{i}=C({\tilde{x}}_{i},\hspace{0.17em}\widehat{\lambda}),\hspace{0.17em}i=1,\hspace{0.17em}\ldots,\hspace{0.17em}\tilde{N}$, and construct a dataset of pseudo inputs $\tilde{D}=\{{\tilde{x}}_{i},\hspace{0.17em}{\tilde{y}}_{i}\}_{i=1}^{\tilde{N}}$.
A3. Learning arbitrage-free neural network

1. Set the neural network model parameters, such as the number of hidden layers and the number of neurons for each layer.

2. Train the constructed neural network, f, with the backpropagation algorithm by minimizing $\frac{1}{2}\left(\sum_{k=1}^{N}{\eta}_{1}{\left|f({x}_{k})-{y}_{k}\right|}^{2}+\sum_{i=1}^{\tilde{N}}{\eta}_{2}{\left|f({\tilde{x}}_{i})-{\tilde{y}}_{i}\right|}^{2}\right)$.
2.1 Sampling Pseudo Inputs from Arbitrage-Free Parametric Models
Arbitrage-free pseudo inputs are required to construct the proposed arbitrage-free NN model. We first select one of the existing arbitrage-free parametric models and calibrate the selected model with training instances collected from the real market. Thereafter, pseudo inputs for the calibrated model are generated by varying the time to maturity (TTM) and moneyness, which are known to be major factors in determining option prices. The values of these factors can be chosen by traders, unlike other factors, such as interest rates and market volatility, which are often given exogenously. TTM is defined as the remaining time to maturity of an option from today, and moneyness is defined as the ratio between the current price of the underlying asset and the exercise price of an option.
The expected effects of these pseudo inputs are twofold. First, the output values (i.e., option prices) of these pseudo inputs are arbitrage-free prices. Thus, NN models trained with them will be forced to pursue arbitrage-free pricing. Second, the proposed NN models can perform better than models that use only training examples from market trades, because real trades are concentrated in a certain range of moneyness and TTM. Thus, learning models trained only with trade records may fail to predict option prices that are considerably outside that region. However, if pseudo inputs with attributes outside the region where the trades are concentrated are used for training the NN model, then the trained model can determine reasonable option prices for inputs that scarcely exist in real market trades, as well as for inputs that lie in the concentrated region.
2.1.1 Calibrating a Parametric Model
We select and calibrate one parametric process to generate arbitrage-free pseudo inputs. Unlike European type options, American type options seldom admit closed-form solutions for many types of parametric distributions. Therefore, several approaches have been developed to determine American option prices under a given stochastic process. Typical examples are tree models (Cox et al., 1979; Hull and White, 1993; Boyle et al., 1989; Nelson and Ramaswamy, 1990), Monte Carlo methods (Boyle, 1977; Broadie and Glasserman, 1997; Longstaff and Schwartz, 2001), and finite difference methods (Schwartz, 1977; Kwon and Lee, 2011).
Once we find the option pricing formula, C(x,λ), with parameters λ corresponding to the selected parametric model, even though it cannot be represented in explicit form, we can calibrate the parameters by minimizing the empirical error between the true option prices and the predicted ones as
$$\widehat{\lambda}=\operatorname{argmin}_{\lambda}\sum_{k=1}^{N}{\left|C\left({x}_{k},\lambda \right)-{y}_{k}\right|}^{2},$$
where λ refers to the vector of all parameters appearing in the selected parametric model, and x_{k}, k = 1, …, N and y_{k}, k = 1, …, N are the true inputs and option prices observed from real market trades, respectively.
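The least-squares calibration above can be sketched in a few lines. This is only an illustration under stated assumptions: a Cox-Ross-Rubinstein binomial tree under geometric Brownian motion stands in for the pricing formula C(x,λ), so the parameter vector reduces to a single volatility `sigma`; the quotes are synthetic; and a golden-section search replaces whatever optimizer is used in practice. The names `american_put_crr`, `squared_error`, and `calibrate_sigma` are hypothetical.

```python
import math


def american_put_crr(spot, strike, ttm, rate, sigma, steps=50):
    """American put on a Cox-Ross-Rubinstein tree (GBM stand-in for C(x, lambda))."""
    dt = ttm / steps
    up = math.exp(sigma * math.sqrt(dt))
    disc = math.exp(-rate * dt)
    prob = (math.exp(rate * dt) - 1.0 / up) / (up - 1.0 / up)  # risk-neutral up move
    # Node (n, j) has j up-moves out of n, so the asset price is spot * up**(2j - n).
    values = [max(strike - spot * up ** (2 * j - steps), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            hold = disc * (prob * values[j + 1] + (1.0 - prob) * values[j])
            exercise = strike - spot * up ** (2 * j - n)
            values[j] = max(hold, exercise)  # early-exercise check
    return values[0]


def squared_error(sigma, quotes, spot, rate):
    """Sum over k of |C(x_k, sigma) - y_k|^2 for quotes given as (strike, ttm, price)."""
    return sum((american_put_crr(spot, k, t, rate, sigma) - y) ** 2 for k, t, y in quotes)


def calibrate_sigma(quotes, spot, rate, lo=0.05, hi=1.0, tol=1e-4):
    """Golden-section search; assumes the loss is unimodal in sigma on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = squared_error(c, quotes, spot, rate), squared_error(d, quotes, spot, rate)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = squared_error(c, quotes, spot, rate)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = squared_error(d, quotes, spot, rate)
    return 0.5 * (a + b)


# Synthetic "market" quotes generated with sigma = 0.2, then recovered by calibration.
SPOT, RATE = 100.0, 0.01
contracts = [(95.0, 0.1), (100.0, 0.1), (105.0, 0.1), (95.0, 0.25), (100.0, 0.25), (105.0, 0.25)]
quotes = [(k, t, american_put_crr(SPOT, k, t, RATE, 0.2)) for k, t in contracts]
sigma_hat = calibrate_sigma(quotes, SPOT, RATE)
print(f"calibrated sigma: {sigma_hat:.4f}")
```

With a multi-parameter model such as a jump diffusion, the one-dimensional search would have to be replaced by a multivariate optimizer, and the tree by a pricer suited to that model.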
2.1.2 Generating Pseudo Inputs
The next step of the proposed method is to generate arbitrage-free pseudo inputs from the calibrated model. First, we define the two-dimensional grid of TTM and moneyness with appropriately chosen maximal and minimal values M_{TTM}, m_{TTM}, M_{money}, and m_{money}. Thereafter, we determine the grid points, ${\tilde{x}}_{i}$, and calculate their output values ${\tilde{y}}_{i}=C\left({\tilde{x}}_{i},\widehat{\lambda}\right)$ with the calibrated parameters, $\widehat{\lambda}$. Finally, we combine the generated pseudo inputs and outputs as a dataset $\tilde{D}={\left\{{\tilde{x}}_{i},{\tilde{y}}_{i}\right\}}_{i=1}^{\tilde{N}}$. The effect of the pseudo inputs on the final model can be controlled through the fineness of the grid: a finer grid increases their effect.
These pseudo inputs facilitate the construction of arbitrage-free learning models and alleviate the data sparsity problem by generating inputs for regions where real trades do not exist. Figure 1 shows the traded S&P 100 index American put option prices on January 3, 2012. Note that the traded options are concentrated on specific TTMs and on moneyness near 1. The proposed method fills the empty areas with generated pseudo inputs to obtain a robust model with high performance.
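The grid construction can be illustrated with a short sketch. The helper names (`linspace`, `make_pseudo_inputs`, `toy_put_price`), the grid bounds, and the grid resolution are assumptions made for this example. In particular, `toy_put_price` is a placeholder for the calibrated pricing function and is not an arbitrage-free model.

```python
import math


def linspace(lo, hi, count):
    """Evenly spaced values from lo to hi inclusive (count >= 2)."""
    step = (hi - lo) / (count - 1)
    return [lo + step * i for i in range(count)]


def make_pseudo_inputs(price_fn, ttm_bounds, money_bounds, n_ttm, n_money):
    """Evaluate price_fn on every (TTM, moneyness) grid point.

    Returns a list of ((ttm, moneyness), price) pairs, i.e. the pseudo dataset.
    """
    ttms = linspace(*ttm_bounds, n_ttm)
    moneys = linspace(*money_bounds, n_money)
    return [((t, m), price_fn(t, m)) for t in ttms for m in moneys]


def toy_put_price(ttm, moneyness):
    """Placeholder for the calibrated pricer; NOT an arbitrage-free model.

    Put price per unit strike: intrinsic value plus a crude time-value term,
    with moneyness defined as spot / strike.
    """
    intrinsic = max(1.0 - moneyness, 0.0)
    return intrinsic + 0.1 * math.sqrt(ttm)


# Illustrative bounds: TTM from 7 to 90 days (in years), moneyness from 0.8 to 1.2.
pseudo = make_pseudo_inputs(toy_put_price, (7 / 365, 90 / 365), (0.8, 1.2), n_ttm=5, n_money=9)
print(len(pseudo), "pseudo inputs; first:", pseudo[0])
```

Raising `n_ttm` or `n_money` refines the grid and therefore increases the share of pseudo inputs in the combined training set.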
2.2 Learning Arbitrage-Free Neural Network
To construct the proposed arbitrage-free multilayer NN, we trained the NN model, i.e., found the weights, with the pseudo inputs as well as the real inputs to minimize the cost
$$\frac{1}{2}\left(\sum_{k=1}^{N}{\eta}_{1}{\left|f({x}_{k})-{y}_{k}\right|}^{2}+\sum_{i=1}^{\tilde{N}}{\eta}_{2}{\left|f({\tilde{x}}_{i})-{\tilde{y}}_{i}\right|}^{2}\right),$$
where η_{1} and η_{2} are the weights for each sample of the real market trade dataset and the generated pseudo input dataset, respectively. Training used backpropagation, in which the weight gradients are computed from the final output layer back to the first input layer. The detailed procedure follows.
Suppose that the total number of layers is L+1, where index 0 denotes the input layer (whose neurons hold the input values), index L denotes the output layer (whose neurons hold the final output values), and indices 1 to L−1 denote the hidden layers between the input and output layers. Let ${w}_{pq}^{l}$ be the weight from the pth neuron of the (l−1)th layer to the qth neuron of the lth layer, ${x}_{q}^{l}$ be the input of the qth neuron in the lth layer, and ${o}_{q}^{l}$ be the output of the qth neuron in the lth layer. Then
$${x}_{q}^{l}=\sum_{p}{w}_{pq}^{l}\,{o}_{p}^{l-1} \tag{2}$$
and
$${o}_{q}^{l}=b\left({x}_{q}^{l}\right), \tag{3}$$
where b is an activation function.
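Backpropagation aims to find the partial derivative $\frac{\partial {\epsilon}_{k}}{\partial {w}_{pq}^{l}}$, where ${\epsilon}_{k}=\frac{1}{2}{\eta}_{j}\sum_{d}{\left({y}_{d}-{o}_{d}^{L}\right)}^{2}$ is the cost for a single kth input, and j equals 1 or 2 to denote that the input is from real market trades or from the calibrated parametric model, respectively. For the output layer (l = L), the partial derivative can be expressed as
$$\frac{\partial {\epsilon}_{k}}{\partial {w}_{pq}^{L}}=\frac{\partial {\epsilon}_{k}}{\partial {o}_{q}^{L}}\frac{\partial {o}_{q}^{L}}{\partial {x}_{q}^{L}}\frac{\partial {x}_{q}^{L}}{\partial {w}_{pq}^{L}}=-{\eta}_{j}\left({y}_{q}-{o}_{q}^{L}\right){b}^{\prime}\left({x}_{q}^{L}\right){o}_{p}^{L-1}, \tag{4}$$
whereas for l < L, the partial derivative is
$$\frac{\partial {\epsilon}_{k}}{\partial {w}_{pq}^{l}}=\left(\sum_{r}\frac{\partial {\epsilon}_{k}}{\partial {o}_{r}^{l+1}}\frac{\partial {o}_{r}^{l+1}}{\partial {x}_{r}^{l+1}}\frac{\partial {x}_{r}^{l+1}}{\partial {o}_{q}^{l}}\right)\frac{\partial {o}_{q}^{l}}{\partial {x}_{q}^{l}}\frac{\partial {x}_{q}^{l}}{\partial {w}_{pq}^{l}}, \tag{5}$$
which can be found when $\frac{\partial {\epsilon}_{k}}{\partial {o}_{r}^{l+1}}$ is known for every r, and the other partial derivatives can be computed from (2) and (3). Therefore, if we compute the partial derivatives backward, i.e., from the output layer to the input layer through the hidden layers, the partial derivatives in (5) can be computed for every l and q.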
Finally, the backpropagation algorithm updates the weights using the gradient descent algorithm,
$${w}_{pq}^{l}\leftarrow {w}_{pq}^{l}-\zeta \frac{\partial {\epsilon}_{k}}{\partial {w}_{pq}^{l}},$$
where ζ is a given learning rate, and the procedure terminates when one of the stopping criteria is satisfied.
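The backward pass and the weighted update can be sketched as follows. Several choices here are assumptions made for illustration only: the hidden layers use tanh as the activation b, the output layer is linear, there are no bias terms, and the learning rate, sample weights, and data points are made up. The function names are hypothetical.

```python
import math
import random


def forward(weights, x):
    """Layer outputs o^0..o^L; tanh on hidden layers, identity on the output layer."""
    outputs = [list(x)]
    for l, w in enumerate(weights):
        prev = outputs[-1]
        pre = [sum(prev[p] * w[p][q] for p in range(len(prev))) for q in range(len(w[0]))]
        last = l == len(weights) - 1
        outputs.append(pre if last else [math.tanh(v) for v in pre])
    return outputs


def sample_cost(weights, x, y, eta):
    """Weighted squared error 0.5 * eta * sum_d (y_d - o_d)^2 for one sample."""
    out = forward(weights, x)[-1]
    return 0.5 * eta * sum((yd - od) ** 2 for yd, od in zip(y, out))


def backprop(weights, x, y, eta):
    """Gradient of sample_cost with respect to every weight, output layer first."""
    outputs = forward(weights, x)
    # delta[q] = d(cost)/d(pre-activation of neuron q); the output layer is linear.
    delta = [-eta * (yd - od) for yd, od in zip(y, outputs[-1])]
    grads = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        prev = outputs[l]
        grads[l] = [[prev[p] * dq for dq in delta] for p in range(len(prev))]
        if l > 0:  # push delta back through tanh, whose derivative is 1 - o^2
            delta = [(1.0 - prev[p] ** 2) * sum(wq * dq for wq, dq in zip(weights[l][p], delta))
                     for p in range(len(prev))]
    return grads


def train(weights, samples, lr, epochs):
    """Per-sample gradient descent: w <- w - lr * d(cost)/dw."""
    for _ in range(epochs):
        for x, y, eta in samples:
            grads = backprop(weights, x, y, eta)
            for w, g in zip(weights, grads):
                for p in range(len(w)):
                    for q in range(len(w[p])):
                        w[p][q] -= lr * g[p][q]


def init_weights(sizes, rng):
    """Uniform random weights for consecutive layer sizes."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
            for n_in, n_out in zip(sizes, sizes[1:])]


rng = random.Random(0)
weights = init_weights([2, 4, 4, 1], rng)   # (TTM, moneyness) -> two hidden layers -> price
ETA_REAL, ETA_PSEUDO = 1.0, 0.5             # illustrative values for eta_1 and eta_2
samples = [
    ((0.10, 1.00), (0.030,), ETA_REAL),     # made-up "market" points
    ((0.20, 0.95), (0.070,), ETA_REAL),
    ((0.05, 0.80), (0.200,), ETA_PSEUDO),   # made-up "pseudo" points off the traded region
    ((0.25, 1.20), (0.002,), ETA_PSEUDO),
]
before = sum(sample_cost(weights, x, y, eta) for x, y, eta in samples)
train(weights, samples, lr=0.05, epochs=200)
after = sum(sample_cost(weights, x, y, eta) for x, y, eta in samples)
print(f"total weighted cost: {before:.5f} -> {after:.5f}")
```

Because each sample carries its own weight `eta`, the same routine handles market and pseudo samples; only the multiplier on the error signal differs.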
The proposed model has two main advantages compared with conventional multilayer NN models that are trained only with real option market data. First, the proposed model is trained with arbitrage-free pseudo inputs as well as the inputs from the market, thereby pursuing arbitrage-free solutions more strongly than conventional NN models. In contrast, conventional models use only a small number of real market trade data points without arbitrage-free assumptions. Second, the proposed method is more robust to data imbalance than conventional models, which use only real data points as training inputs, because it generates pseudo inputs for coarse regions where real trade data are sparse or do not exist. Therefore, the proposed method can better predict option prices for inputs in coarse regions.
3. EMPIRICAL RESULTS
We applied the proposed method to real American option data and compared the results with a conventional multilayer NN model with the same structure to validate the predictive power of the proposed method. We first describe the data used for the simulation and the settings for the proposed method, then present and discuss the predictive performances of the proposed and conventional models.
Data Description and Settings
We used American option data for the S&P 100 index (OEX), a US stock market index that is maintained by Standard & Poor’s and includes the 100 largest companies in the S&P 500. For the empirical study, we used the OEX put options in 2012 for three reasons.

1. Assuming the effects of the financial crisis have dissipated, 2012 market data are adequate to determine which model substantially describes the current market situation.

2. According to CBOE reports (CBOE Holdings, 2012a), the OEX option was most actively traded in 2012.

3. The contract volume of American put options is usually larger than that of American call options, and this also holds for 2012 OEX options (CBOE Holdings, 2012b).
We used simple moneyness, which is the ratio of spot to strike, to describe the relative position of the present price of an underlying asset to the strike price of an option. The data were preprocessed to eliminate distortion by removing options with fewer than 7 or more than 90 days to expiration. A very short TTM is likely to cause distortion due to low time premiums and bid-ask spreads, whereas a long expiration may yield biases and measurement error.
Table 1 summarizes the average price and standard deviation of the OEX options by TTM and moneyness. The average option price over the entire period is 9.53. The number of observations increases with moneyness.
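The filtering rule and the moneyness definition can be expressed compactly. The `Quote` record, its field names, and the sample values below are hypothetical and only illustrate the rule of keeping options with 7 to 90 days to expiration.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """One traded option; the field names are assumptions for this sketch."""
    spot: float
    strike: float
    days_to_expiry: int
    price: float


def preprocess(quotes, min_days=7, max_days=90):
    """Drop quotes outside [min_days, max_days] and return (TTM, moneyness, price) rows."""
    rows = []
    for q in quotes:
        if min_days <= q.days_to_expiry <= max_days:
            moneyness = q.spot / q.strike  # simple moneyness: spot over strike
            rows.append((q.days_to_expiry, moneyness, q.price))
    return rows


# Made-up quotes: the first and last fall outside the 7 to 90 day window.
raw = [
    Quote(spot=600.0, strike=600.0, days_to_expiry=5, price=3.1),
    Quote(spot=600.0, strike=500.0, days_to_expiry=7, price=0.4),
    Quote(spot=600.0, strike=600.0, days_to_expiry=90, price=21.5),
    Quote(spot=600.0, strike=620.0, days_to_expiry=91, price=34.0),
]
kept = preprocess(raw)
print(kept)
```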
Figure 2 summarizes the data structure. The selected parametric model was calibrated with option prices on Day t. Pseudo inputs were generated from the calibrated model and combined with option prices and the other features on Day t+α to predict option prices on Day t+α+1, where α = 1, ..., 7 in the simulation. We set the time interval, α, between parametric model calibrations because calibration requires more computation than the other procedures, and calibrating the parametric model every day is difficult in practice. The generated pseudo inputs can be reused at any time unless there is a severe change in market properties, because they depend only on the calibrated parameters that describe the market. Therefore, setting α enables application to online or high-frequency data analysis with precalibrated parametric models in practical situations.
Among several parametric models that describe the movement of financial assets with stochastic processes, we selected the Kou model (Kou, 2002) to generate arbitrage-free pseudo inputs. The Kou model is a jump diffusion process that allows discontinuous jumps in returns.
Jump diffusion processes are a major class of sophisticated stochastic processes that are extensively used to describe market behavior. The general return process of a jump diffusion process is
$${X}_{t}=\alpha t+\sigma {W}_{t}+\sum_{i=1}^{{N}_{t}}{Y}_{i},$$
where X_{t} is the return; W_{t} is a Wiener process; N_{t} is a Poisson counting process; Y_{i} is the size of the ith jump; and α and σ are the drift and volatility parameters, respectively.
In the Kou model, the jump size distribution is defined as an asymmetric double exponential distribution as follows:
$$\nu (y)=\lambda \left(p\,{\lambda}_{+}{e}^{-{\lambda}_{+}y}\,{1}_{\{y>0\}}+(1-p)\,{\lambda}_{-}{e}^{-{\lambda}_{-}|y|}\,{1}_{\{y<0\}}\right),$$
where $\nu (y)$ is the density function of jump sizes; 1_{A} is the indicator function of A; and $p,\hspace{0.17em}\lambda ,\hspace{0.17em}{\lambda}_{+},$ and ${\lambda}_{-}$ are parameters. As the option pricing method for calibration, we selected the implicit finite difference method (Kwon and Lee, 2011), which has a second-order convergence rate.
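To make the return process concrete, the sketch below samples terminal returns from a jump diffusion with double exponential jumps. It assumes that λ is the Poisson jump intensity, that upward jumps occur with probability p and are exponential with rate λ_{+}, and that downward jumps are exponential with rate λ_{−}. The parameter values are illustrative, not calibrated, and the function names are hypothetical. It simulates the return process only and is not the finite difference pricer used for calibration.

```python
import math
import random


def sample_kou_jump(rng, p, lam_plus, lam_minus):
    """Jump size: +Exp(lam_plus) with probability p, otherwise -Exp(lam_minus)."""
    if rng.random() < p:
        return rng.expovariate(lam_plus)
    return -rng.expovariate(lam_minus)


def simulate_return(rng, horizon, drift, sigma, lam, p, lam_plus, lam_minus):
    """Sample X_T = drift*T + sigma*W_T + sum of the jumps arriving before T."""
    total = drift * horizon + sigma * math.sqrt(horizon) * rng.gauss(0.0, 1.0)
    arrival = rng.expovariate(lam)  # Poisson arrivals via exponential gaps
    while arrival < horizon:
        total += sample_kou_jump(rng, p, lam_plus, lam_minus)
        arrival += rng.expovariate(lam)
    return total


rng = random.Random(42)
params = dict(horizon=1.0, drift=0.05, sigma=0.2, lam=3.0, p=0.4, lam_plus=10.0, lam_minus=5.0)
returns = [simulate_return(rng, **params) for _ in range(20000)]
print(f"sample mean return: {sum(returns) / len(returns):.4f}")
```

Under these assumptions the mean jump size is p/λ_{+} − (1−p)/λ_{−}, so the expected return over horizon T is αT + λT times that mean, which gives a quick sanity check on the sampler.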
Finally, we constructed a multilayer NN structure with two hidden layers for the proposed method and the conventional model.
3.1 Predictive Results
We compared predictive performance of the proposed and conventional multilayer NN models, with four metrics.

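(1) Mean absolute percentage error (MAPE) provides the relative estimation error,
$$\text{MAPE}=\frac{1}{N}\sum_{n=1}^{N}\left|\frac{{\epsilon}_{n}}{{C}_{n}^{mkt}}\right|,$$

(2) Mean percentage error (MPE) provides the estimation error direction,
$$\text{MPE}=\frac{1}{N}\sum_{n=1}^{N}\frac{{\epsilon}_{n}}{{C}_{n}^{mkt}},$$

(3) Mean absolute error (MAE) provides the estimation error magnitude,
$$\text{MAE}=\frac{1}{N}\sum_{n=1}^{N}\left|{\epsilon}_{n}\right|,$$

(4) Root mean squared error (RMSE) provides the standard error of the estimation,
$$\text{RMSE}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}{\epsilon}_{n}^{2}}.$$
In each case, N is the total number of predicted option prices, ${\epsilon}_{n}={C}_{n}^{mkt}-{C}_{n}^{model}$, ${C}_{n}^{model}$ is the price estimated by the model, and ${C}_{n}^{mkt}$ is the true market price.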
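The four metrics are straightforward to compute. The sketch below assumes the error is market price minus model price and that the percentage metrics are normalized by the market price and reported as fractions rather than percentages. The function name `error_metrics` and the toy numbers are hypothetical.

```python
import math


def error_metrics(market, model):
    """Return MAPE, MPE, MAE, and RMSE for paired market and model prices."""
    n = len(market)
    errors = [c_mkt - c_mod for c_mkt, c_mod in zip(market, model)]
    return {
        "MAPE": sum(abs(e) / c for e, c in zip(errors, market)) / n,
        "MPE": sum(e / c for e, c in zip(errors, market)) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


# Toy prices chosen so that a positive and a negative relative error cancel in MPE.
metrics = error_metrics(market=[10.0, 20.0], model=[9.0, 22.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the MPE is zero even though the other three metrics are not, which shows why MPE alone can understate the error of a model.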
Tables 2 to 4 show these metrics for the predictive results of the proposed and conventional NN models for the options with TTM < 30, 30 ≤ TTM < 60, and TTM ≥ 60, respectively.
The proposed method is superior to the conventional NN model in most cases for all four metrics, and is more robust to overfitting, because the proposed method is also superior for prediction of option prices in sparsely traded regions, i.e., where moneyness is far from 1.
The proposed method was inferior to the conventional NN model for a few MPE cases. This may be due to a few large positive and negative errors cancelling in MPE for the conventional model, because its other error metrics are larger than those of the proposed method in those cases.
Figure 3 shows the monthly prediction errors for 2012 OEX options, normalized by the January errors of the proposed method; absolute MPE values are shown for visual simplicity. The proposed model errors (blue lines) are below the conventional model errors (red lines) for every month and every measure except RMSE in October.
To analyze arbitrage-free characteristics, we compared predicted option prices from the proposed method with arbitrage-free prices from the calibrated parametric models and the conventional multilayer NN model. Test points were artificially and uniformly generated from the two-dimensional grid of TTM and moneyness. The four error measures used for predictive errors were also employed here.
Tables 5 to 7 show that option prices predicted by the proposed method are significantly closer to arbitrage-free prices from the parametric model than those from the conventional model for all four metrics in most cases. Thus, the proposed model finds more accurate prices with a high degree of arbitrage-free characteristics compared with the conventional NN model.
4. CONCLUSIONS
This study proposed an arbitrage-free American type option pricing method by training a multilayer NN model with not only real traded data from the option market, but also artificial pseudo inputs generated from arbitrage-free parametric models. We first selected an appropriate parametric model and calibrated it with the data obtained from real market trades, then generated arbitrage-free pseudo inputs at the grid points in the two-dimensional TTM and moneyness space. The proposed NN model was then trained with both arbitrage-free pseudo inputs and real traded option data by minimizing the weighted sum of errors of both types of inputs.
The proposed method has two main advantages. First, the proposed multilayer NN model is robust when data points are concentrated on specific values or regions of the input attributes, which is a common phenomenon in option pricing. Second, the proposed method emphasizes arbitrage-free pricing, because the network is trained with arbitrage-free inputs generated from the calibrated parametric model. In contrast, conventional option pricing methods with learning models lack any means of finding arbitrage-free prices.
The proposed method outperformed the conventional multilayer NN model with the same structure on S&P 100 index American options in 2012, because the pseudo inputs located in sparsely traded regions prevent overfitting when training the proposed model. We also compared prices obtained from the proposed and conventional models with arbitrage-free prices from the calibrated parametric models, and showed that prices from the proposed method were considerably closer to those of the parametric model than prices from the conventional model.
There are a few potential improvements for the proposed model and algorithm, which will be pursued in future studies. First, if η_{1} is significantly larger than η_{2}, then the model fits market data more than the generated pseudo inputs, whereas if η_{2} is significantly larger than η_{1}, then the model substantially reflects arbitrage-free characteristics. Hence, criteria should be developed to determine optimal weights according to the degree of arbitrage-freeness the user wishes to impose. Second, the pseudo inputs employed to train the model follow the selected parametric model, and the selection of this model is currently open. Thus, prediction performance of the trained model could be improved by choosing suitable parametric model(s) that closely match true market behavior. Finally, prediction performance could be enhanced by combining the pseudo input generation procedure with other nonparametric methods, such as support vector machines or Gaussian processes, which have been employed previously to predict financial variables.
PATENTS
The result of this study is registered as a patent in the Republic of Korea (Grant No. 102120655).