﻿ :: Industrial Engineering & Management Systems ::

ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.15 No.2 pp.148-155
DOI : https://doi.org/10.7232/iems.2016.15.2.148

# Continuous Conditional Random Field Model for Predicting the Electrical Load of a Combined Cycle Power Plant

Gilseung Ahn, Sun Hur*
Department of Industrial and Management Engineering, Hanyang University, Ansan, Korea
Corresponding Author, hursun@hanyang.ac.kr
Received: February 14, 2016; Revised: May 1, 2016; Accepted: June 5, 2016

## ABSTRACT

Existing power plants may consume significant amounts of fuel and require high operating costs, partly because of poor electrical power output estimates. This paper suggests a continuous conditional random field (C-CRF) model to predict more precisely the full-load electrical power output of a base load operated combined cycle power plant. We introduce three feature functions to model association potential and one feature function to model interaction potential. Together, these functions compose the C-CRF model, and the model is transformed into a multivariate Gaussian distribution with which the operation parameters can be modeled more efficiently. The performance of our model in estimating power output was evaluated by means of a real dataset and our model outperformed existing methods. Moreover, our model can be used to estimate confidence intervals of the predicted output and calculate several probabilities.

## 1. INTRODUCTION

As the demand for electric power has grown rapidly during the past several decades, so has the interest in the combined cycle power plant (CCPP). This is because CCPPs are known to be very efficient and require relatively low investment costs. A CCPP is composed of a gas turbine, steam turbine, and heat recovery system generators. The two turbines are combined in one cycle and the heat or gas flow transfers the energy from one of the turbines to the other. In general, a gas turbine exhausts gas that is used to produce heat, which is used to make the steam required by the steam turbine (Niu and Liu, 2008).

Numerous control strategies have been developed to reduce CCPP operational costs, but still a more advanced control strategy is necessary to further reduce the entire operational cost. Tüfekci (2014) suggests that it is essential for a base load power plant to predict electrical power outputs correctly in order to attain a maximum profit. Existing plants, however, consume a significant amount of fuel and have high operating expenses partly because of poor prediction of electrical power output requirements. Particularly, the reliability and sustainability of the gas turbine are highly affected by the prediction of the power generation needs.

Some studies have adopted thermodynamic approaches to obtain an accurate prediction of power generation. To forecast power generation accurately, however, these approaches need many assumptions, such as the existence of certain empirical relationships, to account for unpredictability in their solutions. Without these assumptions, any analysis of a real application calls for many nonlinear equations, whose solution is either almost impossible or requires too much computational time and effort, and sometimes the result is still unsatisfactory and unreliable (Kesgin and Heperkan, 2005).

As an alternative that overcomes these difficulties, several studies have employed machine learning methods for electrical power prediction. Kesgin and Heperkan (2005) use an artificial neural network and fuzzy logic to analyze various thermodynamic systems, including a CCPP. Fan et al. (2016) point out that electric load forecasting is very important for power utilities and present a support vector regression (SVR) model blended with differential empirical mode decomposition (DEMD) and auto regression to forecast electric load. Yadav and Srinivasan (2011) propose a method for short-term load forecasting based on a smooth transition autoregressive (STAR) model. Kaya et al. (2012) predict the power of combined gas and steam turbines by means of a k-nearest neighbor smoother, multivariate linear regression, an artificial neural network, and some other methods. More recently, Tüfekci (2014) examined several machine learning regression methods for the predictive analysis of a thermodynamic system, namely a combined cycle power plant with one steam turbine and two heat recovery systems. Clifton et al. (2013) predict a wind turbine's power output by means of a regression tree. Prokop et al. (2013) apply evolutionary fuzzy rules to model and predict the power output of a real-world photovoltaic power plant (PVPP). Yu and Xu (2014) propose a combinational approach based on an improved back propagation (BP) neural network for short-term gas load forecasting, in which a genetic algorithm is employed to optimize the network. AlRashidi and El-Naggar (2010) suggest a method for annual peak load forecasting in electrical power systems that employs particle swarm optimization to find the optimal model parameters. Xie and Hong (2015) present an integrated probabilistic electric load forecasting solution consisting of three components: pre-processing, forecasting, and post-processing. In the forecasting component, time series modeling and neural networks are employed to forecast the electric load. As the number of studies employing machine learning methods to predict electrical power has grown, several studies have also compared these methods.

Because the gas turbine power output mostly depends on ambient parameters such as ambient temperature, atmospheric pressure, and relative humidity, these are the input variables of our model. Moreover, exhaust steam pressure should be included in the input variables since steam turbine power output has a direct relationship with the exhaust vacuum level. Figure 1 shows the configuration of the gas turbine. Since various relationships exist among the dataset variables used to predict electrical power output, an accurate prediction calls for a model that can deal well with a complicated structure, including a wide variety of arbitrary and non-independent input features. To the best of our knowledge, however, no previous study has considered the various relationships among the variables in the described dataset. To take these relationships into account, a continuous conditional random field (C-CRF) is employed in this paper to predict the full load electrical power output of a base load operated combined cycle power plant.

The C-CRF model is appropriate for this regression problem for two reasons. First, the C-CRF model can accommodate many input variables and represent the complex dependence relations among them in a single mathematical description. Second, the C-CRF model can provide not only a point estimate but also an interval estimate of the predicted value by considering the similarity among the outputs. CRF was originally developed for classification of sequential data (Lafferty et al., 2011), and has been adapted for many applications in various areas, including computer vision (Kumar and Hebert, 2003). Recently, CRF has been extended to regression by allowing the target variable to be continuous (C-CRF) and applied to regression on spatial-temporal data (Liu et al., 2004).

In this study, we construct the association potential, which is one part of the C-CRF model, using three machine learning methods: artificial neural network (ANN), regression tree (RT), and multiple linear regression (MLR). These methods have shown high predictive validity in electrical power output prediction problems. Euclidean similarity is introduced for the interaction potential, which is the other part of the C-CRF. The model is trained with the help of a multivariate Gaussian distribution and is then evaluated with a real dataset. The prediction accuracies of the suggested model are compared with those in Tüfekci (2014). Additionally, some application examples are provided to show the effectiveness of the model.

The remainder of this paper is organized as follows. We introduce the C-CRF in Section 2 and show how to apply it to predict the power of a combined cycle power plant in Section 3. In Section 4, we evaluate the suggested model with a real data set, and compare the model with those in Tüfekci (2014) based on the prediction results. Finally, Section 5 concludes the paper.

## 2. CONTINUOUS CONDITIONAL RANDOM FIELD

C-CRF is a probabilistic graphical model that expresses the complex dependence structure among system output variables by conditional probability distributions. A key advantage of the C-CRF is the flexibility to include a wide variety of arbitrary and non-independent features as inputs (McCallum, 2002). In addition, the C-CRF offers explanatory power because it is a probabilistic graphical model, as depicted in Figure 2, where each node denotes a variable and each edge denotes a relationship between two nodes. Here, $x^{(i)}$ and $y_i$ are the values of the ith input and output variables, respectively. We adopt the C-CRF structure depicted in Figure 2 to construct the proposed model.

A C-CRF can be represented as a conditional probability distribution $P(y \mid x)$ as follows:

$P(y \mid x) = \frac{1}{Z(x, \alpha, \beta)} \exp\left( \sum_{i=1}^{n} A(\alpha, y_i, x^{(i)}) + \sum_{i,j} I(\beta, y_i, y_j, x) \right)$
(1)

where n is the number of observations, α and β are vectors of parameters, x is the vector of input variables, and $Z(x, \alpha, \beta)$ is a normalization factor that ensures that $P(y \mid x)$ is a proper probability distribution:

$Z(x, \alpha, \beta) = \int \exp\left( \sum_{i=1}^{n} A(\alpha, y_i, x^{(i)}) + \sum_{i,j} I(\beta, y_i, y_j, x) \right) dy$
(2)

In (1), $A(\alpha, y_i, x^{(i)})$ and $I(\beta, y_i, y_j, x)$ are the association potential between the ith input variable vector $x^{(i)}$ and the output variable $y_i$ (represented as thick lines in Figure 2) and the interaction potential between the outputs $y_i$ and $y_j$ (drawn in fine lines in Figure 2), respectively. The association potential indicates the relationship between inputs and outputs, while the interaction potential indicates the relationship among outputs. In C-CRF applications, the association and interaction potentials are often defined as linear combinations of fixed feature functions in terms of α and β as follows (McCallum, 2002):

$A(\alpha, y_i, x^{(i)}) = \sum_{k=1}^{K_1} \alpha_k f_k(y_i, x^{(i)}),$
(3)

$I(\beta, y_i, y_j, x) = \sum_{k=1}^{K_2} \beta_k g_k(y_i, y_j, x).$
(4)

Computing the normalization function in (2) is essential to obtain the exact probability in (1), but doing so is very complicated and sometimes intractable. Radosavljevic et al. (2010) show that if $A(\alpha, y_i, x^{(i)})$ and $I(\beta, y_i, y_j, x)$ are defined as quadratic functions of y, then the sum $A(\alpha, y_i, x^{(i)}) + I(\beta, y_i, y_j, x)$ can be transformed into the form $(y - \mu)^T \Sigma^{-1} (y - \mu) + \text{constant}$. This expression corresponds to a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ. Once the C-CRF is converted into the form of a multivariate Gaussian distribution, learning the parameters becomes considerably easier. A more detailed explanation can be found in Section 3.

Learning a C-CRF requires determining the values of the parameters α and β that maximize the conditional log-likelihood $L(\alpha, \beta) = \sum_{i=1}^{n} \log P(y_i \mid x^{(i)})$, so that the result is obtained as follows:

$(\hat{\alpha}, \hat{\beta}) = \arg\max_{\alpha, \beta} L(\alpha, \beta)$
(5)

Learning can be done by applying a standard optimization algorithm, such as the gradient ascent method. The inference task requires finding the output values for a given set of input values and estimated parameters, such that the conditional probability $P(y \mid x)$ is maximized:

$\hat{y} = \arg\max_{y} P(y \mid x).$
(6)

## 3. THE C-CRF MODEL FOR POWER GENERATION PREDICTION

In this section, we describe in detail the proposed C-CRF model for regression analysis in power generation. We use the same input and target variables as Tüfekci (2014). More specifically, ambient temperature (AT, in degrees Celsius), atmospheric pressure (AP, in millibars), relative humidity (RH, in percent), and exhaust steam pressure (or vacuum, V, in cmHg) are the input variables in the dataset. The target variable is the full load electrical power (PE, in megawatts). All measurements are obtained through sensors and are averaged hourly to provide the values of the input and target variables.

ANN, RT, and MLR were applied to predict the full load electrical power output in a previous study (AlRashidi and El-Naggar, 2010). Employing the input variables as attributes, we therefore introduce three feature functions to model the association potential, which describes the dependency between the input and target variables for a given observation $x^{(i)}$, as follows:

$f_1(y_i, x^{(i)}) = -\left(y_i - \mathrm{ANN}(x^{(i)})\right)^2, \quad f_2(y_i, x^{(i)}) = -\left(y_i - \mathrm{RT}(x^{(i)})\right)^2, \quad f_3(y_i, x^{(i)}) = -\left(y_i - \mathrm{MLR}(x^{(i)})\right)^2,$
(7)

where $\mathrm{ANN}(x^{(i)})$, $\mathrm{RT}(x^{(i)})$, and $\mathrm{MLR}(x^{(i)})$ are the outputs of ANN, RT, and MLR, respectively. These feature functions agree with the basic principle for association potentials, that is, their values must increase for more accurate predictions. As a result, the following linear combination of these features provides insight on how much one can trust the prediction methods ANN, RT, and MLR based on the learned parameter vector $\alpha = (\alpha_1, \alpha_2, \alpha_3)$:

$A(\alpha, y_i, x^{(i)}) = -\alpha_1 \left(y_i - \mathrm{ANN}(x^{(i)})\right)^2 - \alpha_2 \left(y_i - \mathrm{RT}(x^{(i)})\right)^2 - \alpha_3 \left(y_i - \mathrm{MLR}(x^{(i)})\right)^2.$
(8)

For example, a large $\alpha_2$ places a large penalty on errors of the RT model. Each $\alpha_k$ therefore acts as a quality indicator of the corresponding prediction method.
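The weighting effect of α in Eq. (8) can be illustrated with a minimal Python sketch. The weights and the three predictions below are hypothetical values chosen for illustration; they are not taken from the fitted model.

```python
def association_potential(alpha, y_i, preds_i):
    """Eq. (8): A = -sum_k alpha_k * (y_i - pred_k)^2 for one record.

    alpha   : weights for the K prediction methods
    y_i     : candidate output value for the record
    preds_i : predictions of the K methods for the record
    """
    return -sum(a * (y_i - p) ** 2 for a, p in zip(alpha, preds_i))


# Hypothetical weights and predictions for (ANN, RT, MLR), in MW.
alpha = [0.5, 0.3, 0.2]
preds_i = [455.0, 452.0, 458.0]

close = association_potential(alpha, 455.0, preds_i)  # candidate near the predictions
far = association_potential(alpha, 470.0, preds_i)    # candidate far from them
print(close, far)
```

The potential is larger (closer to zero) for the candidate that agrees with the predictions, and disagreement with a heavily weighted method costs more than disagreement with a lightly weighted one.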

To model the interaction potential we introduce a feature function:

$g(y_i, y_j, x) = -S_{i,j} (y_i - y_j)^2,$
(9)

where $S_{i,j}$ denotes the similarity between records i and j, and the corresponding interaction potential is given by:

$I(\beta, y_i, y_j, x) = -\beta S_{i,j} (y_i - y_j)^2.$
(10)

The learned parameter β represents the level of correlation among neighboring outputs. That is, a large value of β implies a high correlation between $y_i$ and $y_j$.

Finally, the resulting C-CRF model is given by:

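$P(y \mid x) = \frac{1}{Z(x)} \exp\left\{ \sum_{i} \sum_{k=1}^{3} -\alpha_k \left(y_i - f_k(x^{(i)})\right)^2 + \sum_{i,j} -\beta S_{i,j} (y_i - y_j)^2 \right\}.$
(11)

Here, in agreement with Eq. (8), $f_k(x^{(i)})$ denotes the prediction of the kth method (ANN, RT, or MLR) for the input $x^{(i)}$.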
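As a minimal sketch, the exponent of Eq. (11) can be evaluated directly for a candidate output vector. The function name, the two-record example, and all numeric values are illustrative assumptions; the interaction sum runs over all ordered pairs (i, j), as the equation is written.

```python
def log_unnormalized_density(y, preds, S, alpha, beta):
    """Exponent of Eq. (11), i.e. log P(y|x) up to the constant -log Z(x).

    y     : candidate outputs, one per record
    preds : preds[k][i] is the prediction of method k for record i
    S     : n x n similarity matrix
    alpha : weights of the K prediction methods
    beta  : weight of the interaction term
    """
    n = len(y)
    association = -sum(a * (y[i] - p[i]) ** 2
                       for a, p in zip(alpha, preds)
                       for i in range(n))
    interaction = -sum(beta * S[i][j] * (y[i] - y[j]) ** 2
                       for i in range(n)
                       for j in range(n))
    return association + interaction


# Two hypothetical records; all three methods predict 450 MW and 460 MW.
preds = [[450.0, 460.0], [450.0, 460.0], [450.0, 460.0]]
S = [[0.0, 1.0], [1.0, 0.0]]
alpha, beta = [1.0, 1.0, 1.0], 0.5

follows_preds = log_unnormalized_density([450.0, 460.0], preds, S, alpha, beta)
equal_outputs = log_unnormalized_density([455.0, 455.0], preds, S, alpha, beta)
print(follows_preds, equal_outputs)
```

The first candidate pays only the interaction penalty, the second only the association penalty, which shows how α and β trade off agreement with the predictors against smoothness across similar records.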

We further map the distribution shown in Eq. (11) to a multivariate Gaussian distribution to reduce computational complexity. As one can see in Eq. (2), obtaining the C-CRF parameters would otherwise require evaluating a complicated integral. Because the potentials in Eq. (11) are quadratic in y, the distribution can be represented as follows:

$P(y \mid x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (y - \mu(x))^T \Sigma^{-1} (y - \mu(x)) \right).$
(12)

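In the Gaussian mapping, the inverse of the covariance matrix, $\Sigma^{-1}$, is the sum of two n × n matrices, namely, $\Sigma^{-1} = 2(Q^1 + Q^2)$, where

$Q^1_{i,j} = \begin{cases} \sum_{k=1}^{3} \alpha_k, & \text{if } i = j, \\ 0, & \text{otherwise,} \end{cases} \quad \text{and} \quad Q^2_{i,j} = \begin{cases} \sum_{l} \beta S_{i,l}, & \text{if } i = j, \\ -\beta S_{i,j}, & \text{otherwise.} \end{cases}$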
(13)

Further, the mean vector μ(x) is computed as Σθ, where $\theta = 2 \sum_{k=1}^{3} \alpha_k f_k(x)$ and $f_k(x)$ is the vector of predictions of the kth method for all records.
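The Gaussian mapping can be sketched in plain Python as follows. The helper names, the three-record example, and all numeric values are assumptions made for illustration. The example similarity matrix is symmetric with a zero diagonal, so the diagonal sum in Eq. (13) effectively runs over the other records only. The mean is obtained by solving $\Sigma^{-1} \mu = \theta$ rather than by inverting the matrix.

```python
def gaussian_params(alpha, beta, preds, S):
    """Return (precision, theta), with precision = 2 * (Q1 + Q2) as in Eq. (13).

    preds[k][i] is the prediction of method k for record i;
    S is a symmetric n x n similarity matrix with zero diagonal (assumed).
    """
    n = len(S)
    alpha_sum = sum(alpha)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_sum = sum(S[i])
        for j in range(n):
            q1 = alpha_sum if i == j else 0.0
            q2 = beta * row_sum if i == j else -beta * S[i][j]
            precision[i][j] = 2.0 * (q1 + q2)
    theta = [2.0 * sum(a * p[i] for a, p in zip(alpha, preds)) for i in range(n)]
    return precision, theta


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Hypothetical weights, predictions (MW), and similarities for three records.
alpha, beta = [0.5, 0.3, 0.2], 0.1
preds = [[450.0, 460.0, 470.0],   # method 1
         [452.0, 458.0, 474.0],   # method 2
         [448.0, 462.0, 468.0]]   # method 3
S = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.5],
     [0.1, 0.5, 0.0]]

precision, theta = gaussian_params(alpha, beta, preds, S)
mu = solve(precision, theta)   # mean vector, i.e. the C-CRF point prediction
print(mu)
```

With β = 0 the mean of each record reduces to the α-weighted average of the three predictions. A positive β pulls the means of similar records toward each other.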

Because the model is now a multivariate Gaussian distribution, learning the C-CRF by maximizing the log-likelihood in Eq. (5) becomes a convex optimization problem. As mentioned above, the gradient ascent method can be applied to learn the parameters. Specifically, we maximize the log-likelihood with respect to $\log \alpha_k$ and $\log \beta$ instead of $\alpha_k$ and β, which keeps the parameters positive and makes the new optimization problem unconstrained. The derivatives of the log-likelihood function and the updates of α and β in gradient ascent can be computed as follows:

$\frac{\partial L}{\partial \log \alpha_k} = \alpha_k \frac{\partial L}{\partial \alpha_k}, \quad \frac{\partial L}{\partial \log \beta} = \beta \frac{\partial L}{\partial \beta},$
(14)

$\log \alpha_k^{new} = \log \alpha_k^{old} + \eta \frac{\partial L}{\partial \log \alpha_k}, \quad \log \beta^{new} = \log \beta^{old} + \eta \frac{\partial L}{\partial \log \beta}.$
(15)
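The log-space update of Eqs. (14) and (15) can be sketched on a deliberately simplified special case: a single predictor and no interaction term, so that $P(y_i \mid x^{(i)}) \propto \exp(-\alpha r_i^2)$ with residual $r_i = y_i - f(x^{(i)})$. This is an assumption made to keep the example short, not the full model. In this case the log-likelihood is $\frac{n}{2} \log \alpha - \alpha \sum_i r_i^2$ plus a constant, and the maximizer has the closed form $n / (2 \sum_i r_i^2)$, which gives something to check the iteration against. The residuals, step size η, and iteration count are hypothetical.

```python
import math


def fit_alpha(residuals, eta=0.1, steps=500, alpha0=1.0):
    """Gradient ascent on log(alpha) for the one-predictor special case."""
    n = len(residuals)
    sq_sum = sum(r * r for r in residuals)
    log_alpha = math.log(alpha0)
    for _ in range(steps):
        alpha = math.exp(log_alpha)
        dL_dalpha = n / (2.0 * alpha) - sq_sum   # derivative w.r.t. alpha
        dL_dlog = alpha * dL_dalpha              # chain rule, Eq. (14)
        log_alpha += eta * dL_dlog               # update, Eq. (15)
    return math.exp(log_alpha)


residuals = [0.5, -1.0, 0.2, 0.8, -0.3]          # hypothetical y_i - f(x_i)
alpha_hat = fit_alpha(residuals)
alpha_closed_form = len(residuals) / (2.0 * sum(r * r for r in residuals))
print(alpha_hat, alpha_closed_form)
```

Because the update acts on log α, the parameter stays positive at every step without any explicit constraint.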

The prediction is then the expected value of the Gaussian model, which is equal to the mean of the distribution:

$\hat{y} = \arg\max_{y} P(y \mid x) = \Sigma\theta$
(16)

## 4. EXPERIMENT

In this section, an experiment to illustrate our model is provided. We describe the data used in the experiment, and then utilize the model to predict the probability distribution of the target value when the four input variables are given. Finally, the results are compared to existing methods to validate our model.

### 4.1 Data

We obtained a combined cycle power plant dataset from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). It is composed of 9,568 records collected over 674 days while the combined cycle power plant was set to work with a full load, and contains four input variables (ambient temperature (AT), ambient pressure (AP), relative humidity (RH), and vacuum (V)) and a target variable (full load electrical power output (PE)). Table 1 shows the basic statistics of the dataset.

Table 2 contains the correlation matrix, indicating that the variables do not seem to be independent of each other.

### 4.2 Modeling and Evaluation

In order to validate our model and compare its performance to that of the models described in Tüfekci (2014), 5×2 cross-validation is applied. Root mean squared error (RMSE) and mean absolute error (MAE) are calculated and listed in Table 3 to assess the prediction accuracy, where:

$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$
(17)

$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|.$
(18)
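Both error measures are straightforward to compute. The following sketch implements Eqs. (17) and (18) with the standard library only; the four true and predicted values are hypothetical and serve only to exercise the functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (17)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (18)."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical observed and predicted power outputs (MW).
y_true = [442.0, 460.0, 455.0, 470.0]
y_pred = [445.0, 456.0, 455.0, 465.0]

print(rmse(y_true, y_pred))  # sqrt(12.5), about 3.54
print(mae(y_true, y_pred))   # 3.0
```

RMSE weights large errors more heavily than MAE, which is why the two values differ even on this small example.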

Euclidean similarity $S_{i,j}$ is employed to model the interaction potential, where $S_{i,j}$ between two records i and j is calculated as follows:

(19)

Note that when we calculate the Euclidean distance, a normalized Euclidean distance, $\sqrt{\sum_{k=1}^{4} \left[ \frac{1}{\mathrm{std}(x_k)} (x_{ik} - x_{jk}) \right]^2}$, is used in order to remove the scale effect of the variables, where $\mathrm{std}(x_k)$ denotes the standard deviation of $x_k$, and $x_{ik}$ denotes the value of variable k of record i.
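The normalized distance can be sketched as follows. The three input vectors are those of records 6663, 143, and 867 quoted in Section 4.3, in the order (AT, AP, V, RH). Two points are assumptions of this sketch: the standard deviations are computed from these three records only (in the experiment they would come from the whole dataset), and the sample standard deviation is used. The sketch stops at the distance and does not convert it into the similarity $S_{i,j}$.

```python
import math
import statistics


def normalized_distance(x_i, x_j, stds):
    """Euclidean distance after dividing each variable by its standard deviation."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x_i, x_j, stds)))


# (AT in deg C, AP in mb, V in cmHg, RH in %) for three records.
records = [
    [26.63, 1012.66, 64.44, 61.19],  # record 6663
    [23.02, 1011.74, 59.21, 83.18],  # record 143
    [29.05, 1011.33, 70.32, 72.50],  # record 867
]

# Illustration only: per-variable standard deviation over these three records.
stds = [statistics.stdev(column) for column in zip(*records)]

d_01 = normalized_distance(records[0], records[1], stds)
d_02 = normalized_distance(records[0], records[2], stds)
print(d_01, d_02)
```

Without the division by the standard deviation, atmospheric pressure (values around 1,000) and relative humidity (values below 100) would contribute on very different scales.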

With the association and interaction potentials, we constructed a C-CRF model and transformed it into a multivariate Gaussian distribution as in Eqs. (11) and (12). The parameters were estimated by applying Eqs. (14) and (15) with arbitrarily chosen initial values. The resulting estimates are shown in Table 4.

We finally obtain the complete model in a multivariate Gaussian distribution form as follows:

$P(y \mid x) = \frac{1}{5513.672453} \exp\left( -\frac{1}{2} (y - \mu(x))^T \Sigma^{-1} (y - \mu(x)) \right).$
(20)

The RMSE and MAE of the values predicted by Eq. (16) in our model are compared to those provided in the last two columns of Table 10 in Tüfekci (2014). The result is presented in Table 5.

Table 5 indicates that the C-CRF model presented in the paper shows the lowest RMSE and MAE (written in boldfaced numbers) among all models.

### 4.3 Application

The C-CRF model presented here can be used not only to predict the power generation of a CCPP but also in various other ways. For example, it can be used to derive an interval estimate of the CCPP's power generation and to calculate its probability distribution for a given set of input variables. We have selected three example problems to illustrate its applicability in this section.

For the interval estimation and probability calculation, we randomly selected a record, record number 6663 in the dataset, whose values are $AT_{6663} = 26.63$°C, $AP_{6663} = 1{,}012.66$ mb, $V_{6663} = 64.44$ cmHg, $RH_{6663} = 61.19$%, and $PE_{6663} = 442$ MW. We also obtained $\mu(x_{6663}) = 459.45$ and $\sigma_{6663} = (\Sigma^{-1})_{6663,6663} = 21.46$. The resultant probability density function of PE (in MW) is as follows:

$P(PE_{6663} \mid x_{6663}) = \frac{1}{53.79} \exp\left( -\frac{(x - 459.45)^2}{921.0632} \right).$
(21)

Since the mean and standard deviation are 459.45 and 21.46, respectively, the 95% confidence interval for the target value PE can be easily calculated as 459.45 ± 1.96 × 21.46 = (417.39, 501.51), which contains the real output value $PE_{6663} = 442$ MW. The probability density function (blue line), the corresponding confidence interval (hatched area), and the real output value (red line) are depicted in Figure 3.
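The interval can be reproduced with Python's standard library. The sketch below takes the mean and standard deviation quoted in the text and uses `statistics.NormalDist` to obtain the 97.5% quantile of the standard normal distribution instead of the rounded value 1.96.

```python
from statistics import NormalDist

mu, sigma = 459.45, 21.46          # values reported for record 6663
observed = 442.0                   # real output of record 6663 (MW)

z = NormalDist().inv_cdf(0.975)    # about 1.96
lower, upper = mu - z * sigma, mu + z * sigma

print(round(lower, 2), round(upper, 2))   # 417.39 501.51
print(lower <= observed <= upper)         # True
```

Other coverage levels follow by changing the quantile, for example `inv_cdf(0.95)` for a 90% interval.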

The next example demonstrates the probability that the CCPP’s power generation is greater than 470MW when AT, AP, V, and RH are given as in the record 6663. It is formally described as:

$\Pr\left( PE_{6663} > 470\,\mathrm{MW} \mid AT_{6663} = 26.63\,^{\circ}\mathrm{C},\ AP_{6663} = 1{,}012.66\,\mathrm{mb},\ V_{6663} = 64.44\,\mathrm{cmHg},\ RH_{6663} = 61.19\,\% \right).$
(22)

Since PE6663 follows a normal distribution with a mean of 459.45 and a standard deviation of 21.46, it is easily transformed into standard normal distribution form and calculated as:

$\Pr\left( Z > \frac{470 - 459.45}{21.46} = 0.4916 \right) = 1 - 0.6885 = 0.3115,$
(23)

where Z denotes the random variable which follows a standard normal distribution.
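The tail probability can be checked numerically with `statistics.NormalDist`, using the mean and standard deviation quoted for record 6663. The sketch evaluates both tails so that $\Pr(PE \le 470)$ and $\Pr(PE > 470)$ can be told apart.

```python
from statistics import NormalDist

pe_6663 = NormalDist(mu=459.45, sigma=21.46)   # fitted distribution of PE (MW)
threshold = 470.0

z = (threshold - pe_6663.mean) / pe_6663.stdev  # standardized threshold, about 0.4916
p_below = pe_6663.cdf(threshold)                # Pr(PE <= 470), about 0.6885
p_above = 1.0 - p_below                         # Pr(PE > 470), about 0.3115

print(round(z, 4), round(p_below, 4), round(p_above, 4))
```

Since 470 MW lies above the mean of 459.45 MW, the probability of exceeding it has to be below 0.5, which is a quick sanity check on the result.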

Lastly, we randomly selected two records, 143 and 867, from the dataset. Using our model, we calculated the probability that the PE of record 867 is greater than that of record 143, that is, $\Pr(PE_{143} < PE_{867})$, given the respective input variables of each record. The input variables of record 143 are $AT_{143} = 23.02$°C, $AP_{143} = 1{,}011.74$ mb, $V_{143} = 59.21$ cmHg, and $RH_{143} = 83.18$%. The input variables of record 867 are $AT_{867} = 29.05$°C, $AP_{867} = 1{,}011.33$ mb, $V_{867} = 70.32$ cmHg, and $RH_{867} = 72.50$%. The probability density functions of $PE_{143}$ and $PE_{867}$ are presented in (24) and (25):

$P(PE_{143} \mid x_{143}) = \frac{1}{49.43} \exp\left( -\frac{(x - 432.66)^2}{777.7568} \right),$
(24)

$P(PE_{867} \mid x_{867}) = \frac{1}{53.67} \exp\left( -\frac{(x - 456.71)^2}{916.7762} \right).$
(25)

A simulation approach is subsequently adopted to solve the problem. We generate 10,000 random numbers from each of the probability density functions presented in Eqs. (24) and (25) and compare the two sets. More precisely, $D_1 = \{ D_{1i} : i = 1, 2, \cdots, 10{,}000 \}$ and $D_2 = \{ D_{2j} : j = 1, 2, \cdots, 10{,}000 \}$ are the sets of random numbers having the probability density functions presented in Eqs. (24) and (25), respectively. The probability $\Pr(PE_{143} < PE_{867})$ is then calculated as follows:

$\Pr(PE_{143} < PE_{867}) = \frac{\sum_{i=1}^{10{,}000} \sum_{j=1}^{10{,}000} I(D_{1i} < D_{2j})}{10{,}000 \times 10{,}000}$
(26)

where $I(D_{1i} < D_{2j})$ is an indicator function that returns 1 if $D_{1i} < D_{2j}$ and 0 otherwise. By means of this approach, $\Pr(PE_{143} < PE_{867})$ is calculated as 0.8551.
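The pairwise comparison in Eq. (26) can be sketched as below. The sketch assumes that the two outputs are sampled independently from the marginal densities in Eqs. (24) and (25), which the text does not state; under a different dependence structure the estimate would differ. The standard deviations are recovered from the denominators $2\sigma^2$ of the two densities, the seeds are arbitrary, and sorting plus binary search replaces the explicit double loop over all $10^8$ pairs without changing the count.

```python
from bisect import bisect_right
from statistics import NormalDist

# Marginal distributions of PE (MW); sigma recovered from 2 * sigma^2.
pe_143 = NormalDist(432.66, (777.7568 / 2) ** 0.5)
pe_867 = NormalDist(456.71, (916.7762 / 2) ** 0.5)

n = 10_000
d1 = pe_143.samples(n, seed=1)           # draws for record 143
d2 = sorted(pe_867.samples(n, seed=2))   # draws for record 867, sorted

# For each D1i, count the D2j that are strictly greater (the indicator sum).
greater_pairs = sum(n - bisect_right(d2, v) for v in d1)
p_mc = greater_pairs / (n * n)

# Closed form under the same independence assumption:
# PE_867 - PE_143 is normal, so Pr(difference > 0) = 1 - cdf(0).
p_closed_form = 1.0 - (pe_867 - pe_143).cdf(0.0)

print(p_mc, p_closed_form)
```

Comparing the simulated value with the closed form is a convenient check that the sample size is large enough.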

## 5. CONCLUSION

This study proposed a C-CRF model to predict the power of a CCPP at full load. Machine learning approaches were preferred to help ensure accurate prediction instead of thermodynamic approaches, which involve some assumptions with many nonlinear equations and require significant computational time and effort.

In this study, we have introduced three feature functions with machine learning methods to model the association potential, and a feature function with Euclidean similarity to model the interaction potential. With potentials represented by a combination of feature functions, the C-CRF model is transformed into a multivariate Gaussian distribution with which training and inference tasks were performed. The accuracies in terms of RMSE and MAE were compared to show that our model outperformed existing methods. The presented model can be applied to any regression application, such as interval estimation.

## Figure

Figure 1. Configuration of the gas turbine (http://www.understandingchp.com).

Figure 2. C-CRF structure.

Figure 3. Probability density function of the PE of record 6663.

## Table

Table 1. Basic statistics of the dataset

Table 2. Correlations among input variables

Table 3. RMSE and MAE of ANN, RT, and MLR

Table 4. Estimated parameter values

Table 5. Comparison results between our C-CRF model and those described in Tüfekci (2014)

## REFERENCES

1. AlRashidi, M. R. and El-Naggar, K. M. (2010), Long term electric load forecasting based on particle swarm optimization, Applied Energy, 87(1), 320-326.
2. Clifton, A., Kilcher, L., Lundquist, J. K., and Fleming, P. (2013), Using machine learning to predict wind turbine power output, Environmental Research Letters, 8(2), 0204009.
3. Fan, G. F., Peng, L. L., Hong, W. C., and Sun, F. (2016), Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression, Neurocomputing, 173, 958-970.
4. Kaya, H., Tüfekci, P., and Gürgen, F. S. (2012), Local and global learning methods for predicting power of a combined gas and steam turbine, in: International Conference on Emerging Trends in Computer and Electronics Engineering.
5. Kesgin, U. and Heperkan, H. (2005), Simulation of thermodynamic systems using soft computing techniques, International Journal of Energy Research, 29(7), 581-611.
6. Kumar, S. and Hebert, M. (2003), Discriminative random fields: A discriminative framework for contextual interaction in classification, in: Computer Vision, Proceedings Ninth IEEE International Conference on, 1150-1157.
7. Lafferty, J., McCallum, A., and Pereira, F. C. (2011), Conditional random fields: Probabilistic models for segmenting and labelling sequence data.
8. Liu, Y., Carbonell, J., Klein-Seetharaman, J., and Gopalakrishnan, V. (2004), Comparison of probabilistic combination methods for protein secondary structure prediction, Bioinformatics, 20(17), 3099-3107.
9. McCallum, A. (2002), Efficiently inducing features of conditional random fields, in: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence.
10. Niu, L. X. and Liu, X. J. (2008), Multivariable generalized predictive scheme for gas turbine control in combined cycle power plant, in: Cybernetics and Intelligent Systems, IEEE Conference on, 791-796.
11. Prokop, L., Misak, S., Snasel, V., Platos, J., and Krömer, P. (2013), Supervised learning of photovoltaic power plant output prediction models, Neural Network World, 23(4), 321-338.
12. Radosavljevic, V., Vucetic, S., and Obradovic, Z. (2010), Continuous Conditional Random Fields for Regression in Remote Sensing, in: ECAI, 809-814.
13. Tüfekci, P. (2014), Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods, International Journal of Electrical Power and Energy Systems, 60, 126-140.
14. Xie, J. and Hong, T. (2015), GEFCom2014 probabilistic electric load forecasting: An integrated solution with forecast combination and residual simulation, International Journal of Forecasting.
15. Yadav, V. and Srinivasan, D. (2011), A SOM-based hybrid linear-neural model for short-term load forecasting, Neurocomputing, 74(17), 2874-2885.
16. Yu, F. and Xu, X. (2014), A short-term load forecasting model of natural gas based on optimized genetic algorithm and improved BP neural network, Applied Energy, 134, 102-113.