1. INTRODUCTION
In economics, regression analysis is an established tool for modeling and forecasting in many applied problems. However, traditional regression analysis runs into difficulties in some cases: when the data set is very small (the short-sample problem) or when data are recorded with errors, so that the relationship between the independent and dependent variables is uncertain (Savic and Pedrycz, 1991). As a rule, ordinary least squares (OLS) is used to estimate the parameters of a linear regression model (Draper and Smith, 1997). However, if the distribution of the random disturbances of the model differs significantly from the normal one, or if the conditions of their independence and homoscedasticity are violated, the efficiency of OLS decreases and it is advisable to turn to alternative methods (Kim and Chen, 1997). In such situations, an approach based on fuzzy regression methods is a reasonable choice and can lead to economically meaningful results.
Fuzzy linear regression (FLR), in which some elements of the model are represented by fuzzy numbers, is a fuzzy version of classical regression analysis. It gives a fuzzy relationship between the dependent and independent variables, which can be crisp or fuzzy, and yields both point and interval forecasts (Chernov, 2018).
FLR was first introduced by Tanaka et al. (1982). The Tanaka method assumes that the deviations between the observed and calculated values of the dependent variable are due to the fuzzy structure of the model. This structure is represented as a fuzzy linear function whose parameters are fuzzy sets. The fuzzy regression model is estimated by linear programming, and the coefficients of the model are symmetric triangular fuzzy numbers.
The described approach to fuzzy regression modeling, owing to its simplicity and immediate interpretability, is the most widespread and is increasingly used to solve applied economic problems. For example, FLR has been used to model regional economic growth (Alsaied, 2019), to value real estate (Volkova and Gisin, 2020), to estimate the parameters of the Bottazzi–Peri technological knowledge growth model (Wheatcroft and Walklate, 2014), and to forecast work-in-progress inventory in manufacturing (Biryukov, 2015).
In addition to the Tanaka method, there are a number of other methods for constructing FLR; almost all of them use triangular fuzzy numbers. An extension of symmetric triangular fuzzy coefficients to trapezoidal fuzzy numbers for the FLR model based on the Tanaka method was proposed by Charfeddine et al. (2005). Trapezoidal membership functions of fuzzy numbers are needed for the following reasons (Montazeri-Gh and Mahmoodi-k, 2015):
• The need to optimize the fuzziness of the model;
• The need to keep the experimental data within the range of estimated values.
In this paper, we propose an approach to the construction of FLR, in which the coefficients of the model are represented as trapezoidal fuzzy numbers. In this case, the input data are crisp numbers, while the predicted values of the dependent variable are described by trapezoidal fuzzy numbers.
The proposed FLR method with trapezoidal fuzzy numbers and an FLR with triangular fuzzy numbers, both applied to crisp sample data, were used to model the dependence of the gross regional product (GRP) of the Republic of Tatarstan (RT), Russia, on various factors. A comparative assessment of the quality of the fuzzy regression models is carried out.
2. METHODS
The idea of the proposed approach to estimating FLR is to form two samples from the original one. For this purpose, a crisp linear regression is estimated first, and the predicted values of the dependent variable are calculated from it. The two samples are formed according to the following rules:
1. An observation in the first sample is equal to the actual value of the dependent variable if that value is smaller than or equal to the predicted value; otherwise it is equal to the predicted value.
2. An observation in the second sample is equal to the actual value of the dependent variable if that value is greater than or equal to the predicted value; otherwise it is equal to the predicted value.
After these samples are formed, an FLR with symmetric triangular fuzzy coefficients is estimated on each of them. Based on these two FLR, a model is formed whose coefficients are represented as trapezoidal fuzzy numbers.
Implementations of this approach can differ both in the method used to estimate the crisp regression model that forms the two modified samples and in the method used to estimate the fuzzy regression. From a practical point of view, OLS, the method of least absolute deviations, and robust methods are of greatest interest for estimating the crisp regression (Tzimopoulos et al., 2016). Among the methods for estimating FLR, the Tanaka method should be noted.
In this work, the proposed approach is implemented with OLS used to form the modified samples. We give a stage-by-stage description of the developed method for constructing FLR, assuming that the specification of the model has already been determined.
Stage 1. Estimating a multiple linear regression by OLS and calculating the predicted values of the dependent variable from the sample model.
If there are p explanatory variables, the sample model of multiple linear regression has the form:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p, \qquad (1)$$
where $\hat{y}$ – estimated values of the dependent variable; $x_1, x_2, \dots, x_p$ – independent variables.
Sample estimates of the model coefficients are determined by OLS according to the following vector-matrix relation (Draper and Smith, 1997):
$$b = (X^{T} X)^{-1} X^{T} Y,$$
where $Y$ – n-dimensional column vector of observations of the dependent variable;
$b$ – (p+1)-dimensional column vector of the parameters of the regression equation;
$X$ – $n \times (p+1)$ matrix of the values of the independent variables (with a leading column of ones for the intercept).
The predicted values of the dependent variable are calculated from (1).
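Stage 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `ols_fit` and `ols_predict` are hypothetical, the intercept column is added automatically, and the normal equations are solved by Gauss-Jordan elimination.

```python
def ols_fit(X, y):
    """Estimate b = (X'X)^(-1) X'y; each row of X is [x1, ..., xp] (hypothetical helper)."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # leading column of ones
    k = len(rows[0])
    # Augmented matrix of the normal equations (X'X) b = X'y.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def ols_predict(b, X):
    """Predicted values from equation (1): b0 + b1*x1 + ... + bp*xp."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


b = ols_fit([[1], [2], [3], [4]], [3, 5, 7, 9])
y_hat = ols_predict(b, [[1], [2], [3], [4]])
```

For real data with large magnitudes, a numerically more stable least-squares routine (for example `numpy.linalg.lstsq`) would usually be preferable to solving the normal equations directly.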
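Stage 2. Formation of two samples according to the following rules:
$$y_j^{(1)} = \begin{cases} y_j, & y_j \le \hat{y}_j, \\ \hat{y}_j, & y_j > \hat{y}_j, \end{cases} \qquad y_j^{(2)} = \begin{cases} y_j, & y_j \ge \hat{y}_j, \\ \hat{y}_j, & y_j < \hat{y}_j, \end{cases}$$
where $y_j$ – the actual value of the dependent variable; $\hat{y}_j$ – the value of the dependent variable predicted by the model; $y_j^{(1)}$ and $y_j^{(2)}$ – the observations of the first and second samples.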
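The two rules amount to taking the element-wise minimum and maximum of the actual and predicted values. A minimal sketch, with the hypothetical helper name `form_samples`:

```python
def form_samples(y_actual, y_pred):
    """Split the original sample into the two modified samples of Stage 2.

    First sample: actual value where it does not exceed the prediction,
    otherwise the prediction. Second sample: the mirror-image rule.
    """
    first = [min(y, yh) for y, yh in zip(y_actual, y_pred)]
    second = [max(y, yh) for y, yh in zip(y_actual, y_pred)]
    return first, second


# Illustrative numbers only, not data from the study.
first, second = form_samples([10, 12, 9], [11, 11, 9])
```

By construction, the first sample lies on or below the crisp regression and the second lies on or above it.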
Stage 3. Construction of two fuzzy linear regressions with triangular fuzzy numbers from the formed samples. Each FLR has the following form:
$$\tilde{Y} = \tilde{A}_0 + \tilde{A}_1 x_1 + \dots + \tilde{A}_p x_p,$$
where the model coefficients $\tilde{A}_i$ are symmetric triangular fuzzy numbers defined by the pair $(a_i, b_i)$:
$a_i$ – the most probable value of the coefficient, and $b_i$ – the width of its fuzziness.
The membership function of a fuzzy triangular number is shown in Figure 1.
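Assuming that bi is the half-width of the symmetric triangular number on each side of ai (an interpretation; the width could also be defined as the full support), its membership function can be written as:

```latex
\mu_{\tilde{A}_i}(x) = \max\!\left(0,\; 1 - \frac{|x - a_i|}{b_i}\right), \qquad b_i > 0 .
```

The membership equals one at x = ai and falls linearly to zero at ai − bi and ai + bi.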
Formally, determining the coefficients of FLR is reduced to a linear programming problem (Wheatcroft and Walklate, 2014):
Conditions (3) ensure that the values of the dependent variable Y are included in the range of possible values of F. The problem of estimating FLR is to find the values of the parameters ai and bi of the fuzzy coefficients of the model that minimize the width of the fuzzy corridor covering the actual values of the dependent variable:
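For reference, a Tanaka-type formulation of this problem is sketched below. It is an assumption rather than the exact formulation used here: it takes all regressor values $x_{ij}$ to be non-negative, sets $x_{0j} = 1$ for the intercept, and requires every observation to lie inside the corridor at membership level zero. The observation index $j = 1, \dots, n$ is introduced for illustration.

```latex
\begin{aligned}
\min_{a_i,\, b_i} \quad & \sum_{j=1}^{n} \sum_{i=0}^{p} b_i\, x_{ij} \\
\text{s.t.} \quad & \sum_{i=0}^{p} (a_i + b_i)\, x_{ij} \ge y_j, \\
& \sum_{i=0}^{p} (a_i - b_i)\, x_{ij} \le y_j, \qquad j = 1, \dots, n, \\
& b_i \ge 0, \qquad i = 0, \dots, p .
\end{aligned}
```

The objective is the total half-width of the corridor over all observations, and the two inequality families are the inclusion conditions: the upper bound must not fall below any observation and the lower bound must not rise above it.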
Moreover, the found FLR equation includes three components that describe the fuzzy corridor:
Note that all actual values of the dependent variable lie between the functions Y1 and Y3.
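The three components can be illustrated with a short sketch. It assumes, by analogy with the naming used for the two samples below, that Y1 is the upper bound, Y2 the middle, and Y3 the lower bound, and that bi is a half-width; the helper names `corridor` and `covers` are hypothetical.

```python
def corridor(a, b, x):
    """Return (Y1, Y2, Y3) = (upper, middle, lower) for one observation.

    a, b: centres and half-widths of the fuzzy coefficients, intercept first.
    x: regressor values [x1, ..., xp]; absolute values are used in the spread
    so that the sketch also works for negative regressors.
    """
    xs = [1.0] + [float(v) for v in x]
    middle = sum(ai * xi for ai, xi in zip(a, xs))
    spread = sum(bi * abs(xi) for bi, xi in zip(b, xs))
    return middle + spread, middle, middle - spread


def covers(a, b, X, y):
    """Check that every actual value lies between the lower and upper bounds."""
    for row, yi in zip(X, y):
        upper, _, lower = corridor(a, b, row)
        if not (lower <= yi <= upper):
            return False
    return True


# Illustrative coefficients only, not estimates from the study.
upper, middle, lower = corridor([1.0, 2.0], [0.5, 0.1], [3.0])
```

A check such as `covers` is a convenient way to confirm that a solution returned by a linear programming solver satisfies the inclusion conditions.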
In the proposed method, a fuzzy regression with triangular fuzzy numbers is constructed for each sample. As a result, we obtain two FLR, each of which includes three components:
The equation of the first sample includes:
Function Y11 - the upper bound of the first corridor, Function Y12 - the middle of the fuzzy first corridor and Function Y13 - the lower bound of the first corridor.
Each fuzzy coefficient has the form:
The equation of the second sample includes:
Function Y21 - the upper bound of the second corridor, Function Y22 - the middle of the fuzzy second corridor and Function Y23 - the lower bound of the second corridor.
Each fuzzy coefficient has the form:
Note that .
Based on the two constructed FLR, the values of the dependent variable are determined in the form of trapezoidal fuzzy numbers. They are defined using four functions:
• Y21 - upper bound of the second corridor (fd);
• Y22 - middle of the fuzzy second corridor (fc);
• Y12 - middle of the fuzzy first corridor (fb);
• Y13 - lower bound of the first corridor (fa).
In this case, FLR is determined by coefficients in the form of trapezoidal fuzzy numbers of the form (Figure 2):
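The assembly of a trapezoidal forecast from the two triangular models can be sketched as follows. This is an illustration under assumptions: each model is given as a pair of lists (centres, half-widths) with the intercept first, regressors are non-negative, and the helper names are hypothetical.

```python
def _bounds(a, b, x):
    """(upper, middle, lower) of one triangular FLR for regressors x."""
    xs = [1.0] + [float(v) for v in x]
    middle = sum(ai * xi for ai, xi in zip(a, xs))
    spread = sum(bi * abs(xi) for bi, xi in zip(b, xs))
    return middle + spread, middle, middle - spread


def trapezoid_forecast(first, second, x):
    """Return (fa, fb, fc, fd) = (Y13, Y12, Y22, Y21) for one observation."""
    _, y12, y13 = _bounds(first[0], first[1], x)
    y21, y22, _ = _bounds(second[0], second[1], x)
    return y13, y12, y22, y21


def trapezoid_membership(v, fa, fb, fc, fd):
    """Membership degree of value v in the trapezoidal number (fa, fb, fc, fd)."""
    if fb <= v <= fc:
        return 1.0
    if fa < v < fb:
        return (v - fa) / (fb - fa)
    if fc < v < fd:
        return (fd - v) / (fd - fc)
    return 0.0


# Illustrative coefficients only, not estimates from the study.
first_model = ([1.0, 2.0], [0.5, 0.0])
second_model = ([2.0, 2.0], [0.5, 0.0])
fa, fb, fc, fd = trapezoid_forecast(first_model, second_model, [3.0])
```

The interval [fb, fc] is the core of the forecast (membership one), while [fa, fd] is its support.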
3. RESULTS AND DISCUSSION
To test the proposed method, the dependence of the GRP of the RT on a number of factors was modeled. The following models were used (implemented in the MS Excel spreadsheet environment):
• FLR using triangular fuzzy numbers, estimated by the fuzzy corridor method (Biryukov, 2015);
• FLR using trapezoidal fuzzy numbers based on the proposed method.
To model the GRP, indicators of the socio-economic processes of the RT from 1999 to 2018 were used (the data are taken from the website of the Federal State Statistics Service for the Republic of Tatarstan, http://tatstat.gks.ru). The dependent variable is the gross regional product, mln.rub. (Y). The following indicators are selected as independent variables: volume of shipped products, mln.rub. (X1); agricultural products, mln.rub. (X2); investment in fixed capital, mln.rub. (X3); volume of work performed by type of activity “construction”, mln.rub. (X4).
Selected sectors of the economy of the RT are among the main ones by contribution to GRP.
Regression models were constructed using variables at current prices.
The constructed FLR using triangular fuzzy numbers has the form:
Fuzzy linear regression using trapezoidal fuzzy numbers has the form:
For a comparative evaluation of the quality of the FLR models, their main indicators are presented in Table 1. The indicators are calculated from the fuzzy predicted values of the dependent variable defuzzified by the center of gravity method (Tzimopoulos et al., 2016).
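Center of gravity defuzzification has closed forms for both shapes of fuzzy number. The sketch below uses the standard centroid formulas; the function names are hypothetical, and the triangular number is passed as (lower, peak, upper).

```python
def centroid_triangular(lower, peak, upper):
    """Centre of gravity of a triangular fuzzy number."""
    return (lower + peak + upper) / 3.0


def centroid_trapezoidal(fa, fb, fc, fd):
    """Centre of gravity of a trapezoidal fuzzy number with fa <= fb <= fc <= fd."""
    denom = 3.0 * (fd + fc - fa - fb)
    if denom == 0:
        return float(fa)  # degenerate case: a crisp number
    return (fd**2 + fc**2 + fd * fc - fa**2 - fb**2 - fa * fb) / denom


# Illustrative numbers only.
crisp_forecast = centroid_trapezoidal(0.0, 1.0, 2.0, 3.0)
```

For a symmetric trapezoid the centroid coincides with the midpoint of the core, and a trapezoid with fb = fc reduces to the triangular case.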
Note that the value of R2 in both models is close to one, so both models have high explanatory power. The RMSE and MAPE values of the fuzzy regression using trapezoidal fuzzy numbers are lower, which indicates its higher predictive accuracy.
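The three indicators can be computed from the actual values and the defuzzified forecasts as in the sketch below. It assumes the common definitions (R2 as one minus the ratio of the residual to the total sum of squares, MAPE in percent); the exact definitions behind Table 1 are not stated in the text.

```python
import math


def rmse(y, f):
    """Root mean squared error between actual values y and forecasts f."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / len(y))


def mape(y, f):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((yi - fi) / yi) for yi, fi in zip(y, f)) / len(y)


def r_squared(y, f):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only, not data from the study.
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]
scores = (r_squared(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```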
4. CONCLUSION
The FLR model using trapezoidal fuzzy numbers, with samples formed from a linear regression estimated by OLS, gives higher accuracy in predicting GRP and explains its dynamics better than the FLR using triangular fuzzy numbers. Therefore, the equation obtained from the FLR using trapezoidal fuzzy numbers, which describes the relationship between the studied factors and GRP, supports better-founded assumptions about the interval of its possible changes and has a higher predictive ability. In this paper, we propose a new approach to constructing FLR using trapezoidal fuzzy numbers. Within this approach, a method was developed that forms two samples using a linear regression estimated by OLS. From the generated samples, two FLR are constructed using triangular fuzzy numbers; by aggregating their coefficients, a model based on trapezoidal fuzzy numbers is obtained. The advantage of the method is that it relies on well-known methods for estimating linear regressions in the crisp and fuzzy settings. The analysis of the regional economy of the RT for the period 1999-2018 yields adequate fuzzy models of GRP. The quality indicators of the constructed FLR using trapezoidal and triangular fuzzy numbers show that the model based on the proposed method provides the better accuracy.
Abbreviations: OLS, ordinary least squares; FLR, fuzzy linear regression; GRP, gross regional product; RT, Republic of Tatarstan.