ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.19 No.4 pp.896-900
DOI : https://doi.org/10.7232/iems.2020.19.4.896

# Fuzzy Regression Analysis using Trapezoidal Fuzzy Numbers

Ilyas Idrisovich Ismagilov, Ghena Alsaied*
Department of Economic Theory and Econometrics, Institute of Management, Economics and Finance, Kazan Federal University. Russia
October 1, 2020 October 9, 2020 October 19, 2020

## ABSTRACT

Regression analysis is widely used and plays an increasingly important role in building statistical models and making forecasts in economics and finance. In some situations, however, the use of traditional regression for modeling socio-economic processes is not sufficiently substantiated. A new direction is now being actively developed: fuzzy regression analysis and its application as an alternative to classical methods for modeling economic phenomena. Fuzzy regression methods are based on the theory of fuzzy sets. A number of methods and their modifications have been proposed for constructing fuzzy regression models, but most of them use symmetric triangular fuzzy numbers. In this paper, we propose a new method for constructing linear fuzzy regression using trapezoidal fuzzy numbers. The method is based on dividing the sample using a regression model estimated by ordinary least squares. Two fuzzy regressions using triangular numbers are estimated from the resulting samples, and a fuzzy model with trapezoidal fuzzy numbers is constructed from them. Based on the proposed method, a linear fuzzy model of the gross regional product, as an indicator of the economic development of the Republic of Tatarstan of Russia, is constructed as a function of a number of factors. A comparative assessment of the quality of fuzzy regression models using triangular and trapezoidal numbers is performed.

## 1. INTRODUCTION

In the economic sciences, regression analysis is a recognized tool for modeling and forecasting in many applied problems. However, the methods of traditional regression analysis run into difficulties in some cases. These difficulties arise when the data set is very small (the problem of short samples) or when data registration errors occur, so that there is uncertainty in the relationship between the independent and dependent variables (Savic and Pedrycz, 1991). As a rule, the method of ordinary least squares (OLS) is used to determine the parameters of a linear regression model (Draper and Smith, 1997). However, if the distribution of the random perturbations of the model differs significantly from the normal one, or the conditions of their independence and homoscedasticity are violated, then the efficiency of OLS decreases and it is advisable to turn to alternative methods (Kim and Chen, 1997). In such situations, an approach based on fuzzy regression methods looks quite reasonable and leads to economically meaningful results.

Fuzzy linear regression (FLR), in which some elements of the model are represented by fuzzy numbers, is a fuzzy version of the classical regression analysis. It gives a fuzzy relationship between dependent and independent variables, which can be crisp or fuzzy, and allows you to get both point and interval forecasts (Chernov, 2018).

Fuzzy linear regression (FLR) was first introduced by Tanaka (Tanaka et al., 1982). The Tanaka method assumes that the deviations between the observed and calculated values of the dependent variable are due to the fuzzy structure of the model. This structure was represented as a fuzzy linear function whose parameters were set by fuzzy sets. Linear programming was used to develop the fuzzy regression model, while the coefficients of the model were determined as symmetric triangular fuzzy numbers.

Owing to its simplicity and immediate interpretability, the described approach to fuzzy regression modeling is the most widespread and is increasingly used to solve applied economic problems. Here are some examples. FLR has been used to model regional economic growth (Alsaied, 2019), evaluate real estate (Volkova and Gisin, 2020), evaluate the parameters of the Bottazzi–Peri technological knowledge growth model (Wheatcroft and Walklate, 2014), and forecast the inventory of work-in-progress in manufacturing (Biryukov, 2015).

Currently, in addition to the Tanaka method, there are a number of other methods for constructing FLR; almost all of them use triangular fuzzy numbers. An extension of symmetric triangular fuzzy coefficients to trapezoidal fuzzy numbers for the FLR model based on the Tanaka method has been proposed (Charfeddine et al., 2005). The need to use trapezoidal membership functions of fuzzy numbers follows from two reasons (Montazeri-Gh and Mahmoodi-k, 2015):

• The need to optimize the fuzziness of the model;

• The need to restrict the experimental data within the range of estimated values.

In this paper, we propose an approach to the construction of FLR, in which the coefficients of the model are represented as trapezoidal fuzzy numbers. In this case, the input data are crisp numbers, while the predicted values of the dependent variable are described by trapezoidal fuzzy numbers.

The proposed FLR method using trapezoidal fuzzy numbers and FLR using triangular fuzzy numbers, both applied to crisp sample data, were used to model the dependence of the gross regional product of the Republic of Tatarstan (RT) of Russia on various factors. A comparative assessment of the quality of the fuzzy regression models is carried out.

## 2. METHODS

The idea of the proposed approach to estimating FLR is to form two samples from the original one. For this purpose, a crisp linear regression is estimated first, and the predicted values of the dependent variable are calculated from it. The two samples are formed according to the following rules:

1. The observations in the first sample are equal to the actual values of the dependent variable if these are smaller than or equal to the predicted values; otherwise they are equal to the predicted values.

2. The observations in the second sample are equal to the actual values of the dependent variable if these are greater than or equal to the predicted values; otherwise they are equal to the predicted values.

From these samples, two FLR with symmetric triangular fuzzy coefficients are estimated. Based on these two FLR, a model is formed whose coefficients are trapezoidal fuzzy numbers.

Implementations of this approach to constructing an FLR can differ both in the method used to estimate the crisp regression model that forms the two modified samples and in the method used to estimate the fuzzy regression. From a practical point of view, OLS, the method of least absolute deviations, and robust methods are of the greatest interest for estimating a crisp regression (Tzimopoulos et al., 2016). Among the methods for estimating FLR, the Tanaka method should be noted.

In this work, the modified samples are formed using OLS. We give a stage-by-stage description of the developed method for constructing FLR, assuming that the specification of the model has already been determined.

Stage 1. Estimating a multiple linear regression by OLS and calculating the predicted values of the dependent variable using the sample model.

If there are p explanatory variables, the sample model of multiple linear regression has the form:

$\hat{Y} = b_0 + b_1 X_1 + \dots + b_p X_p$
(1)

where $\hat{Y}$ denotes the estimated values of the dependent variable and $X_1, \dots, X_p$ are the independent variables.

Sample estimates of the model coefficients are determined by OLS according to the following vector-matrix relation (Draper and Smith, 1997):

$B = (X^{T} X)^{-1} X^{T} Y,$

where $Y$ is the n-dimensional column vector of observations of the dependent variable;

$B$ is the (p+1)-dimensional column vector of the parameters of the regression equation;

$X$ is the $n \times (p+1)$ matrix of values of the independent variables, with a leading column of ones for the intercept.

The predicted values of the dependent variable are calculated from (1).
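Stage 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ols_fit` and the toy data are hypothetical, NumPy is assumed to be available, and `numpy.linalg.lstsq` is used in place of the explicit inverse because it solves the same least-squares problem in a numerically safer way.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients (b_0, ..., b_p) and the fitted values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that b_0 plays the role of the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    # Least-squares solution; equivalent to (X^T X)^{-1} X^T Y for full-rank X.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

# Hypothetical toy data generated exactly by y = 1 + 2*x1 + 3*x2.
X_demo = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y_demo = [3, 4, 6, 8, 12]
b_demo, y_hat_demo = ols_fit(X_demo, y_demo)
```

On this noise-free toy sample the estimated coefficients recover the generating values, and the fitted values coincide with the observations.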

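Stage 2. Formation of two samples according to the following rules:

$y_t^{(1)} = \begin{cases} y_t, & y_t \le \hat{y}_t \\ \hat{y}_t, & y_t > \hat{y}_t \end{cases} \qquad y_t^{(2)} = \begin{cases} y_t, & y_t \ge \hat{y}_t \\ \hat{y}_t, & y_t < \hat{y}_t \end{cases}$

where $y_t$ is the actual value of the dependent variable, $\hat{y}_t$, $t = \overline{1, n}$, is the value predicted by the model, and $y_t^{(1)}$ and $y_t^{(2)}$ are the observations of the first and second samples.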
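The two rules amount to taking, for each observation, the smaller and the larger of the actual and predicted values. A minimal sketch follows; the function name `split_samples` and the numbers are illustrative assumptions, not part of the original study.

```python
def split_samples(y, y_hat):
    """Form the two modified samples from actual and predicted values."""
    # First sample: the actual value where it does not exceed the prediction,
    # otherwise the prediction, i.e. the element-wise minimum.
    first = [min(actual, fitted) for actual, fitted in zip(y, y_hat)]
    # Second sample: the actual value where it is not below the prediction,
    # otherwise the prediction, i.e. the element-wise maximum.
    second = [max(actual, fitted) for actual, fitted in zip(y, y_hat)]
    return first, second

# Hypothetical actual and predicted values.
y_obs = [10.0, 12.0, 9.0, 15.0]
y_fit = [11.0, 11.5, 9.0, 14.0]
y_first, y_second = split_samples(y_obs, y_fit)
```

Every value of the first sample lies on or below the crisp regression and every value of the second lies on or above it, which is why the two fuzzy corridors estimated later are stacked one above the other.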

Stage 3. Construction of two fuzzy linear regressions using triangular fuzzy numbers based on the formed samples. FLR have the following form:

$\tilde{Y} = A_0 + A_1 x_1 + \dots + A_p x_p$
(2)

where the model coefficients $A_0, A_1, \dots, A_p$ are triangular fuzzy numbers of the form $A_i = (a_i - b_i,\ a_i,\ a_i + b_i)$.

Here $a_i$ is the most probable value of the coefficient, and $b_i$ describes the width of its fuzziness.

The membership function of a fuzzy triangular number is shown in Figure 1.
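For a symmetric triangular number $(a - b,\ a,\ a + b)$ the membership degree falls linearly from 1 at the center $a$ to 0 at distance $b$ from it. The sketch below assumes this standard piecewise-linear form; the function name is hypothetical.

```python
def triangular_membership(x, a, b):
    """Membership degree of x in the symmetric triangular number (a-b, a, a+b)."""
    if b <= 0:
        # Degenerate case: a crisp number has membership 1 only at its center.
        return 1.0 if x == a else 0.0
    # Linear decrease away from the center, clipped at zero outside the support.
    return max(0.0, 1.0 - abs(x - a) / b)
```

For example, with center 5 and spread 2 the point 4 has membership 0.5, and any point at distance 2 or more from the center has membership 0.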

Formally, determining the coefficients of FLR is reduced to a linear programming problem (Wheatcroft and Walklate, 2014):

$(a_0 - b_0) + \sum_{i=1}^{p} (a_i - b_i) x_{ij} \le y_j, \quad j = \overline{1, n}; \qquad (a_0 + b_0) + \sum_{i=1}^{p} (a_i + b_i) x_{ij} \ge y_j, \quad j = \overline{1, n}; \qquad b_i \ge 0, \quad i = \overline{0, p}$
(3)

Conditions (3) ensure that the values of the dependent variable Y fall within the range of possible values of the fuzzy output $\tilde{Y}$. The problem of estimating FLR is to find the values of the parameters $a_i$ and $b_i$, $i = \overline{0, p}$, of the fuzzy coefficients that minimize the width of the fuzzy corridor covering the actual values of the dependent variable:

$F = \sum_{j=1}^{n} \left[ (a_0 + b_0) + \sum_{i=1}^{p} (a_i + b_i) x_{ij} - (a_0 - b_0) - \sum_{i=1}^{p} (a_i - b_i) x_{ij} \right] \to \min$
(4)
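The linear program can be handed to a general-purpose solver. The sketch below is one possible setup, not the authors' implementation (the paper reports using MS Excel): it assumes SciPy's `linprog` is available, that all regressor values are non-negative so that the corridor bounds are linear in $a_i$ and $b_i$, and it uses hypothetical names (`fit_triangular_flr`, the toy data).

```python
import numpy as np
from scipy.optimize import linprog

def fit_triangular_flr(X, y):
    """Estimate centers a_i and spreads b_i of symmetric triangular coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])  # column of ones for A_0
    k = Z.shape[1]
    # Decision vector: (a_0..a_p, b_0..b_p). The corridor width summed over
    # observations equals 2 * sum_j sum_i b_i * z_ij, so centers have zero cost.
    cost = np.concatenate([np.zeros(k), 2.0 * Z.sum(axis=0)])
    # Lower bound below y:  Z(a - b) <= y.  Upper bound above y: -Z(a + b) <= -y.
    A_ub = np.vstack([np.hstack([Z, -Z]), np.hstack([-Z, -Z])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * k + [(0, None)] * k  # centers free, spreads >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k], res.x[k:]

# Hypothetical toy sample with one non-negative regressor.
X_toy = [[1.0], [2.0], [3.0], [4.0]]
y_toy = np.array([2.0, 4.5, 5.5, 8.0])
a_hat, b_hat = fit_triangular_flr(X_toy, y_toy)
Z_toy = np.column_stack([np.ones(4), X_toy])
lower_toy = Z_toy @ (a_hat - b_hat)
upper_toy = Z_toy @ (a_hat + b_hat)
```

Whatever optimum the solver returns, the resulting lower and upper bounds enclose every observation, which is exactly what the inclusion conditions require.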

Moreover, the found FLR equation includes three components that describe the fuzzy corridor:

• Function Y1 (lower bound);

• Function Y3 (upper bound);

• Function Y2 (middle of the corridor).

Note that, between the functions Y1 and Y3 all actual values of the dependent variable will be located.

In the proposed method, fuzzy regression is constructed using triangular fuzzy numbers for each sample. As a result, we get two FLR, each of which includes three components.

The equation of the first sample includes:

Function Y11 - the upper bound of the first corridor, Function Y12 - the middle of the fuzzy first corridor and Function Y13 - the lower bound of the first corridor.

Each fuzzy coefficient has the form:

$A_{1i} = (a_{1i} - b_{1i},\ a_{1i},\ a_{1i} + b_{1i})$

The equation of the second sample includes:

Function Y21 - the upper bound of the second corridor, Function Y22 - the middle of the fuzzy second corridor and Function Y23 - the lower bound of the second corridor.

Each fuzzy coefficient has the form:

$A_{2i} = (a_{2i} - b_{2i},\ a_{2i},\ a_{2i} + b_{2i}).$

Note that $a_{2i} \ge a_{1i}$.

Based on the two constructed FLR, the values of the dependent variable are determined in the form of trapezoidal fuzzy numbers. They are defined using four functions:

• Y21 - upper border of the second corridor (fd);

• Y22 - middle of the fuzzy second corridor (fc);

• Y12 - middle of the fuzzy first corridor (fb);

• Y13 - lower border of the first corridor (fa).

In this case, FLR is determined by coefficients in the form of trapezoidal fuzzy numbers (Figure 2):

$A_i = (a_{1i} - b_{1i},\ a_{1i},\ a_{2i},\ a_{2i} + b_{2i})$
(5)
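The four functions fa, fb, fc and fd can be evaluated directly from the coefficients of the two triangular regressions. The following sketch assumes non-negative crisp inputs and takes fa from the lower edge of the first corridor, fb and fc from the two corridor middles, and fd from the upper edge of the second corridor; the function name and the numbers are hypothetical.

```python
def trapezoidal_prediction(a1, b1, a2, b2, x):
    """Return (fa, fb, fc, fd) for one observation with crisp inputs x >= 0.

    a1, b1: centers and spreads of the first (lower) triangular regression.
    a2, b2: centers and spreads of the second (upper) triangular regression.
    Index 0 of each list is the intercept term.
    """
    z = [1.0] + list(x)  # leading 1 multiplies the intercept coefficient
    fa = sum((a - b) * v for a, b, v in zip(a1, b1, z))  # lower edge, corridor 1
    fb = sum(a * v for a, v in zip(a1, z))               # middle of corridor 1
    fc = sum(a * v for a, v in zip(a2, z))               # middle of corridor 2
    fd = sum((a + b) * v for a, b, v in zip(a2, b2, z))  # upper edge, corridor 2
    return fa, fb, fc, fd

# Hypothetical coefficients for a model with a single regressor.
trap_demo = trapezoidal_prediction(
    a1=[1.0, 2.0], b1=[0.5, 0.0],
    a2=[2.0, 2.0], b2=[0.5, 0.25],
    x=[4.0],
)
```

With these illustrative coefficients the prediction at x = 4 is the trapezoid (8.5, 9.0, 10.0, 11.5), whose plateau spans the two corridor middles.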

## 3. RESULTS AND DISCUSSION

To test the proposed method, the dependence of the GRP of the RT on a number of factors was modeled. The following models were used (implemented in the MS Excel spreadsheet environment):

• FLR using triangular fuzzy numbers, estimated by the fuzzy corridor method (Biryukov, 2015);

• FLR using trapezoidal fuzzy numbers based on the proposed method.

For modeling the GRP, samples of indicators of the socio-economic processes of the RT from 1999 to 2018 were used (data is taken from the website of the Federal State Statistics Service for the Republic of Tatarstan http://tatstat.gks.ru). In this case, the dependent variable represents the gross regional product, mln.rub. (Y). The following indicators are selected as independent variables: volume of shipped products, mln.rub. (X1); agricultural products, mln.rub. (X2); investment in fixed capital, mln.rub. (X3); volume of performed work by type of activity “construction”, mln.rub. (X4).

Selected sectors of the economy of the RT are among the main ones by contribution to GRP.

Regression models were constructed using variables at current prices.

The constructed FLR using triangular fuzzy numbers has the form:

$\hat{Y} = (32294.78;\ 46887.06;\ 61479.36) + 0.617 X_1 + (-0.055;\ 0;\ 0.055) X_2 + 0.739 X_3 + 0.551 X_4$
(6)
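To see how model (6) turns crisp inputs into a triangular forecast, its coefficients can be applied bound by bound. In the sketch below the coefficient triples are taken from (6), while the input values are purely hypothetical and do not come from the Tatarstan data set; the bound-wise evaluation assumes non-negative inputs.

```python
# Coefficients of model (6) as (lower, middle, upper) triples; crisp
# coefficients are written as degenerate triples.
MODEL_6 = [
    (32294.78, 46887.06, 61479.36),  # intercept
    (0.617, 0.617, 0.617),           # X1, volume of shipped products
    (-0.055, 0.0, 0.055),            # X2, agricultural products
    (0.739, 0.739, 0.739),           # X3, investment in fixed capital
    (0.551, 0.551, 0.551),           # X4, construction work
]

def predict_triangular(coeffs, x):
    """Evaluate the lower, middle and upper bound for non-negative inputs x."""
    z = [1.0] + list(x)
    return tuple(sum(c[k] * v for c, v in zip(coeffs, z)) for k in range(3))

# Hypothetical inputs in mln. rub. (not actual observations).
x_hyp = [100000.0, 50000.0, 40000.0, 30000.0]
low_6, mid_6, up_6 = predict_triangular(MODEL_6, x_hyp)
```

Only the intercept and the coefficient of X2 are fuzzy in (6), so the width of the forecast grows with X2 alone: here it equals the intercept width 29184.58 plus 0.11 times 50000.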

Fuzzy linear regression using trapezoidal fuzzy numbers has the form:

(7)

For a comparative evaluation of the quality of FLR, we present a number of their main indicators in Table 1. Indicators for FLR are calculated using the defuzzified values of fuzzy predicted values of the dependent variable by the center of gravity method (Tzimopoulos et al., 2016).
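For a trapezoidal number with vertices $a \le b \le c \le d$, the center of gravity has a closed form. The sketch below uses the standard centroid formula for a trapezoidal membership function of unit height; whether the authors used exactly this expression is an assumption, and the function name is hypothetical.

```python
def trapezoid_centroid(a, b, c, d):
    """Center of gravity of the trapezoidal fuzzy number (a, b, c, d)."""
    denom = 3.0 * (d + c - a - b)
    if denom == 0:
        # All four vertices coincide: the number is crisp.
        return float(a)
    return (d * d + c * c + c * d - a * a - b * b - a * b) / denom
```

A symmetric trapezoid such as (0, 1, 2, 3) defuzzifies to its midpoint 1.5, and setting b = c reduces the formula to the centroid of a triangular number, so the same function serves both models.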

Note that the value of $R^2$ in both models is close to one, so both models have high explanatory power. The RMSE and MAPE values of the fuzzy regression using trapezoidal numbers are lower, which indicates that it has better predictive properties.
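The three indicators can be computed from the actual values and the defuzzified forecasts. The sketch below uses the usual textbook definitions, which is an assumption since the paper does not spell them out; the data are hypothetical.

```python
import math

def rmse(y, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_pred)) / len(y))

def mape(y, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero y)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_pred)) / len(y)

def r_squared(y, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual values and defuzzified forecasts.
y_act = [100.0, 200.0, 400.0]
y_def = [110.0, 190.0, 400.0]
scores = (rmse(y_act, y_def), mape(y_act, y_def), r_squared(y_act, y_def))
```

Lower RMSE and MAPE and a higher $R^2$ indicate a closer fit, which is the sense in which the two fuzzy models are compared.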

## 4. CONCLUSION

An FLR model using trapezoidal numbers, with samples formed on the basis of a linear regression estimated by OLS, gives the best accuracy in predicting the GRP and better explains its dynamics. Therefore, the equation obtained from FLR using trapezoidal numbers, which describes the relationship between the studied factors and GRP, allows us to make better assumptions about the interval of its possible changes and has a higher predictive ability. In this paper, we propose a new approach for constructing FLR using trapezoidal numbers. Within the framework of this approach, a method was developed based on forming two samples using a linear regression estimated by OLS. From the generated samples, two FLR are constructed using triangular fuzzy numbers; by aggregating their coefficients, a model based on trapezoidal fuzzy numbers is written. The advantage of the method is that it relies on well-known methods for estimating linear regressions in crisp and fuzzy settings. As a result of the analysis of the regional economy of the RT for the period 1999-2018, adequate fuzzy models of GRP are obtained. Analysis of the quality indicators of the constructed FLR using trapezoidal and triangular fuzzy numbers shows that the model based on the proposed method provides the best accuracy indicators.

Abbreviations: OLS, ordinary least squares; FLR, fuzzy linear regression; GRP, gross regional product; RT, Republic of Tatarstan.

## ACKNOWLEDGEMENTS

The work is performed according to the Russian Government Program of Competitive Growth of Kazan Federal University.

## Figure

Figure 1. Membership function of fuzzy number $A_i$.

Figure 2. Construction of trapezoidal numbers.

## Table

Table 1. Quality indicators of models (6) and (7)

## REFERENCES

1. Alsaied, G. (2019), Modeling the gross regional product based on crisp and fuzzy regressions, Proceedings of the International Conference on Economy in a Changing World, Kazan, Russia, 235-238.
2. Biryukov, A. N. (2015), Multivariate fuzzy regression predictive model for an applied problem, Management of Economic Systems: Electronic Scientific Journal, 2(22), Available from: http://uecs.ru/uecs-22-222010/item/156-2011-03.
3. Charfeddine, S., Zbidi, K., and Mora-Camino, F. (2005), Fuzzy-regression analysis using trapezoidal fuzzy numbers, Proceedings of the Joint 4th Conference of the European Society for Fuzzy Logic and Technology and the 11th Rencontres Francophones sur la Logique Floue Applications, Barcelona, Spain, EUSFLAT Conf., 1213-1218.
4. Chernov, V. G. (2018), Fuzzy Sets. Fundamentals of Theory and Application: Textbook, Publishing House of the Vladimir State University.
5. Draper, N. and Smith, G. (1997), Applied Regression Analysis (3rd ed), Wiley, New York, 1998.
6. Kim, K. J. and Chen, H. A. (1997), A comparison of fuzzy and nonparametric linear regression, Computers & Operations Research, 24(6), 505-519.
7. Montazeri-Gh, M. and Mahmoodi-k, M. (2015), Development a new power management strategy for power split hybrid electric vehicles, Transportation Research Part D: Transport and Environment, 37, 79-96.
8. Savic, D. A. and Pedrycz, W. (1991), Evaluation of fuzzy linear regression models, Fuzzy Sets and Systems, 39(1), 51-63.
9. Tanaka, H., Uejima, S., and Asai, K. (1982), Linear regression analysis with fuzzy model, IEEE Transactions on Systems Man and Cybernetics, 12(6), 903-907.
10. Tzimopoulos, C., Papadopoulos, K., and Papadopoulos, B. (2016), Fuzzy regression with applications in hydrology, International Journal of Engineering and Innovative Technology (IJEIT), 5(8), 69-75.
11. Volkova, E. S. and Gisin, V. B. (2020), The use of fuzzy linear regression in the model of technological knowledge growth, Finance: Theory and Practice, 5, 97-104.
12. Wheatcroft, J. M. and Walklate, S. (2014), Thinking differently about ‘False Allegations’ in cases of rape: The search for truth, International Journal of Criminology and Sociology, 3, 239-248.