ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.19 No.3 pp.669-679

Model for Relational Analysis of Posted Articles and Reactions on Restaurant Guide Sites

Teppei Sakamoto, Haruka Yamashita*, Masayuki Goto, Jiro Iwanaga
Graduate School of Creative Science and Engineering, Waseda University, Tokyo, Japan
Faculty of Science and Technology, Department of Information and Communication Sciences, Sophia University, Tokyo, Japan
School of Creative Science and Engineering, Waseda University, Tokyo, Japan
Retty Inc., Tokyo, Japan
*Corresponding Author, E-mail:
Received June 18, 2018; Revised November 29, 2019; Accepted June 12, 2020


Recently, restaurant guide sites providing restaurant information posted by users on the Internet have been widely used as effective tools for consumers. On a restaurant guide site, users post recommendation articles on restaurants under their IDs, and these posted articles are a valuable information source for other users, who can search for restaurants and read the articles. Furthermore, readers can react (e.g., “like”) to a recommendation article when they find it helpful or feel like visiting the restaurant. On the target restaurant guide site, each post includes the user ID, restaurant name, recommendation sentences, etc., and the number of reactions is considered to depend on these posted contents. For users who post recommendation articles, the number of reactions to their posts represents the degree of empathy from other users and is an important motivation for posting; such users would therefore benefit from guidelines on how to write recommendation sentences that increase the number of reactions. Moreover, from the viewpoint of the service operating company, the number of reactions can be regarded as an important indicator of the activity level of the restaurant guide site. An analytical model developed from historical information, such as posts and reactions by users, would thus be useful for determining the relationship between posted contents and the number of reactions. This paper proposes a model based on a machine learning approach to analyze the relation between the number of reactions and posted contents. Finally, we demonstrate an analysis based on the proposed model using practical data.



    Recently, various service websites and applications providing restaurant information on the Internet have been widely deployed and are used by many users. Such sites are called restaurant guides. On a restaurant guide, users can search for restaurants, obtain information about various restaurants, and read recommendation articles posted by other users. Users post their recommendation articles on restaurants under their IDs, and these posted articles are a valuable information source for other users. When a user searches for restaurants on an online restaurant guide, recommendation articles from other users are useful for choosing the restaurant that he/she wants to visit. Therefore, recommendation articles are important assets for restaurant guides: if users post many articles and the number of good articles increases, site user activity increases. In addition, users who post a recommendation article regard positive reactions from other users as motivation for the next post. In other words, determining what makes an article good is helpful not only for users who post articles but also for the service management company.

    Here, we conduct a case study focusing on the Japanese restaurant guide Retty, a famous online service site in Japan. On Retty, users can view many recommendation articles posted by users under their real names and react (e.g., “like” or “I want to go”) to each posted article. Because a user posts recommendation articles on Retty under his/her real name, the reliability of the information is relatively high, and recommendation articles are highly valued by general users. In addition, as on other social networking services (SNS), users can “follow” their favorite users. For users who post recommendation articles, the number of reactions to their posts is an important motivation because it represents the degree of empathy from other users. Therefore, posting users will benefit from guidelines on how to write effective recommendation sentences that increase the number of reactions. For this purpose, the characteristics of good posted articles can be revealed by building a model representing the relationship between the contents of a recommendation article and the number of reactions given by other general users. A recommendation article contains much information, such as restaurant information (e.g., category and budget), recommendation sentences, images, and recommendation degree. In particular, recommendation sentences allow users to describe their opinions freely from various viewpoints, such as taste, customer service, and the atmosphere inside the restaurant. Moreover, the analysis results described in Section 4 can easily be used as a reference for increasing the number of posts. In this study, we focus on the characteristics of recommendation sentences to determine how to write an effective article that increases the number of reactions.

    It is known that the number of followers of a posting user has a larger influence on the number of reactions than other factors, such as the contents of a recommendation article. Therefore, we propose a method of hierarchically modeling the relationship between the contents of a recommendation article and the number of reactions given by other general users. In the first step, we build a regression model that predicts the number of reactions by using information such as the number of followers as explanatory variables. Then, we obtain the residual (the difference between the measured value and the predicted value) for each data point. In the proposed model, we assume that the residuals deviate from the baseline (i.e., the expected number of reactions that the article will get) owing to the effect of text information. In the second step, we construct a latent class model that represents the relationship between the residuals, recommendation sentences, and restaurants. The latent class model (Hagenaars and McCutcheon, 2002) enables the analysis of objects containing heterogeneous data and has been shown to be useful for analyzing free-format text and purchasing history data in many previous studies; it is therefore well suited to the model discussed in this paper. Thus, the purpose of this study is to develop a model for the relational analysis of recommendation articles and reactions by assuming a hierarchical structure and using the latent class model. Finally, we demonstrate an analysis based on the proposed model by using practical data stored on the target restaurant guide Retty, through which the effectiveness of the proposed model is confirmed.


    2.1 Restaurant Guides

    Restaurant guides are integrated information services available on the Internet where users can search for restaurant information and exchange information about various restaurants. In recent years, the number of consumers who use restaurant guides when choosing restaurants has rapidly increased. In Japan, restaurant guides emerged at the end of the 20th century. Originally, guide-type restaurant information sites that introduced various restaurants were the mainstream; however, owing to the spread of the Internet and the development of information technology, the mainstream shifted to posting-based systems where users themselves post recommendation articles about restaurants. As users can freely describe their impressions of restaurants from various viewpoints, the amount of information is larger than that on guide-type restaurant information sites. This huge amount of posted information helps many other users select restaurants.

    Next, we explain the restaurant guide Retty, which is the target site in this study. Retty is one of the most famous online restaurant guides in Japan and provides users with a function for posting recommendation articles in addition to restaurant search. A recommendation article contains a recommendation degree (three levels: excellent/good/average), recommendation sentences, images, a time zone (morning/lunch/dinner), etc. Retty also has SNS functions. First, a user registers a system account under his/her real name linked with Facebook. Then, to find restaurants through helpful users, users can “follow” their favorite users and react (e.g., “like” or “I want to go”) to recommendation articles by other users.

    2.2 Related Works

    Several studies have been conducted on the analysis of data stored on restaurant guides. For instance, Pantelidis (2010) analyzed factors that increase the degree of restaurant recommendation on restaurant guides and indicated the restaurant elements to be improved. Kang et al. (2012) focused on sentiment words in articles by labeling them negative or positive manually and classifying the labels using a machine learning method. Mochizuki et al. (2013) proposed a method for paraphrasing a recommendation article, based on the experience of the user who posted it, into an expression that is easily conveyed to a receiving user. From another perspective, Zhang et al. (2013) focused on reservation behavior on a restaurant guide and analyzed factors leading to reservations. As mentioned above, several studies have examined data stored on restaurant guides from various viewpoints. However, studies focusing on recommendation articles themselves and their relationship with reactions from other users have not been conducted.

    2.3 Latent Class Models

    Latent class models (Bishop, 2006; Hagenaars and McCutcheon, 2002; Magidson and Vermunt, 2002) assume the existence of an unobservable discrete latent variable behind observed variables. The assumption of latent variables makes it possible to analyze realistic complex problems, such as mixtures of heterogeneous data. In other words, latent class models assume that the whole data set is an aggregate in which groups with different characteristics are mixed. Latent class models can be extended by incorporating a hypothesized probability distribution and features of the considered event into the model according to the target event and data structure. They thus enable the analysis of objects containing heterogeneous data. These models have been shown to be useful for the analysis of text data (Blei et al., 2003; Hofmann, 1999; Yamamoto et al., 2017), collaborative filtering (Hofmann, 2004; Jin et al., 2003, 2006; Suzuki et al., 2014; Si and Jin, 2003), and the analysis of marketing data (Goto et al., 2015; Green et al., 1976; Iwata et al., 2009; Swait and Adamowicz, 2001; Train, 2009) in many previous studies. In these models, data are generated from a mixture of several different probability distributions. This model structure fits real data consisting of heterogeneous subgroups well.

    Recently, many types of latent class models have been proposed. In this section, we introduce some well-known latent class models and their applications. The unigram mixture model (Nigam et al., 2000) is a well-known basic document model that assumes that all terms in a document belong to one latent class; that is, all terms in the same document are generated from the same topic. Probabilistic latent semantic analysis (PLSA) is another well-known latent class model, which assumes that each observation is generated probabilistically from a latent class (Hofmann, 1999). In particular, when documents and words are handled, these models are also called topic models, and models such as Latent Dirichlet Allocation (LDA), proposed by Blei et al. (2003), have been studied. In addition, latent class models are widely applied to various fields, such as collaborative filtering (Hofmann, 2004) and purchasing history analysis (Iwata et al., 2009; Goto et al., 2015).


    3.1 Overview

    In this study, we propose a latent class model for the relational analysis between recommendation articles and the number of reactions. In particular, we focus on the influence of recommendation sentences on reactions of other users.

    Here, we explain the characteristics of the proposed model. The first characteristic is as follows: when the number of reactions is set as the response variable of the model, the influence of basic posting information, such as the number of followers of each user and the number of images in each article, is larger than the influence of sentences. Therefore, if the information that strongly affects reactions and the text information are handled in parallel and included as explanatory variables simultaneously, there is a risk that a regression analysis will not be able to capture the influence of the text information.

    Hence, we propose a hierarchical model in this study. First, we develop a regression model that predicts the number of reactions from variables of basic information, such as the number of followers. Then, we obtain the residuals (the difference between the measured value and the predicted value) of the regression model. Here, the residual is regarded as a value excluding the influence of the basic information, and it is assumed that the residuals deviate from the baseline owing to the effect of text information. In the second step, we form clusters determined by the values of the residuals, recommendation sentences, and restaurants by using the latent class model. Note that the many types of sentences (words) and restaurants are difficult to express with a single relationship. Therefore, to learn while grouping them automatically, we build a model assuming latent classes. Figure 1 shows a conceptual image of this approach.

    When the whole data set is used to learn the regression function, the estimated model can overfit the data, and the residuals can be underestimated. In that case, the estimated residuals are not suitable for analysis based on the latent class model; therefore, we divide the data into two sets. We use one set to learn the regression function and the other to learn the latent class model, applying the learned function to calculate the residual for each data point.

    The procedure is as follows. First, we divide the whole data set into data for learning the regression function and data for learning the latent class model. Next, we estimate the regression function F, apply the function to the other data set, and calculate each residual. Then, we build a latent class model using the information on the residuals, text, and restaurants. The procedure is presented in Figure 2.
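The data-division procedure above can be sketched as follows. This is a minimal illustration with made-up toy data and a simple one-feature least-squares fit standing in for F; the paper's actual Step 1 uses a random forest, and all variable names here are our own.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

# Toy data: (number of followers, number of reactions) per article.
random.seed(0)
data = [(f, 0.5 * f + random.gauss(0, 2)) for f in range(1, 101)]
random.shuffle(data)

# Divide the data: one half learns F, the other half receives residuals
# for the latent class model (so the residuals are not underestimated).
half = len(data) // 2
train, held_out = data[:half], data[half:]

# Step 1: learn the regression function F on the first half only.
F = fit_linear([x for x, _ in train], [y for _, y in train])

# Step 2 input: residuals r = y - F(x) computed on the held-out half.
residuals = [y - F(x) for x, y in held_out]
```

Because F never sees the held-out half, its residuals are honest estimates of what the basic information cannot explain.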

    3.2 Regression using Basic Information (Step 1)

    In the first step (Step 1), we build a predictive model that expresses the relationship between the number of reactions and the basic information about an article (e.g., the number of followers). We develop a regression model using the number of reactions as the response variable and the basic information as explanatory variables. For each recommendation article posted by a user, the explanatory variable vector with $D$ pieces of basic information on the article is denoted by $\mathbf{x} = (x_1, x_2, \ldots, x_D)^T$, and the number of reactions to the article is denoted by $y$. Using an arbitrary prediction function $F$, Equations (1) and (2) give the predicted value $\hat{y}$ and the residual $r$ of the article.

    $\hat{y} = F(\mathbf{x})$  (1)

    $r = y - \hat{y}$  (2)

    Here, we interpret $\hat{y}$ as a baseline for the number of reactions. The residual $r$, which cannot be explained by the basic information, is assumed to contain the effect of the recommendation sentences and the recommended restaurant.

    3.3 Modeling of the Residual Value and Article (Step 2)

    In the next step (Step 2), we model the relationship between restaurants, sentences, and residuals obtained using the regression in Step 1. Owing to the diversity of sentences and restaurants, we introduce a latent class model.

    3.3.1 Formulation of the Model

    First, we introduce notation to formulate our model. The vocabulary of words used for the analysis is denoted by $V = \{w_i \mid 1 \le i \le I\}$, and the restaurant set is denoted by $S = \{s_j \mid 1 \le j \le J\}$. The text information vector of a document is defined as $\mathbf{d} = (d_{w_1}, d_{w_2}, \ldots, d_{w_I})$, where $d_{w_i}$ is a binary variable: if the word $w_i$ appears in an article, $d_{w_i} = 1$; otherwise, $d_{w_i} = 0$. In addition, an unobserved latent class $z_k \in Z$ is assumed, where $Z = \{z_1, z_2, \ldots, z_K\}$ is the set of latent classes.

    Now, we focus on a recommendation article. Here, the co-occurrence probability of the residual, the restaurant of the article, and the text information is denoted by $P(r, s_j, \mathbf{d})$. Then, the probability model $P(r, s_j, \mathbf{d})$ is formulated as

    $P(r, s_j, \mathbf{d}) = \sum_{k=1}^{K} P(z_k)\, P(r \mid z_k)\, P(s_j \mid z_k)\, P(\mathbf{d} \mid z_k)$


    $P(r \mid z_k) = \dfrac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left\{ -\dfrac{(r - \mu_k)^2}{2\sigma_k^2} \right\}$

    $P(\mathbf{d} \mid z_k) = \prod_{i=1}^{I} P(w_i \mid z_k)^{d_{w_i}}\, P(\bar{w}_i \mid z_k)^{1 - d_{w_i}}$

    Regarding the residual value, a normal distribution is assumed, where $\mu_k$ and $\sigma_k^2$ represent the mean and variance of the normal distribution in the latent class $z_k$, respectively. $P(s_j \mid z_k)$ is a multinomial distribution representing the probability that a user posts a recommendation article on restaurant $s_j$ under class $z_k$. The probability of the text vector $\mathbf{d}$ is calculated as the product of the conditional Bernoulli probabilities of all words $w_i$, where $P(w_i \mid z_k)$ is the probability that word $w_i$ appears in class $z_k$ and $P(\bar{w}_i \mid z_k)$ is the probability that it does not appear. That is, $P(w_i \mid z_k) + P(\bar{w}_i \mid z_k) = 1$ is satisfied.
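The mixture above can be evaluated directly for one article. The following sketch computes $P(r, s_j, \mathbf{d})$ for hypothetical parameter values; the parameter numbers and dictionary layout are our own illustration, not estimates from the paper's data.

```python
import math

def gauss_pdf(r, mu, sigma2):
    """Normal density N(r | mu, sigma2) for the residual term P(r | z_k)."""
    return math.exp(-(r - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def joint(r, j, d, params):
    """P(r, s_j, d) = sum_k P(z_k) P(r|z_k) P(s_j|z_k) P(d|z_k)."""
    total = 0.0
    for k in range(len(params["pz"])):
        p = params["pz"][k] * gauss_pdf(r, params["mu"][k], params["sigma2"][k])
        p *= params["ps"][k][j]                 # restaurant term P(s_j | z_k)
        for i, dwi in enumerate(d):             # word terms (Bernoulli)
            pw = params["pw"][k][i]
            p *= pw if dwi else 1.0 - pw
        total += p
    return total

# Hypothetical parameters: K = 2 classes, 3 restaurant categories, 4 words.
params = {
    "pz": [0.6, 0.4],
    "mu": [1.0, -1.0], "sigma2": [1.0, 1.0],
    "ps": [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]],
    "pw": [[0.9, 0.1, 0.5, 0.5], [0.1, 0.9, 0.5, 0.5]],
}
p = joint(0.5, 0, (1, 0, 1, 0), params)   # article: residual 0.5, restaurant 0
```

In practice the word product runs over thousands of terms, so an implementation would accumulate log-probabilities instead of multiplying densities directly.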

    Figure 3 shows the graphical model used in Step 2 of the proposed method.

    3.3.2 Learning Parameters using the Expectation-Maximization Algorithm

    The parameters of the proposed model, $P(z_k)$, $P(s_j \mid z_k)$, $P(w_i \mid z_k)$, $\mu_k$, and $\sigma_k^2$, are estimated using the expectation-maximization (EM) algorithm. The EM algorithm (Dempster et al., 1977; McLachlan and Krishnan, 2007) estimates parameters through an iterative procedure that locally maximizes the likelihood when the probability model depends on unobservable variables. The EM algorithm consists of two steps, the expectation step (E-step) and the maximization step (M-step), and iterates these steps until the log-likelihood function $LL$ converges.

    Here, for recommendation article $n$, the basic information of the article is denoted by $\mathbf{x}_n$, the residual of the regression model $F$ for the reaction number $y_n$ is $r_n = y_n - \hat{y}_n$, the recommended restaurant is denoted by $a_n \in S$, and the text vector is denoted by $\mathbf{d}_n$. Then, the log-likelihood of the given data is described as follows:

    $LL = \sum_{n=1}^{N} \log P(r_n, a_n, \mathbf{d}_n)$

    First, the E-step of the EM algorithm is formulated below:


    $\gamma_{nk} = P(z_k \mid r_n, a_n, \mathbf{d}_n) = \dfrac{P(z_k, r_n, a_n, \mathbf{d}_n)}{\sum_{k'=1}^{K} P(z_{k'}, r_n, a_n, \mathbf{d}_n)}$

    Then, based on Jensen’s inequality, we introduce a function $LL'$ that is always a lower bound on $LL$.

    $\begin{aligned} LL &= \sum_{n=1}^{N} \log P(r_n, a_n, \mathbf{d}_n) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} P(z_k, r_n, a_n, \mathbf{d}_n) \\ &= \sum_{n=1}^{N} \log \sum_{k=1}^{K} \gamma_{nk} \frac{P(z_k, r_n, a_n, \mathbf{d}_n)}{\gamma_{nk}} \ge \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \log \frac{P(z_k, r_n, a_n, \mathbf{d}_n)}{\gamma_{nk}} \\ &= \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \Big\{ \log P(z_k) + \log P(r_n \mid z_k) + \sum_{j=1}^{J} \delta(a_n = s_j) \log P(s_j \mid z_k) \\ &\qquad + \sum_{i=1}^{I} \big( d_{nw_i} \log P(w_i \mid z_k) + (1 - d_{nw_i}) \log P(\bar{w}_i \mid z_k) \big) - \log \gamma_{nk} \Big\} = LL' \end{aligned}$

    where $\delta(C)$ is an indicator function: if the argument is true, $\delta(C)$ returns 1; otherwise, it returns 0. The parameters $P(z_k)$, $P(s_j \mid z_k)$, $P(w_i \mid z_k)$, $\mu_k$, and $\sigma_k^2$ are estimated such that the value of $LL'$ is maximized and the following constraints are satisfied.

    $\sum_{k=1}^{K} P(z_k) = 1$

    $\sum_{j=1}^{J} P(s_j \mid z_k) = 1 \quad \text{for } z_k \in Z$

    $P(w_i \mid z_k) + P(\bar{w}_i \mid z_k) = 1 \quad \text{for } w_i \in V,\ z_k \in Z$

    Let $\eta$, $\iota_k$, and $\kappa_{k,i}$ be the Lagrange multipliers. We define the Lagrangian function $g$ as follows:

    $g = LL' - \eta \Big\{ \sum_{k=1}^{K} P(z_k) - 1 \Big\} - \sum_{k=1}^{K} \iota_k \Big\{ \sum_{j=1}^{J} P(s_j \mid z_k) - 1 \Big\} - \sum_{k=1}^{K} \sum_{i=1}^{I} \kappa_{k,i} \big\{ P(w_i \mid z_k) + P(\bar{w}_i \mid z_k) - 1 \big\}$

    Then, setting the derivatives of $g$ with respect to $P(z_k)$, $P(s_j \mid z_k)$, $P(w_i \mid z_k)$, $\mu_k$, and $\sigma_k^2$ to 0, we obtain the following update equations for the M-step.


    $P(z_k) = \dfrac{1}{N} \sum_{n=1}^{N} \gamma_{nk}$

    $P(s_j \mid z_k) = \dfrac{1}{N P(z_k)} \sum_{n=1}^{N} \delta(a_n = s_j)\, \gamma_{nk}$

    $P(w_i \mid z_k) = \dfrac{1}{N P(z_k)} \sum_{n=1}^{N} d_{nw_i}\, \gamma_{nk}$

    $\mu_k = \dfrac{1}{N P(z_k)} \sum_{n=1}^{N} r_n\, \gamma_{nk}$

    $\sigma_k^2 = \dfrac{1}{N P(z_k)} \sum_{n=1}^{N} (r_n - \mu_k)^2\, \gamma_{nk}$

    The E-step and M-step are repeated until the log-likelihood function $LL$ converges, and we then obtain the estimates of $P(z_k)$, $P(s_j \mid z_k)$, $P(w_i \mid z_k)$, $\mu_k$, and $\sigma_k^2$.
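The whole EM procedure can be sketched compactly in plain Python. This is a toy illustration under our own assumptions (random initialisation, fixed iteration count, two-word vocabulary); a practical implementation would work in log space and monitor the log-likelihood for convergence instead of running a fixed number of iterations.

```python
import math
import random

def em(data, K, n_iter=50):
    """EM for the Step-2 model. `data` is a list of (r, a, d): residual r,
    restaurant index a, and binary word vector d (a tuple of 0/1)."""
    N = len(data)
    J = max(a for _, a, _ in data) + 1
    I = len(data[0][2])
    rng = random.Random(0)
    # Random initialisation of the parameters.
    pz = [1.0 / K] * K
    ps = [[1.0 / J] * J for _ in range(K)]
    pw = [[rng.uniform(0.3, 0.7) for _ in range(I)] for _ in range(K)]
    mu = [rng.gauss(0.0, 1.0) for _ in range(K)]
    s2 = [1.0] * K

    for _ in range(n_iter):
        # E-step: responsibilities gamma[n][k] proportional to P(z_k, r_n, a_n, d_n).
        gamma = []
        for r, a, d in data:
            row = []
            for k in range(K):
                p = pz[k] * ps[k][a]
                p *= math.exp(-(r - mu[k]) ** 2 / (2 * s2[k])) / math.sqrt(2 * math.pi * s2[k])
                for i, dwi in enumerate(d):
                    p *= pw[k][i] if dwi else 1.0 - pw[k][i]
                row.append(p)
            z = sum(row) or 1e-300
            gamma.append([p / z for p in row])
        # M-step: closed-form updates for P(z_k), P(s_j|z_k), P(w_i|z_k), mu_k, sigma_k^2.
        for k in range(K):
            nk = sum(g[k] for g in gamma) or 1e-12
            pz[k] = nk / N
            for j in range(J):
                ps[k][j] = sum(g[k] for g, (_, a, _) in zip(gamma, data) if a == j) / nk
            for i in range(I):
                pw[k][i] = sum(g[k] * d[i] for g, (_, _, d) in zip(gamma, data)) / nk
            mu[k] = sum(g[k] * r for g, (r, _, _) in zip(gamma, data)) / nk
            s2[k] = max(sum(g[k] * (r - mu[k]) ** 2 for g, (r, _, _) in zip(gamma, data)) / nk, 1e-6)
    return pz, ps, pw, mu, s2

# Toy data: two well-separated groups of articles.
rng = random.Random(1)
data = []
for _ in range(100):
    if rng.random() < 0.5:
        data.append((rng.gauss(2.0, 0.5), 0, (1, 0)))   # high residual, restaurant 0, word 0
    else:
        data.append((rng.gauss(-2.0, 0.5), 1, (0, 1)))  # low residual, restaurant 1, word 1
pz, ps, pw, mu, s2 = em(data, K=2)
```

On this toy data the two estimated class means `mu` recover the two residual levels, one positive and one negative.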


    To verify the effectiveness of our proposed model, we demonstrate an analysis of practical data from the restaurant guide Retty.

    4.1 Data Set and Analysis Conditions

    In this analysis, we analyze the recommendation article data, restaurant data, and user data stored on Retty in March and April 2016. We restricted the target data to articles that were public as of July 2017 and contained more than 50 characters. Approximately 60,000 articles were covered in each month.

    First, in Step 1, we construct a function F that predicts the number of reactions from the basic information of the recommendation article by using the March data. As the basic information, we used three variables: “Recommending Degree,” “Number of Images,” and “Number of Followers.” As the prediction model F, we used the random forest regressor (Breiman, 2001) because of its good predictive performance and usability.

    Next, we apply the model F to the April data, calculate the predicted number of reactions, and obtain the residuals. In Step 2, we learn the proposed latent class model using these residual values, the restaurants, and the recommendation sentences. The target words consist of nouns, verbs, and adjectives that appeared in more than 30 recommendation articles in April, and the vocabulary size was I = 6,513. We used a morphological analysis tool with a dictionary that handles compound words and new words well. In addition, instead of individual restaurants, we used the restaurant categories defined by the service operating company; the number of categories is J = 213. Furthermore, based on preliminary experiments, the number of latent classes was set to 14 by using the Akaike Information Criterion (AIC).
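The AIC-based choice of the number of latent classes might look like the following sketch. The log-likelihood values below are purely illustrative placeholders (in practice each comes from a separate EM run), and the parameter count reflects the model's free parameters as we read them: K−1 class priors, K(J−1) restaurant probabilities, K·I word probabilities, and K means plus K variances.

```python
def aic(log_likelihood, K, I, J):
    """AIC = 2p - 2LL for the Step-2 latent class model.
    Free parameters: (K-1) class priors, K*(J-1) restaurant probabilities,
    K*I word probabilities, and K means plus K variances for the residuals."""
    p = (K - 1) + K * (J - 1) + K * I + 2 * K
    return 2 * p - 2 * log_likelihood

# Hypothetical converged log-likelihoods for candidate K; the real values
# come from running the EM algorithm once per candidate.
candidates = {10: -210000.0, 12: -195000.0, 14: -180000.0, 16: -174000.0}
I, J = 6513, 213          # vocabulary size and number of restaurant categories
best_K = min(candidates, key=lambda K: aic(candidates[K], K, I, J))
```

The candidate with the smallest AIC wins; the large per-class parameter count (about 6,700 parameters per additional class here) is what penalises overly large K.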

    4.2 Result of Step 1

    Table 1 shows the root mean squared error of the prediction in Step 1. In addition, Table 2 shows the normalized feature importance calculated by the random forest algorithm.

    From Table 2, we can see that the number of followers strongly affects the number of reactions, whereas the recommendation degree has a small impact. A possible reason is that, on the target restaurant guide, the score of each article tends to be high. Moreover, because the site’s policy is to introduce recommended restaurants, even a low recommendation degree does not significantly affect the number of reactions.

    4.3 Result of Step 2

    Table 3 lists the learned parameters. For restaurants and words, we introduce types by examining the class membership probabilities $P(z_k \mid s_j)$ and $P(z_k \mid w_i)$. For example, if the words with the highest scores are pizza, spaghetti, etc., then the word type is “food.” To confirm the estimated parameters, we conducted a statistical test of the differences in the estimated means $\mu_k$ between the latent classes. We treated the result of each class as one level of a one-way ANOVA with 14 levels. The test showed a statistically significant difference in $\mu_k$ between the levels based on the analysis of variance.

    From Table 3, the occurrence probability of each latent class $z_k$ is almost unbiased, and it turns out that no class holds an overwhelming share of the data. The average values of the residuals $\mu_k$ differ between the classes. Moreover, by studying the relationship between the values of the residuals, the restaurants, and the words, we can find some tendencies.

    First, in the classes $z_1$ and $z_2$, where the average residuals are relatively high, the characteristic words are not of the “food” type; therefore, these words are considered to represent other factors (the appearance inside the restaurant, the situation, etc.). On the other hand, in the classes $z_{11}$ to $z_{13}$, where the average residuals are relatively low, the characteristic words are of the “food” type. Thus, it is suggested that writing well not only about food but also about other elements leads to a better recommendation article.

    Next, we focus on the latent classes $z_7$, $z_{10}$, and $z_{14}$. The characteristic words of these classes are not associated with restaurants at first glance. By investigating the articles in which these words appear, it was confirmed that the titles of personal blogs appear within the posts. In other words, some users copy content written in their personal blogs and paste it on the target restaurant guide site as a recommendation article. Considering that the mean parameters $\mu_k$ of the classes $z_7$, $z_{10}$, and $z_{14}$ are negative, it can be pointed out that articles containing parts of personal blogs are not preferred in terms of reactions (i.e., empathy).

    Note that there are some classes where the interpretations of the restaurant type and the word type are the same, but the value of $\mu_k$ is positive for one class and negative for another (e.g., $z_3$ and $z_{10}$). This is because restaurants can have different characteristics even if they belong to the same category. In this study, we applied a common regression model to all data and generated the latent classes by applying the PLSA-type model to the residual, restaurant, and text information of each data point. Articles with many reactions and articles with few reactions have different statistical characteristics despite belonging to the same category or having similar text content. In this case, multiple latent classes can be constructed for the same restaurant type and word type, with a different residual value for each.

    Here, we focus on the category of each restaurant. Overall, it appears easier for ordinary restaurants to obtain reactions than for restaurants that are somewhat expensive or unfamiliar.

    As described above, the proposed model enables the analysis of the influence of recommendation articles on the reactions of other users and the acquisition of new knowledge from the learning results.


    5.1 Approach for the Model Construction

    In our approach, the learning procedure for analyzing the relationship between the posted articles and the number of reactions is divided into two steps: least squares estimation for the regression model and the EM algorithm for the latent class model. Moreover, we adopted data division for learning the regression model and the latent class model. In terms of the regression model, the model implicitly assumed in this study can be described as follows:

    $y = F(x_1, x_2, \ldots, x_p) + \xi_z + \varepsilon$

    Here, $x_j$ are the explanatory variables, $F(x_1, x_2, \ldots, x_p)$ is an appropriate regression model, $\varepsilon$ is the error term with $\varepsilon \sim N(0, \sigma_z^2)$, and $\xi_z$ is a term depending on the text information of the recommendation article. Under this assumption, the average of the error is 0 for every class; however, the variance of the error $\sigma_z^2$ differs depending on the latent class $z$, and $\xi_z$ is the effect of the latent class $z$ determined by the corresponding recommendation article of a restaurant. The residual from the first-step regression analysis is an estimator of $\xi_z + \varepsilon$, which includes the error term $\varepsilon$. We assume that if the term $\xi_z$ is positive, the article is a good recommendation article, and if $\xi_z$ is negative, the article is considered to receive no more reactions than the baseline. Although we cannot know the true value of $\xi_z$, the residual can be assumed to be a good estimator of it when the error term $\varepsilon$ has a zero mean and a relatively small variance.

    Then, for statistical model validation, including a residual diagnostic of the error $\varepsilon$, we examined the distribution of the estimated errors $\hat{\varepsilon}_i$ and discussed the adequacy of the estimated model using residual plots. In the proposed model, each data point belongs to each class probabilistically; however, because this makes the residual plot complicated, we classified each sample into its most likely latent class. That is, we applied a hard-clustering approach to the data and checked the distribution of the estimated errors for each latent class. We present histograms of the estimated errors for each class, calculated from the residuals using Equation (18). The results for the classes mentioned in Section 4.3 ($z_1$, $z_2$, $z_7$, $z_{10}$, $z_{11}$, $z_{12}$, $z_{13}$, and $z_{14}$) are shown in Figures 4-11.
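The hard-clustering step can be sketched as follows: each article goes to its most probable class, and the estimated error is the residual minus that class's mean. The responsibilities and means below are made-up toy numbers, not values from the actual analysis.

```python
from collections import defaultdict

def hard_assign(gamma):
    """Assign each article to its most likely latent class (hard clustering)."""
    return [max(range(len(g)), key=lambda k: g[k]) for g in gamma]

def errors_by_class(residuals, gamma, mu):
    """Estimated errors eps_n = r_n - mu_z for the hard-assigned class z;
    their per-class distributions should look roughly zero-mean and unimodal."""
    assign = hard_assign(gamma)
    out = defaultdict(list)
    for r, k in zip(residuals, assign):
        out[k].append(r - mu[k])
    return out

# Toy responsibilities for four articles over K = 2 classes.
gamma = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
residuals = [1.1, -2.2, 0.8, -1.9]
mu = [1.0, -2.0]           # estimated class means of the residuals
eps = errors_by_class(residuals, gamma, mu)
```

Histogramming each `eps[k]` gives exactly the per-class error distributions inspected in the residual diagnostics.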

    As shown in the results, none of the residual distribution shapes is extremely asymmetrical, and we cannot observe abnormalities. Therefore, the results can be considered adequate. The distributions in these figures are unimodal, so it seems reasonable to assume normal distributions for the errors in each latent class.

    Here, we discuss the model learning approach. For modeling the number of reactions, a straightforward approach is to use the basic information together with the information on texts and restaurants. In other words, the fundamental approach is to analyze the data with a single regression model using all variables as explanatory variables. However, the number of word types in the texts is relatively large compared with the number of learning data: there are about 60,000 texts and about 6,500 word types. Therefore, if we analyzed the data with a single regression model with 6,500 explanatory variables, the number of parameters would be large and the model estimation could be unstable. Hence, the approach of analyzing the data using the regression model and the latent class model separately is appropriate.

    Next, we discuss the division of the data into two parts, as shown in Figure 2. The usual approach is to learn the regression model using the whole data set; however, an overfitting problem can then occur for the residuals. That is, the residuals may be underestimated. Adding a regularization term such as Lasso or Ridge (Zou and Hastie, 2005) is one solution to the overfitting problem. When we adopt the regularization approach, we need to search for an appropriate value of the regularization parameter. In general, approaches such as cross-validation (Zou and Hastie, 2005) can be adopted; however, the computational cost is high. In addition, in this study, the estimation accuracy of the residuals is more important than the parameter estimation accuracy.

    Therefore, dividing the data into two parts (i.e., the data for learning the regression model and the data for learning the latent class model) can be considered the simplest and most efficient solution. The residuals estimated on the data not used for parameter estimation are not underestimated and are therefore useful for the next step of learning the latent class model.

    5.2 Application of the Latent Class Model

    As shown in Table 3, there are several latent classes for which the estimated residual variance $\sigma_k^2$ is large. Because the model handles the occurrence probabilities and the residuals equally, as in Figure 3, it does not concentrate on fitting the residuals. Since the dimension of the document vector is relatively large, fitting the document vector tends to be emphasized, and under maximum likelihood estimation, latent classes with large differences in residuals are not obtained.

    The approach proposed by Sakamoto et al. (2017) can efficiently solve this problem. By applying this approach, the E-step of the EM algorithm is formulated as follows:

$$
P'(z_k \mid r_n, a_n, d_n)
= \frac{P'(z_k, r_n, a_n, d_n)}{\sum_{k'=1}^{K} P'(z_{k'}, r_n, a_n, d_n)}
= \frac{P(z_k)\, P(r_n \mid z_k)^{\alpha}\, P(a_n \mid z_k)^{\beta}\, P(d_n \mid z_k)^{\gamma}}{\sum_{k'=1}^{K} P(z_{k'})\, P(r_n \mid z_{k'})^{\alpha}\, P(a_n \mid z_{k'})^{\beta}\, P(d_n \mid z_{k'})^{\gamma}}
$$

    Here, $\alpha$, $\beta$, and $\gamma$ are decided in advance. However, setting these parameters is not easy; therefore, an analysis method for efficient parameter setting is required for real data analysis.
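    A numerical sketch of this weighted E-step follows; the function and array names are ours for illustration, not from the original implementation. The responsibilities are computed in the log domain for numerical stability:

```python
import numpy as np

def weighted_e_step(log_prior, log_p_r, log_p_a, log_p_d, alpha, beta, gamma):
    """E-step responsibilities with weighted likelihood terms.

    log_p_r, log_p_a, log_p_d are (N, K) arrays of log-likelihoods of the
    residual r_n, restaurant a_n, and document vector d_n under class k;
    log_prior is a length-K array of log P(z_k).  alpha, beta, gamma
    reweight the three likelihood terms as in the E-step equation.
    """
    log_joint = (log_prior[None, :]
                 + alpha * log_p_r
                 + beta * log_p_a
                 + gamma * log_p_d)
    # Normalize over classes in the log domain to avoid underflow.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)

# Tiny example: 2 observations, 3 classes; a large alpha and small gamma
# emphasize the residual term over the high-dimensional document term.
rng = np.random.default_rng(2)
resp = weighted_e_step(np.log(np.full(3, 1 / 3)),
                       rng.normal(size=(2, 3)),
                       rng.normal(size=(2, 3)),
                       rng.normal(size=(2, 3)),
                       alpha=2.0, beta=1.0, gamma=0.1)
print(resp.sum(axis=1))  # each row sums to 1
```

Setting $\gamma < 1$ downweights the document-vector likelihood, which counteracts the tendency noted above for the high-dimensional document term to dominate the class assignments.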

    5.3 Analysis of the Text Data with Other Variables

    In this study, we analyzed text data together with restaurant data. Many studies have analyzed not only text data but also other variables. For example, Akita et al. (2016) predicted stock prices by considering not only stock information but also text data. In addition, Park et al. (2016) analyzed the motivation of cruising based on Twitter data and cruising information. Our study can be regarded as one instance of such studies that analyze text information along with other variables.


6. CONCLUSION

    In this study, we used the number of reactions from other users as an indicator of the “goodness” of a recommendation article on a restaurant guide site, and proposed a model for the relational analysis of articles and reactions. We modeled the basic information, sentences, and restaurants hierarchically, and analyzed their relationships using the latent class model.

    In the application to actual data, the results of Step 1 showed that the number of followers had a large influence on the number of reactions. Furthermore, in Step 2, we found that the number of reactions differed from the baseline depending on the types of words and restaurants, and we captured these trends. Several methods for increasing reactions can be derived from the results of the analysis; for example, it is better to write not only about the food but also about other factors. These results demonstrate the effectiveness of the proposed method.

    As future work, it is necessary to verify and improve the accuracy of estimating the number of reactions. Although the estimated parameters in this study are convincing, the prediction accuracy is not as good; that is, the variance of the residuals in the parameter estimation is still large. If this variance can be reduced, the reliability of the analysis results will increase.


ACKNOWLEDGMENTS

    The authors would like to express their gratitude to Retty Inc. for providing valuable data and enthusiastic support of our research. A part of this study was supported by JSPS KAKENHI Grant Numbers 26282090 and 26560167.



Figure 1. Conceptual image of the proposed approach.

Figure 2. Procedure of the learning model.

Figure 3. Graphical model of the proposed model.

Figure 4. Residual plot of z1.

Figure 5. Residual plot of z2.

Figure 6. Residual plot of z7.

Figure 7. Residual plot of z10.

Figure 8. Residual plot of z11.

Figure 9. Residual plot of z12.

Figure 10. Residual plot of z13.

Figure 11. Residual plot of z14.

Table 1. Root mean squared error (RMSE) (training data: March)

Table 2. Feature importance

Table 3. Parameter learning result (ordered by μ_k; the class numbers are re-assigned)


REFERENCES

    1. Akita, R., Yoshihara, A., Matsubara, T., and Uehara, K. (2016), Deep learning for stock prediction using numerical and textual information, Proceedings of the IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan, 1-6.
    2. Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer, New York, 98-108.
    3. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003), Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 993-1022.
    4. Breiman, L. (2001), Random forests, Machine Learning, 45, 5-32.
    5. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1-38.
    6. Goto, M., Mikawa, K., Hirasawa, S., Kobayashi, M., Suko, T., and Horii, S. (2015), A new latent class model for analysis of purchasing and browsing histories on EC sites, Industrial Engineering & Management Systems, 14(4), 335-346.
    7. Green, P. E., Carmone, F. J., and Wachspress, D. P. (1976), Consumer segmentation via latent class analysis, Journal of Consumer Research, 3(3), 170-174.
    8. Hagenaars, J. A. and McCutcheon, A. L. (2002), Applied Latent Class Analysis, Cambridge University Press, New York.
    9. Hofmann, T. (1999), Probabilistic latent semantic analysis, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 289-296.
    10. Hofmann, T. (2004), Latent semantic models for collaborative filtering, ACM Transactions on Information Systems (TOIS), 22(1), 89-115.
    11. Iwata, T., Watanabe, S., Yamada, T., and Ueda, N. (2009), Topic tracking model for analyzing consumer purchase behavior, Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09), 1427-1432.
    12. Jin, R., Si, L., and Zhai, C. (2006), A study of mixture models for collaborative filtering, Journal of Information Retrieval, 9(3), 357-382.
    13. Jin, R., Si, L., and Zhai, C. X. (2003), Preference-based graphic models for collaborative filtering, Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI’03), 329-336.
    14. Kang, H., Yoo, S. J., and Han, D. (2012), Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews, Expert Systems with Applications, 39(5), 6000-6010.
    15. Magidson, J. and Vermunt, J. K. (2002), Latent class models for clustering: A comparison with K-means, Canadian Journal of Marketing Research, 20(1), 37-44.
    16. McLachlan, G. and Krishnan, T. (2007), The EM Algorithm and Extensions (2nd ed.), Wiley-Interscience.
    17. Mochizuki, R., Watanabe, T., Namikawa, D., Tanaka, K., and Yamada, T. (2013), Evaluation about personalized metaphor agent system, Proceedings of the 12th Forum on Information Technology (FIT2013), 137-144.
    18. Nigam, K., McCallum, A. K., Thrun, S., and Mitchell, T. (2000), Text classification from labeled and unlabeled documents using EM, Machine Learning, 39(2-3), 103-134.
    19. Pantelidis, I. S. (2010), Electronic meal experience: A content analysis of online restaurant comments, Cornell Hospitality Quarterly, 51(4), 483-491.
    20. Park, S. B., Ok, C. M., and Chae, B. K. (2016), Using Twitter data for cruise tourism marketing and research, Journal of Travel & Tourism Marketing, 33(6), 885-898.
    21. Sakamoto, T., Yamashita, H., Ogiwara, T., and Goto, M. (2017), A latent class model to analyze the relationship between companies’ appeal points and students’ reasons for application, Journal of Information Processing, 58(9), 1535-1548.
    22. Si, L. and Jin, R. (2003), Flexible mixture model for collaborative filtering, Proceedings of the 20th International Conference on Machine Learning (ICML’03), Washington DC, 704-711.
    23. Suzuki, T., Kumoi, G., Mikawa, K., and Goto, M. (2014), A design of recommendation based on flexible mixture model considering purchasing interest and postpurchase satisfaction, Journal of Japan Industrial Management Association, 64(4E), 570-578.
    24. Swait, J. and Adamowicz, W. (2001), The influence of task complexity on consumer choice: A latent class model of decision strategy switching, Journal of Consumer Research, 28(1), 135-148.
    25. Train, K. E. (2009), Discrete Choice Methods with Simulation (2nd ed.), Cambridge University Press, New York.
    26. Yamamoto, Y., Mikawa, K., and Goto, M. (2017), A proposal for classification of document data with unobserved categories considering latent topics, Industrial Engineering & Management Systems, 16(2), 165-174.
    27. Zhang, Z., Zhang, Z., Wang, F., Law, R., and Li, D. (2013), Factors influencing the effectiveness of online group buying in the restaurant industry, International Journal of Hospitality Management, 35, 237-245.
    28. Zou, H. and Hastie, T. (2005), Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.