ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.18 No.3 pp.292-304
DOI : https://doi.org/10.7232/iems.2019.18.3.292

# An Analytic Model to Represent Relation between Finish Date of Job-Hunting and Time-Series Variation of Entry Tendencies

Seiya Nagamori, Kenta Mikawa, Masayuki Goto*, Tairiku Ogihara
Graduate School of Creative Science and Engineering, Waseda University, Tokyo, Japan
Department of Information Science, Shonan Institute of Technology, Kanagawa, Japan
School of Creative Science and Engineering, Waseda University, Tokyo, Japan
Customer Action Group, Media Planning Division, Field Planning Unit, Graduate Recruitment Business Development, Recruit Career Co. Ltd., Tokyo, Japan
Corresponding Author, E-mail: masagoto@waseda.jp
June 10, 2016 June 18, 2017 November 15, 2018

## ABSTRACT

Currently, most university students in Japan use Internet portal sites for job-hunting activities. However, job-hunting activities are sometimes prolonged owing to a mismatch between a student and the company requirements. To solve this problem, it is important to find the students who may not be able to finish job-hunting early; this goal can be achieved by utilizing user behavior log data stored on an Internet portal site. This study proposes an appropriate statistical model based on a latent class model. Specifically, we apply a clustering approach that takes account of time-series variation. The proposed model enables us to analyze entry patterns from the viewpoint of time-series variation of job-hunting activities and to predict the finish date of job-hunting for each cluster. Through the simulation experiments, the effectiveness of the proposed method was clarified. We used actual data of students’ activities from an Internet portal site to demonstrate the effectiveness of the proposed method, which considers the time series of the entry tendency of student users. By considering the time shift of students’ preferences, it became possible to extract students who tend to struggle in job-hunting activities. By using the proposed model, it is possible to identify the students who should be supported.

## 1. INTRODUCTION

In recent years, most university students (users) in Japan have been using Internet portal sites for their job-hunting activities. Various types of information related to job-hunting are available on Internet portal sites, and student users can search this information for interesting jobs and companies. The use of an Internet portal site enables student users to easily apply to employment examinations of several companies. However, job-hunting activities are sometimes prolonged owing to a mismatch between a student user and the company requirements. In order to solve this problem, it is desirable to use the activity data stored on an Internet portal site to find a group of student users who may not be able to finish job-hunting early. The lengthening of job-hunting activities is one of the social problems in Japan, so countermeasures are an important topic. The large-scale data accumulated on an Internet portal site for job-hunting can be expected to contribute to a solution to this problem.

In general, student users have a wide variety of preferences, so the latent class model, which is also called the mixed model (Bishop, 2006; Train, 2009), is considered effective for modeling their behaviors. Hayakawa et al. (2013) proposed a predictive model of the job-hunting finish date of students by using student demographic information. They observed that the job-hunting finish date of students strongly depends on their demographic information, and they constructed the model based on a stratification tree and mixed Weibull distributions. Yamagami et al. (2014) proposed a probabilistic latent class model that can describe the relationship among demographic information, action log data, and the job-hunting finish date of students; the authors showed the effectiveness of their proposal from the viewpoint of prediction accuracy. However, many students tend to change their choice of appropriate companies and jobs during their job-hunting activities. According to the observations of several specialists working for a company operating an Internet portal site, it is important to model the change in student user behavior related to entry tendencies. Student user behavior related to entry tendencies exhibits various patterns. For example, some student users continue to apply to employment examinations of companies in the same industry category, whereas other student users change their entry tendency with the passage of time by re-evaluating their aptitude during their job-hunting activities. Such a time-series variation of entry tendencies appears to affect the expected finish date of job-hunting. That is, we hypothesize that there is a statistical relationship between the time-series variation of users’ preferences and the finish date of job-hunting. If this hypothesis is correct, a model expressing this relationship can be constructed from the data of the entry history and the finish date of job-hunting.
Therefore, even in the absence of a strong statistical relation between the entry activity and the finish date of student users, it would be significant to build an analytic model representing the moderate relation and create an effective action plan for a group of student users whose job-hunting duration is likely to be prolonged.

In this study, we attempt to build an appropriate model expressing the relation between the entry tendency and the finish date from a global point of view. Specifically, we propose a method of student user clustering based on a latent class model representing the time-series variation of entry tendencies. The proposed model enables us to analyze the entry patterns from the viewpoint of time-series variation of job-hunting activities. By applying the proposed method, we can analyze the relation between the time shift of entry tendencies and the finish date of job-hunting. Additionally, it is possible to predict the finish date of job-hunting for each cluster. By using the proposed method, it becomes possible to identify, at an early stage, the user group whose job-hunting is predicted to finish late. In order to verify the effectiveness of the proposed method, we demonstrate the data analysis and processing by using actual data from an Internet portal site. Further, we show that it is possible to analyze the characteristics of the student users assigned to each cluster.

## 2. PRELIMINARIES

### 2.1 Job-Hunting by University Students in Japan

In this section, the system of job-hunting in Japan is introduced as the basis of our study. The style of job-hunting activities in Japan is unique. Most Japanese students start job-hunting activities during their student life and begin working immediately after their graduation from educational institutions such as universities or graduate schools. Therefore, students have to conduct job-hunting activities while studying in their universities. The schedule for job-hunting is decided by the Federation of Economic Organizations; therefore, the job-hunting information for almost all companies is published at the same time.

In Table 1, we present the job-hunting schedule of Japanese students who graduated in March 2015. In addition, the schedule of students who will graduate in March 2016 is shown. As described earlier, the basic schedule of job-hunting is determined by the Federation of Economic Organizations.

In Japan, most students generally use Internet portal sites for job-hunting. Numerous Internet portal sites for job-hunting exist, and many student users use multiple sites simultaneously. Typically, an Internet portal site for job-hunting provides a comprehensive service from the start to the finish of job-hunting activities; the stages in the service include the entry for an internship, the reservation of a briefing session, the entry for the employment examination of a company, and the self-analysis and research for various industries.

Predicting the finish date is difficult because various other factors can affect the finish date of each student user, e.g., meetings with senior company employees who have graduated from the same university, attendance in information sessions held by companies, company search activities on the Internet, and the daily lifestyle of the student users, including the supporting activities at their university. In addition, many student users do not use only a specific Internet portal site; they use multiple portal sites simultaneously. Internet portal sites for job-hunting cannot obtain all such information related to student user activities. Therefore, it may not be possible to accurately predict the finish dates of students’ job-hunting activities with only the action history on an Internet portal site for job-hunting. However, it is natural to assume that there exists a relation between the time shift of entry-pattern tendency and the finish date of student user job-hunting activity. If this hypothesis is correct, a model expressing this relationship can be constructed from the data of the entry history and the finish date of job-hunting. Therefore, even in the absence of a strong statistical relation between the entry activity and the finish date of student users, it would be significant to build an analytic model representing the moderate relation and create an effective action plan for a group of student users whose job-hunting duration is likely to be prolonged.

### 2.2 The Latent Class Models

In this study, we quantify and analyze entry histories of student users by using a latent class model (Gibson, 1959; Lazarsfeld and Henry, 1968; Goodman, 1974; Hofmann and Puzicha, 1999; Hofmann, 2001; Magidson and Vermunt, 2002; Hofmann, 2004; Hagenaars and McCutcheon, 2009; Collins and Lanza, 2013). Various latent class models have been proposed. In this section, we discuss the significance of using a latent class model and introduce some variations.

#### 2.2.1 The Effectiveness of Latent Class Models

Latent class models assume the existence of unobservable latent variables behind the observable training data. For example, when the latent class model is applied to purchase history data, it is possible to express differences of purchase probabilities for each item between users according to unobservable latent factors that express user preference for items (Matsuzaki et al., 2015;Fujiwara et al., 2017). Further, when the latent class model is applied to document data, it is possible to observe that each document and each word arise from unobservable latent topics that documents potentially retain (Hofmann, 1999b).

#### 2.2.2 Previous Studies of Latent Class Models

In recent times, various types of latent class models have been proposed and widely used (Greene and Hensher, 2003; Hofmann, 1999a; Si and Jin, 2003; Jin et al., 2006; Zhang, 2004). For example, Zhang (2004) proposed the hierarchical latent class models. Dillon and Mulani (1984) presented a probabilistic latent class model for assessing inter-judge reliability. In the field of marketing science, latent classes are convenient and effective for representing customer segments, and many applied models have been studied (Madden and Dillon, 1982; Swait, 1994; Train, 2009). Langseth and Nielsen (2012) proposed a latent model based on a linear Gaussian Bayesian network.

Especially in the field of recommender systems, various types of latent class models have been proposed, for example, Gaussian PLSA (Hofmann, 2003), FMM (Si and Jin, 2003; Suzuki et al., 2014), and the Joint Mixture Model (JMM) and Decoupled Model (DM) (Jin et al., 2006). However, many of them focus on predicting the user’s evaluation value for each item, which is called a rating. In this paper, we focus on the problem of representing the relation between the finish date of job-hunting and the time-series variation of entry tendencies. We need a model that can represent the time-series variation of entry tendencies, and the most reasonable choice is the aspect model. Many other models cannot be applied to represent the time-series variation of entry tendencies because they focus on rating prediction.

The aspect model is one of the well-known latent class models (Hofmann and Puzicha, 1999; Goto et al., 2014, 2015) and can be used as a clustering method or a dimension-reduction technique. This model, which originated in the field of information retrieval, focuses on the co-occurrence of the document and the word. It is assumed that the document and the word occur stochastically from the latent class. The aspect model is widely used in various fields and is extended in accordance with the purpose. For example, with the progress of web services, the aspect model has been applied to recommendation systems for many electronic commerce (EC) sites. Such a system recommends items that match user preferences. In the model-based approach for recommender systems, some models utilizing the user’s rating have been developed as variations of the aspect model. When such a model is applied to purchase data with user ratings for items, the evaluation values of the users can be predicted for each item. In this model, the co-occurrence of three variables (the user, the items, and the ratings) is the point of focus, and multinomial distributions are assumed for a probabilistic model of ratings. Based on the assumption that a rating follows a normal distribution, this model is extended to develop a model called Gaussian Probabilistic Latent Semantic Analysis (g-PLSA) (Hofmann, 2003). In the g-PLSA model, each latent class generates the evaluation point from its own normal distribution. The g-PLSA model differs from the above model with respect to the number of parameters estimated. Owing to the assumption of a normal distribution, the number of estimated parameters decreases. Therefore, g-PLSA is easy to use with respect to parameter estimation and result analysis. In this field, accurate prediction of evaluation values by using these models has become an important topic.

Various other types of latent class models exist. The flexible mixture model (FMM) considers different latent classes for users and items (Si and Jin, 2003; Suzuki et al., 2014). Another latent class model considers browsing history and other factors (Fujiwara et al., 2014; Goto et al., 2015). Based on the application and purpose, analysts use the appropriate model. As described above, either of the following two approaches is actively taken: 1) increasing the number of variables to be handled in order to enhance the expression ability of the model (Yamagami et al., 2015); 2) applying suitable distributions to each variable by considering the data characteristic (Yang et al., 2015). These approaches are typical extensions of the latent class model.

In this study, we introduce the aspect model to analyze the relation between the entry tendency of student users and the finish date of job-hunting. We propose a method that quantifies the time-series variation of entry tendencies by utilizing the aspect model. In the next subsection, we define the aspect model in detail.

#### 2.2.3 Aspect Model

In this study, we apply the aspect model (Hofmann, 1999a, 1999b) in the finish date prediction in order to quantify the entry tendencies of student users. The aspect model is a statistical model that assumes the existence of a discrete latent class between the student users and the companies. The users and the companies are divided into clusters (latent classes) stochastically. Here, the event that the student user $y_j$ applies to the company $x_i$ is denoted by $(y_j, x_i)$. A set of companies is defined as $X = \{ x_i : 1 \le i \le I \}$, a set of student users is defined as $Y = \{ y_j : 1 \le j \le J \}$, and a set of latent classes is defined as $Z = \{ z_k : 1 \le k \le K \}$. In this case, the aspect model can be stochastically represented by equation (1).

$P(x_i, y_j) = \sum_{k=1}^{K} P(z_k) \, P(x_i \mid z_k) \, P(y_j \mid z_k)$
(1)

Here, $P(x_i, y_j)$ is the joint probability of the company $x_i$ and the student user $y_j$, $P(z_k)$ is the probability of the latent class $z_k$, $P(x_i \mid z_k)$ is the conditional probability of the company $x_i$ given the latent class $z_k$, and $P(y_j \mid z_k)$ is the conditional probability of the student user $y_j$ given the latent class $z_k$. In equation (1), the parameters $P(z_k)$, $P(x_i \mid z_k)$, and $P(y_j \mid z_k)$ are estimated by the expectation-maximization (EM) algorithm (Dempster et al., 1977).
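As an illustration of how the parameters of equation (1) might be estimated, the following is a minimal sketch of EM for the aspect model in plain Python. It is not the authors' implementation: the function name `fit_aspect_model`, the count-matrix input `counts[i][j]`, the random initialization, and the fixed iteration count are all assumptions made for the example.

```python
import random


def fit_aspect_model(counts, K, n_iter=50, seed=0):
    """EM sketch for P(x_i, y_j) = sum_k P(z_k) P(x_i|z_k) P(y_j|z_k).

    counts[i][j] is the (assumed) number of entries of student j to company i.
    Returns (pz, px_z, py_z) with px_z[k][i] = P(x_i|z_k), py_z[k][j] = P(y_j|z_k).
    """
    rng = random.Random(seed)
    I, J = len(counts), len(counts[0])

    def random_dist(n):
        # Strictly positive random weights, normalized to sum to 1.
        w = [rng.random() + 0.1 for _ in range(n)]
        s = sum(w)
        return [v / s for v in w]

    pz = [1.0 / K] * K
    px_z = [random_dist(I) for _ in range(K)]
    py_z = [random_dist(J) for _ in range(K)]

    for _ in range(n_iter):
        exp_z = [0.0] * K
        exp_x = [[0.0] * I for _ in range(K)]
        exp_y = [[0.0] * J for _ in range(K)]
        for i in range(I):
            for j in range(J):
                n = counts[i][j]
                if n == 0:
                    continue
                # E-step: posterior P(z_k | x_i, y_j) for the observed pair.
                joint = [pz[k] * px_z[k][i] * py_z[k][j] for k in range(K)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for k in range(K):
                    r = n * joint[k] / total
                    exp_z[k] += r
                    exp_x[k][i] += r
                    exp_y[k][j] += r
        # M-step: renormalize the expected counts.
        N = sum(exp_z)
        for k in range(K):
            if exp_z[k] > 0:  # keep the old distributions if a class is empty
                px_z[k] = [v / exp_z[k] for v in exp_x[k]]
                py_z[k] = [v / exp_z[k] for v in exp_y[k]]
            pz[k] = exp_z[k] / N
    return pz, px_z, py_z
```

A call such as `fit_aspect_model(counts, K=5)` would return the three parameter sets; in practice the result depends on the initialization, so several restarts are usually compared.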

#### 2.2.4 Use of Latent Class Model for Job-Hunting Data

By applying the aspect model to the job-hunting activity of student users, we can estimate their entry probability for each company.

In particular, the application of a latent class model to student user entries appears to be effective because, typically, we can assume that different types of student users co-exist. A latent class represents a group of student users with similar characteristics and cannot be observed directly in the data. Additionally, the use of latent class models has various advantages. Latent class models facilitate the analysis of stored data on an Internet portal site owing to the dimension reduction of sparse high-dimensional data by the assumption of a latent class. Assuming latent classes is equivalent to simultaneously clustering student users and companies into as many groups as there are latent classes. This model enables us to clarify the relation between student users and companies from the viewpoint of entry activities. In this study, we attempt to utilize the aspect model to quantify the time shift of entry tendencies.

This study differs from extensions such as the models shown in section 2.2.2. We show that it is possible to perform a more effective quantification by using the estimated parameters of the aspect model. The ideas in this study can be utilized not only for the aspect model but also for various latent class models. This feature is one of the advantages of this study with respect to scalability. Figure 1

## 3. PROPOSED METHOD

In order to find groups of student users whose job-hunting duration is likely to be prolonged and to analyze the statistical characteristics of each group, we construct a model that expresses the relation between the time-shift pattern of entry tendencies and the finish dates of student users by considering the time series. Simultaneously, we can analyze the pattern of entry tendencies by utilizing this model. Here, owing to the large number of companies, it is necessary to aggregate the entry data of applied companies for each student user and to treat similar companies collectively from the statistical viewpoint. Therefore, we attempt to quantify the time shift of entry tendencies by using the aspect model, which is one of the effective latent class models. Then, student user clustering can be performed by considering the time shift of entry tendencies. We use the belonging probabilities for the latent classes to quantify the entry tendencies; these probabilities can be calculated easily for each period obtained by dividing the time series.

The proposed method consists of the quantification of the entry tendency of student users by using the aspect model, student user clustering by using the k-means method, estimation of the prediction target user cluster based on the similarity calculation, and prediction of the finish date of each cluster. The proposed clustering method is performed as follows:

[Steps in proposed method]

• Step 1. Learning the aspect model by using a set of training data.

• Step 2. Quantifying the entry tendencies by considering the time series of student users in the training data based on the learned aspect model.

• Step 3. Clustering the student users in the training data by applying the k-means method to the time-series data of student user entry tendencies.

• Step 4. Quantifying the entry tendencies by considering the time series for the prediction target users based on the learned aspect model.

• Step 5. Estimating the best cluster for the prediction target users based on the similarities between each cluster and quantified entry tendencies of prediction target users.

• Step 6. Predicting the finish date for the prediction target users by using the average values in each cluster.

Here, the set of student users in the training data plays a role in the construction of the aspect model (the estimation of aspect model parameters). In addition, these student users also play a role in the formation of clusters used to predict the finish dates of the prediction target users. By using the training data as described above, we can estimate the cluster for each prediction target user.

### 3.1 Quantification of Entry Tendencies by the Aspect Model

During job-hunting, each student user applies to the employment examinations of companies that he/she prefers. Then, it is assumed that the student user preferences for companies can be represented by belonging probabilities for the latent classes. In order to consider the time series of the entry tendencies, we calculate the entry tendencies divided into arbitrary T periods, allowing duplication. Letting $P'_t(z_k \mid y_j)$ be the belonging probability of the student user $y_j$ to the latent class $z_k$ at the time t, the entry tendencies of student users are defined by equation (2). This equation represents the entry tendency of student user $y_j$ at period t.

$P'_t(z_k \mid y_j) = \frac{1}{N_{jt}} \sum_{i} \eta_{ijt} \, \hat{P}(z_k \mid x_i)$
(2)

$\eta_{ijt} = \begin{cases} 1 & (\text{entry of } x_i \text{ by } y_j \text{ at period } t) \\ 0 & (\text{otherwise}) \end{cases}$
(3)

Here, t (t = 1, 2, …, T) denotes the index of the time period, and $N_{jt}$ represents the total number of entries made by student user $y_j$ at period t. $\eta_{ijt}$ is the indicator function; it takes 1 if student user $y_j$ enters company $x_i$ at period t, and 0 otherwise. The information of the company $x_i$, the student user $y_j$, and the period t is included in the variable $\eta_{ijt}$. Further, $\hat{P}(z_k \mid x_i)$, which represents the belonging probability for the latent class $z_k$ for each company, is estimated by using equation (4).

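$\hat{P}(z_k \mid x_i) = \frac{\hat{P}(z_k) \, \hat{P}(x_i \mid z_k)}{\sum_{k'=1}^{K} \hat{P}(z_{k'}) \, \hat{P}(x_i \mid z_{k'})}$
(4)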

Here, $\hat{P}(z_k)$ and $\hat{P}(x_i \mid z_k)$ denote the estimated values of $P(z_k)$ and $P(x_i \mid z_k)$, respectively. The parameters of equation (4), i.e., $\hat{P}(z_k)$ and $\hat{P}(x_i \mid z_k)$, are estimated by the EM algorithm (Dempster et al., 1977; McLachlan and Krishnan, 1997). The student users have stochastic preferences for each latent class, and the sum of the preferences of each student user over all latent classes is equal to 1. This probability represents the entry tendency for the student user. We focus on the changes of belonging probabilities to latent classes with time t. Therefore, the entry tendency can be calculated for arbitrarily divided periods. In this model, the entry tendencies are calculated by using a hierarchical structure in order to consider the time series effectively. If the belonging probabilities of a student user to the latent classes change drastically with t, it means that his/her preferences for companies have also changed. The method to calculate the entry tendencies is shown in Figure 2.
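The quantification step can be sketched as follows, assuming that equation (4) is the usual Bayes inversion of the estimated parameters and that equation (2) averages the company posteriors over the companies a student entered in one period. The function names, the list-of-indices input format, and in particular the uniform fallback for a period with no entries are assumptions for this example; the text does not say how $N_{jt} = 0$ is handled.

```python
def company_posteriors(pz, px_z):
    """Bayes inversion: post[i][k] = P(z_k | x_i) from P(z_k) and P(x_i | z_k)."""
    K, I = len(pz), len(px_z[0])
    post = []
    for i in range(I):
        joint = [pz[k] * px_z[k][i] for k in range(K)]
        total = sum(joint)
        post.append([v / total for v in joint])
    return post


def entry_tendency(entries, post, K):
    """Equation (2)-style average of P(z_k | x_i) over one period.

    entries: indices of the distinct companies the student entered in period t.
    """
    if not entries:
        # Assumption (not from the paper): uniform tendency when N_jt = 0.
        return [1.0 / K] * K
    n = len(entries)
    return [sum(post[i][k] for i in entries) / n for k in range(K)]


def tendency_series(entries_by_period, post, K):
    """One K-dimensional tendency vector per period, t = 1..T."""
    return [entry_tendency(e, post, K) for e in entries_by_period]
```

For instance, `tendency_series([[0, 3], [7]], post, K)` would give two vectors (T = 2), each summing to 1, whose change between periods reflects a shift in the student's preferences.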

### 3.2 Learning Student User Clustering by K-Means Method

In order to create clusters for the analysis of the relation between the entry tendency of student users and the finish date of job-hunting, we perform clustering for all the student users in the training data set. For clustering, the elements of each student user consist of entry tendencies that are calculated by equation (2) for T periods in order to consider the time series. The feature vector $w_j$ of student user $y_j$ in the training data set is expressed by equation (5).

$w_j = (s_{j1}, s_{j2}, \ldots, s_{jT})$
(5)

where $s_{jt}$ is a K-dimensional vector representing the entry tendency of student user $y_j$ at period t. $s_{jt}$ is represented by equation (6).

$s_{jt} = (P'_t(z_1 \mid y_j), P'_t(z_2 \mid y_j), \ldots, P'_t(z_K \mid y_j))$
(6)

In this study, we applied the k-means method as a clustering method. The k-means method is one of the basic and effective clustering tools. The number of clusters is denoted by C; when l is defined as a cluster number variable, the representative vector $c_l$ (l = 1, 2, …, C) for each cluster is obtained by equation (7).

$c_l = \frac{1}{D_l} \sum_{j \,:\, y_j \in \text{cluster } l} w_j$
(7)

(8)

where Dl represents the number of student users belonging to cluster l.
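The clustering step can be illustrated with a small, self-contained k-means sketch (Lloyd's algorithm) over the concatenated tendency vectors. This is a generic k-means, not necessarily the exact procedure used in the study: the helper names, the random choice of initial centroids, and the stopping rule are assumptions for the example.

```python
import random


def feature_vector(tendencies):
    """Flatten T K-dimensional vectors s_j1..s_jT into one vector w_j."""
    return [p for s in tendencies for p in s]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(vectors, C, n_iter=100, seed=0):
    """Return (centroids, labels) for C clusters."""
    rng = random.Random(seed)
    # Assumption: initial centroids are C distinct training vectors.
    centroids = [list(v) for v in rng.sample(vectors, C)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(C), key=lambda l: sq_dist(v, centroids[l]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stable, centroids already up to date
        labels = new_labels
        # Update step: centroid = mean of the D_l members of cluster l.
        for l in range(C):
            members = [v for v, lab in zip(vectors, labels) if lab == l]
            if members:
                centroids[l] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

With K = 5 and T = 3, each `feature_vector` has 15 components, and `kmeans(vectors, C=20)` would produce the 20 representative vectors used later for prediction.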

### 3.3 Prediction of Target Users Clustering by Similarity Calculation

We also calculate the entry tendency of each period for the prediction target users by using equation (2). When we use equation (2) to calculate entry tendencies for the prediction target users, the belonging probabilities $\hat{P}(z_k \mid x_i)$ of company $x_i$ for the latent class $z_k$ can be calculated by directly using the same probabilities that were estimated in the learning phase. The reason for this choice is that the list of companies to which student users can apply on an Internet portal site does not change from year to year. Here, the set of prediction target users is defined as $Y' = \{ y'_m : 1 \le m \le M \}$, and the feature vector $w'_m$ is calculated for each prediction target user by equation (9).

$w'_m = (s'_{m1}, s'_{m2}, \ldots, s'_{mT})$
(9)

$s'_{mt} = (P'_t(z_1 \mid y'_m), P'_t(z_2 \mid y'_m), \ldots, P'_t(z_K \mid y'_m))$
(10)

Similarly, $s'_{mt}$ is a K-dimensional vector representing the entry tendency of student user $y'_m$ at period t. Then, in order to determine the cluster to which a prediction target user belongs, we calculate the similarity between the feature vector $w'_m$ and the representative vector $c_l$ of each cluster formed in the learning phase by using the Euclidean distance. If $\hat{c}$ is the estimated cluster to which the prediction target user belongs, $\hat{c}$ is obtained by equation (11).

$\hat{c} = \arg\min_{l \in \{1, \ldots, C\}} \lVert w'_m - c_l \rVert$
(11)
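The nearest-cluster rule described for equation (11) can be sketched in a few lines, assuming that similarity is measured by plain Euclidean distance to each representative vector. The function name `assign_cluster` is hypothetical.

```python
import math


def assign_cluster(w, centroids):
    """Index of the representative vector closest to w (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda l: math.dist(w, centroids[l]))
```

Because `min` returns the first minimizer, ties between equally distant clusters are broken in favor of the lower cluster index; the text does not specify a tie-breaking rule.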

### 3.4 Finish Date Prediction for a Prediction Target User According to the Belonging Cluster

For each cluster created in the learning phase, a predicted value of the finish date is attached to the cluster. We calculate the average of the finish date for all the student users belonging to the same cluster in the training data set. The average values become the predicted values in the prediction of target user finish dates. Therefore, in the proposed method, the prediction of the finish date of job-hunting for the prediction target users is fixed for a given cluster.
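A minimal sketch of this per-cluster prediction is given below. It assumes that finish dates are encoded as numbers (for example, days elapsed since the start of job-hunting), which is an encoding chosen for the example and not stated in the text; the function names are likewise hypothetical.

```python
def cluster_mean_finish(labels, finish_days, C):
    """Average finish date of the training users in each of the C clusters.

    labels[j] is the cluster of training user j, finish_days[j] the
    numerically encoded finish date. Empty clusters yield None.
    """
    sums = [0.0] * C
    sizes = [0] * C
    for lab, day in zip(labels, finish_days):
        sums[lab] += day
        sizes[lab] += 1
    return [sums[l] / sizes[l] if sizes[l] else None for l in range(C)]


def predict_finish(target_labels, cluster_means):
    """Every prediction target user receives the mean of the assigned cluster."""
    return [cluster_means[lab] for lab in target_labels]
```

All target users assigned to the same cluster therefore share one predicted value, which is why the method is better suited to identifying groups at risk of prolonged job-hunting than to individual forecasts.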

## 4. DATA ANALYSIS

In this section, we show the effectiveness of the proposed method. For the evaluation of the proposed method, we use the mean absolute error between the predicted value and the actual value as the measure of prediction accuracy. Based on this criterion, we conduct a prediction experiment for the finish date of job-hunting based on clustering by using actual data from an Internet portal site. If the prediction accuracy improves, the estimated model can be judged to be a better representation of the relation between the entry tendency and the finish date.

In this section, we describe two experiments: one using all the entry data and another using a part of the entry data of university student users stored on an Internet portal site. We present the average of 10 iterations in the result tables.

### 4.1 Experiment 1: Using all Entry Data of University Users

In order to evaluate the prediction accuracy of the finish date for the prediction target users, we compared the proposed method, which is based on student user clustering by entry tendencies considering the time series, with a prediction model that does not consider the time series. In this experiment, we use all the entry data stored on an Internet portal site.

#### 4.1.1 Data Set

In this experiment, we used all the data of the student users who graduated from universities in 2013 as the training data set. The training data consist of approximately 6,600,000 entry-data values and 140,000 student users. After learning the model according to the training data, the finish dates are predicted for the prediction target users who graduated from universities in 2014. Thus, we use all the data of the student users who graduated from universities in 2014 as the test data set to evaluate the prediction accuracy. In Japan, university student users who graduated in 2013 and 2014 started job-hunting on December 1 in 2011 and 2012, respectively. The test data (the prediction target data) consist of approximately 4,900,000 entry-data values and 110,000 student users. It is desirable to predict the finish date in advance; therefore, the day for predicting the finish date for the prediction target users is set to be the last day of the fourth month. This setting is based on the assumption that, at the end of March, we will identify a group of student users whose job-hunting duration is likely to be prolonged and will support them after April. Most university students do not complete their job-hunting activities by April. Many Japanese students find a job by August or September; therefore, if, in March, we can determine the student users who are not likely to finish their job-hunting activities by October, several measures to support these student users can be implemented from April, and these measures must be effective.

#### 4.1.2 Experimental Condition

The number of latent classes K is set to 5, the number of clusters C is set to 20, and the number of time periods T is set to 1 (not considering time series) and 3. The details of the time periods are shown in Table 1. The numbers of latent classes K and clusters C are parameters that an analyst should determine; this is a statistical model selection problem. In this paper, these numbers were set experimentally to appropriate values. In the experiments in this section, the number of latent classes was determined from the viewpoint of balance between interpretability and fit to the training data. The predictive performance was good and the characteristics of companies and student users were well extracted when the number of latent classes was K = 5 and the number of clusters was C = 20.

As a comparison model, we use the prediction model that is based on the aspect model without the time series consideration. In this comparison model, the number of latent classes K is set to 20. The belonging cluster (latent class) of the prediction target user is estimated by using the calculation in equation (2). The prediction target user is assigned to the latent class for which he/she has the highest belonging probability. The predicted finish date of each latent class in the comparison model is calculated by using equation (12).

(12)

In equation (12), $F(z_k)$ denotes the estimated finish date of the latent class $z_k$, and $U(y_j)$ represents the finish date of student user $y_j$.

#### 4.1.3 Results of Experiment 1

The evaluation criterion is the mean absolute error (MAE) between the predicted finish date and the corresponding correct value in the test data. Table 2 presents the results of this experiment.
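For reference, the criterion can be written out as follows. The notation is an assumption for this note: $\hat{U}(y'_m)$ denotes the predicted finish date of prediction target user $y'_m$ (the mean of the assigned cluster), $U(y'_m)$ the actual finish date, and $M$ the number of prediction target users.

```latex
\mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| \hat{U}(y'_m) - U(y'_m) \right|
% Illustrative arithmetic with made-up values, not results from the paper:
% predictions (200, 110) against actual values (190, 130), in days, give
% MAE = (|200 - 190| + |110 - 130|) / 2 = (10 + 20) / 2 = 15.
```

A smaller MAE means that the cluster-level averages lie closer to the individual finish dates, which is the sense in which one model is judged to represent the relation better than another.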

The result of this experiment shows that the relation between the entry tendencies and the finish date of job-hunting is modeled successfully by considering the time series.

### 4.2 Experiment 2: Using Entry Data of Student Users from a Specific University Group

Experiment 1 has a limitation: the model is built based on all the student users belonging to all the universities. Some studies indicate that the finish date is strongly influenced by student user attributes (such as the university to which they belong) (Hayakawa et al., 2013). Therefore, we should consider that the degree of influence of the entry tendency on the finish date varies for different student user attributes.

From the above discussion, it would be effective to construct stratified models using only student users having the same attributes. Stratification can enable a clear modeling of the relation between the entry tendencies and the finish date. It is known that the university to which the student user belongs affects the finish date of job-hunting. Therefore, we apply the result of university clustering proposed by Yamagami et al. (2014), who proposed a method of university clustering that uses the similarity based on the estimated cumulative distribution of the finish date of each university. In this experiment, we use one of the twenty clusters created by Yamagami et al. (2014).

#### 4.2.1 Data Set and Experimental Condition

We include student users belonging to 102 universities. We consider the student users who graduated from university in 2013 as the training data. The training data consist of approximately 1,500,000 entry-data values and 27,000 student users. After constructing the model, the finish date is predicted for prediction target users who graduated from university in 2014. The prediction target data consist of approximately 1,200,000 entry-data values and 28,000 student users. In this experiment, the entry history of the prediction target data in the first four months is used to make a prediction for the prediction target users. We consider three values of the periods for entry tendencies: T = 1, T = 3, and T = 7. The other experiment conditions are the same as those in experiment 1 (K = 5, C = 20).

#### 4.2.2 Results of Experiment 2

The evaluation criterion is the same as that in experiment 1, i.e., the mean absolute error between the predicted value and the correct value in the test data set. Table 3 shows the results of this experiment.

A comparison of the results of experiments 1 and 2 shows that the models constructed in experiment 2 have a better representation of the relation between the entry tendencies and the finish date of job-hunting. This result indicates that student user stratification by university leads to the construction of a better model. Further, we observe that a better model is constructed when the time series is considered in detail.

### 4.3 Discussion

In the proposed method, when two student users have similar overall entry tendencies but different time-series tendencies of applying to companies, they are grouped into different clusters. Because the proposed method considers the time series of student user entry tendencies, it yields a better prediction model of the relation between the entry tendencies and the finish date. Even if student users have similar preferences with respect to particular company groups, the timing of their applications to the employment examinations of companies influences their finish dates.
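The effect described above can be illustrated with a toy sketch. Two hypothetical users apply to the same two latent classes in equal overall proportion but in the opposite order, so their aggregated (T = 1) tendencies coincide while their per-period (T = 2) tendencies differ. Treating the entry tendency as a normalised count per latent class and comparing users by Euclidean distance are assumptions made for illustration, not necessarily the paper's exact definitions.

```python
import numpy as np

def entry_tendency(counts):
    """Normalise entry counts into proportions, one row per period.

    counts: (T, K) array of entries per period and latent class.
    Periods with no entries are returned as all-zero rows.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hypothetical users: rows are periods, columns are latent classes.
user_a = np.array([[4, 0], [0, 4]])  # class 0 first, then class 1
user_b = np.array([[0, 4], [4, 0]])  # class 1 first, then class 0

# T = 1: aggregate all periods before normalising.
overall_a = entry_tendency(user_a.sum(axis=0, keepdims=True))
overall_b = entry_tendency(user_b.sum(axis=0, keepdims=True))
overall_distance = float(np.linalg.norm(overall_a - overall_b))

# T = 2: keep one tendency vector per period and concatenate them.
series_a = entry_tendency(user_a).ravel()
series_b = entry_tendency(user_b).ravel()
series_distance = float(np.linalg.norm(series_a - series_b))
```

Here `overall_distance` is zero, so a model without time periods cannot separate the two users, whereas `series_distance` is positive and a distance-based clustering could place them in different clusters.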

## 5. APPLICATION OF PROPOSED METHOD

In the proposed method, student user clusters are constructed from the time-series data of student user entry tendencies. Therefore, a student user's preferences for the latent classes become clear once his or her cluster is found. In turn, the companies with a high degree of belonging to each latent class reveal the characteristics of that latent class. Because the characteristics of each user cluster can be clarified by examining its preferences for the latent classes, the characteristics of the users in each cluster can be grasped from the viewpoint of companies. Moreover, because the belonging probabilities of each student user to the latent classes can be calculated, the characteristics of each individual student user can also be grasped.

Here, we perform a characteristic analysis of the clusters formed by the proposed method. This analysis enables us to understand the factors that affect the finish date of job-hunting. Thus, we can understand the relation between the entry tendencies and the finish date by analyzing the probability for the latent class that is expressed by the representative vector of clusters.

For the analysis of the proposed model, we consider the following condition: the number of periods T = 3, the number of latent classes K = 10, and the number of clusters C = 20. In this experiment, the characteristics of companies and student users were well extracted when K = 10 and C = 20. The number of latent classes was decided from the viewpoint of interpretability, through discussion with the professional staff who manage the job-hunting portal site. This is because, when the proposed model is used for analytical purposes in practice, it is important to extract the characteristics of companies and student users through the latent classes. This section consists of two analyses: an analysis of the features of the latent classes, and an analysis of the probabilities of belonging to the latent classes in order to understand student user preference. In the latter, we show the results focusing on one cluster.

### 5.1 Latent Class Feature Analysis

Table 4 shows the features of the latent class of the learned aspect model that is built by the proposed method.

The “Feature of company” column in Table 4 shows the main industries of the companies that have a high occurrence probability in each latent class. The third column shows the estimated probabilities of the latent classes, which indicate the size of each latent class. We observe that companies are divided among the latent classes by industry type and that no latent class is extremely small.

### 5.2 Analysis of Belonging Probability for the Latent Class to Understand Student User Preference

Table 5 shows the time shift of the entry tendency for a single cluster. Here, we focus on the cluster whose predicted finish date of job-hunting is the latest among the 20 created clusters. In this table, we can observe the latent classes that have a high belonging probability. By analyzing Tables 4 and 5, we can determine the reason for the latest finish date from the viewpoint of entry tendency.

Table 5 shows that the student users in this cluster prefer latent class 6, which has the highest belonging probability. Latent class 6 has a high occurrence probability for companies such as retail stores. In summary, the student users belonging to the cluster with the latest predicted finish date of job-hunting have a high preference for companies such as retail stores.

Next, Figure 3 shows the time-series variation of belonging probability for each latent class.

The preference for latent class 6 increases until March. The changes in belonging probabilities along the time axis are used to measure the changes in users' preferences. It can be presumed that these student users focus on one type of industry from an early stage and do not change their entry tendencies while they persist in job-hunting. This behavior is one of the reasons why they are unlikely to finish job-hunting early, and it prolongs their job-hunting activities. In addition, in this situation it is difficult to resolve the mismatch between a student user and companies in the same category.

The mismatch between a student user and companies can be reduced by encouraging the entry tendency to change appropriately. The match between a student user and a company can be improved if student users evaluate their fit with the company and broaden their preferences to include other companies.
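One way to quantify such changes is sketched below, under stated assumptions: for a cluster's representative vector, it reports the dominant latent class in each period and the total variation distance between consecutive periods. A dominant class that never changes, together with small shifts, would correspond to a fixed entry tendency. The function name, the choice of total variation distance, and the numbers are illustrative only and are not taken from Table 5 or Figure 3.

```python
import numpy as np

def preference_shift(representative):
    """Summarise how a cluster's preferences move over time.

    representative: (T, K) array; row t holds the belonging probabilities
                    over the K latent classes in period t.
    Returns (dominant, shift): the index of the most preferred class in each
    period, and the total variation distance between consecutive periods.
    """
    rep = np.asarray(representative, dtype=float)
    dominant = rep.argmax(axis=1)
    shift = 0.5 * np.abs(np.diff(rep, axis=0)).sum(axis=1)
    return dominant, shift

# Hypothetical representative vector: 3 periods, 3 latent classes.
rep = np.array([
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.7, 0.2, 0.1],
])
dominant, shift = preference_shift(rep)
```

In this toy case the dominant class is the same in every period and its probability keeps rising, which is the pattern the text associates with prolonged job-hunting.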

We can apply the proposed model to analyze the relation between entry tendencies and the finish date by considering the time series, as mentioned above.

## 6. DISCUSSION

The numbers of latent classes K and clusters C are parameters that must be determined, which is a statistical model selection problem. Usually, model selection criteria such as AIC, BIC, and MDL can be applied to model selection problems with linear statistical models. However, the formulas of these criteria were derived under the assumption that the maximum likelihood estimator asymptotically follows a normal distribution (the central limit theorem). The likelihood function of a latent class model is not unimodal, and the maximum likelihood estimator does not satisfy the central limit theorem. Therefore, applying criteria such as AIC, BIC, and MDL is not justified in a strict sense, although they can still be computed. In fact, even if a model selection criterion is applied, the practically appropriate number of latent classes is frequently not chosen, because the number of parameters grows drastically as the number of latent classes increases.
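The growth in the number of parameters can be made concrete with a rough sketch. It assumes the symmetric aspect model parametrisation with a class prior and two class-conditional multinomials (over companies and over student users), so the count of free parameters is (K - 1) + K(|X| - 1) + K(|Y| - 1); the paper's exact parametrisation may differ. The standard AIC and BIC formulas are included for reference. The company count of 5,000 is hypothetical, and the user count of 27,000 follows the approximate training size in Experiment 2.

```python
import math

def aspect_model_param_count(n_latent, n_companies, n_users):
    """Free parameters of an aspect model under the assumed parametrisation:
    class prior (K - 1) plus two class-conditional multinomials."""
    return (n_latent - 1) + n_latent * (n_companies - 1) + n_latent * (n_users - 1)

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical sizes: 5,000 companies; about 27,000 student users.
params_k5 = aspect_model_param_count(5, 5000, 27000)
params_k10 = aspect_model_param_count(10, 5000, 27000)
```

Under these assumptions, doubling K from 5 to 10 roughly doubles the parameter count, so the penalty terms of AIC and BIC increase sharply with K and tend to favour fewer latent classes than is practically useful.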

In addition, this paper does not concretely present an effective action plan for a group of student users whose job-hunting is likely to be prolonged. However, once the target student segment is identified and its entry tendencies can be analyzed, analysts can devise an actual action plan. Identifying the target student group is important, and the proposed model makes this possible.

In this paper, the numbers of latent classes were set to values found appropriate through experiments, because it is important to extract the characteristics of companies and student users through the latent classes. A reasonable method for determining the optimal numbers of latent classes and clusters remains a subject for future work.

## 7. CONCLUSION AND FUTURE WORK

In this study, we quantify the entry tendencies of a student user by considering the time shift and using an estimated aspect model to predict the finish date of job-hunting for the student user. Then, we propose a clustering method that considers the time series of the entry tendencies and design a model that represents the relative relation between the entry tendencies and the finish date. In the simulation experiment, we use actual data from an Internet portal site to demonstrate the effectiveness of the proposed method that considers the time series of the entry tendency of student users. Although our proposal rests on the hypothesis of a statistical relationship between the time-series variation of users' preferences and the finish date of job-hunting, the effectiveness of the proposed model supports this hypothesis.

In addition, we focused on the analysis of the characteristics of student groups. We proposed a method to extract student groups who have difficulty in job-hunting activities, but we did not discuss what kind of support should be given to those students. In future work, we must consider concrete support methods for student users in further detail and evaluate the impact of changing the number of clusters. In addition, a recommendation system based on the proposed method could be studied as a suitable application.

## ACKNOWLEDGEMENT

The authors would like to express their gratitude to Mr. Gendo Kumoi, Dr. Haruka Yamashita, and all the members of Goto Laboratory, Waseda University, for their helpful comments in this research. A part of this study was supported by JSPS KAKENHI Grant Numbers 26282090 and 26560167.

## Figure

Graphical representation of the aspect model.

Method to calculate entry tendencies.

Time-series variation of each latent class.

## Table

Schedule of job-hunting in Japan

Divided periods in experiment

Results of experiment 1 (MAE)

Results of experiment 2 (MAE)

Feature of company (K = 10)

Entry tendencies of each cluster predicted to have the latest finish date

## REFERENCES

1. Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer.
2. Collins, L. M. and Lanza, S. T. (2013), Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences, John Wiley & Sons, Hoboken, New Jersey.
3. Dillon, W. R. and Mulani, N. (1984), A probabilistic latent class model for assessing inter-judge reliability, Multivariate Behavioral Research, 19(4), 438-458.
4. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statistical Society Series B, 39(1), 1-22.
5. Fujiwara, N., Mikawa, K., and Goto, M. (2014), A new estimation method of latent class model with high accuracy by using both browsing and purchase histories, Proceedings of the 15th Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS 2014), Jeju, Korea.
6. Fujiwara, N., Mikawa, K., and Goto, M. (2017), A proposal of aspect model expressing both browsing and purchasing behaviors for customer purchase prediction, Journal of the Japan Society for Management Information, 26(1), 1-16 (in Japanese).
7. Gibson, W. A. (1959), Three multivariate models: Factor analysis, latent structure analysis, and latent profile analysis, Psychometrika, 24(3), 229-252.
8. Goodman, L. A. (1974), Exploratory latent structure analysis using both identifiable and unidentifiable models, Biometrika, 61(2), 215-231.
9. Goto, M., Minetoma, K., Mikawa, K., Kobayashi, M., and Hirasawa, S. (2014), A modified aspect model for simulation analysis, IEEE International Conference on Systems, Man, and Cybernetics, San Diego, CA, USA, 1306-1311.
10. Goto, M., Mikawa, K., Hirasawa, S., Kobayashi, M., Suko, T., and Horii, S. (2015), A new latent class model for analysis of purchasing and browsing histories on EC sites, Industrial Engineering & Management Systems, 14(4), 335-346.
11. Greene, W. H. and Hensher, D. A. (2003), A latent class model for discrete choice analysis: Contrasts with mixed logit, Transportation Research Part B: Methodological, 37(8), 681-698.
12. Hayakawa, M., Mikawa, K., Ishida, T., and Goto, M. (2013), A statistical prediction model of students’ success on job hunting by log data, The 14th Asia Pacific Industrial Engineering and Management Systems Conference, Cebu, Philippines.
13. Hagenaars, J. A. and McCutcheon, A. M. (2009), Applied Latent Class Analysis, Cambridge University Press, New York, NY.
14. Hofmann, T. (1999a), Probabilistic latent semantic analysis, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, 289-296.
15. Hofmann, T. (1999b), Probabilistic latent semantic indexing, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, California, USA, 50-57.
16. Hofmann, T. and Puzicha, J. (1999), Latent class models for collaborative filtering, Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 688-693.
17. Hofmann, T. (2001), Unsupervised learning by probabilistic latent semantic analysis, Machine Learning Journal, 42(1-2), 177-196.
18. Hofmann, T. (2003), Collaborative filtering via gaussian probabilistic latent semantic analysis, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, 259-266.
19. Hofmann, T. (2004), Latent class models for collaborative filtering, ACM Trans. Information Systems, 22(1), 89-115.
20. Jin, R., Si, L., and Zhai, C. (2006), A study of mixture models for collaborative filtering, Information Retrieval, 9(5), 357-382.
21. Langseth, H. and Nielsen, T. D. (2012), A latent model for collaborative filtering, International Journal of Approximate Reasoning, 53, 447-466.
22. Lazarsfeld, P. F. and Henry, N. W. (1968), Latent Structure Analysis, Boston: Houghton Mifflin.
23. Madden, T. J. and Dillon, W. R. (1982), Causal analysis and latent class models: An application to a communication hierarchy of effects model, Journal of Marketing Research, 19(4), 472-490.
24. Magidson, J. and Vermunt, J. K. (2002), Latent class models for clustering: A comparison with k-means, Canadian Journal of Marketing Research, 20, 37-44.
25. Matsuzaki, Y., Yamagami, K., Mikawa, K., and Goto, M. (2015), Analysis of customer purchase behavior by using purchase history with discount coupon based on latent class model, Proceedings of the 16th Asia Pacific Industrial Engineering and Management Systems Conference, Ho Chi Minh City, Vietnam.
26. McLachlan, G. J. and Krishnan, T. (1997), The EM Algorithm and Extensions (2nd Ed.), Wiley, New York.
27. Si, L. and Jin, R. (2003), Flexible mixture model for collaborative filtering, Proceedings of the 20th International Conference on Machine Learning, 2, 704-711.
28. Suzuki, T., Kumoi, G., Mikawa, K., and Goto, M. (2014), A design of recommendation based on flexible mixture model considering purchasing interest and postpurchase satisfaction, Journal of Japan Industrial Management Association, 64(4E), 570-578.
29. Swait, J. (1994), A structural equation model of latent segmentation and product choice for cross-sectional revealed preference choice data, Journal of Retail and Consumer Services, 1(2), 77-89.
30. Train, K. E. (2009), Discrete Choice Methods with Simulation (2nd Ed.), Cambridge University Press, New York, NY.
31. Yamagami, K., Mikawa, K., Goto, M., and Yatabe, H. (2014), Proposal of clustering method that focuses on the mixing ratio of the latent class model, Proceedings Japan Industrial Management Association 2014 Fall Meeting, Hiroshima, Japan, 132-133 (in Japanese).
32. Yamagami, K., Mikawa, K., Goto, M., and Ogihara, T. (2015), A statistical prediction model of students’ finishing date on job hunting using internet portal sites data, Proceedings of the 16th Asia Pacific Industrial Engineering and Management Systems Conference, Ho Chi Minh City, Vietnam.
33. Yang, T., Itagaki, N., Mikawa, K., and Goto, M. (2015), Collaborative filtering based on PLSA model introduced a beta distribution, Proceedings Japan Industrial Management Association 2015 Fall Meeting, Kanazawa, Japan, 140-141 (in Japanese).
34. Zhang, N. L. (2004), Hierarchical latent class models for cluster analysis, Journal of Machine Learning Research, 5(6), 697-723.