1. INTRODUCTION
In recent years, most university students (users) in Japan have used Internet portal sites for their job-hunting activities. Various types of job-hunting information are available on these sites, and student users can search it for jobs and companies of interest. An Internet portal site also makes it easy for student users to apply to the employment examinations of several companies. However, job-hunting activities are sometimes prolonged owing to a mismatch between a student user and company requirements. To address this problem, it is desirable to use the activity data stored on an Internet portal site to find groups of student users who may not be able to finish job hunting early. Prolonged job hunting is a social problem in Japan, so countermeasures are an important topic. The large-scale data accumulated on an Internet portal site for job hunting can be expected to contribute to a solution.
In general, student users have a wide variety of preferences, so the latent class model, which is also called the mixed model (Bishop, 2006; Train, 2009), is considered effective for modeling their behavior. Hayakawa et al. (2013) proposed a predictive model of the job-hunting finish date of students that uses student demographic information. They observed that the finish date strongly depends on demographic information and constructed the model based on a stratification tree and mixed Weibull distributions. Yamagami et al. (2014) proposed a probabilistic latent class model that can describe the relationship among demographic information, action log data, and the job-hunting finish date of students; the authors showed the effectiveness of their proposal from the viewpoint of prediction accuracy. However, many students change their choice of appropriate companies and jobs during their job-hunting activities. According to the observations of several specialists working for a company operating an Internet portal site, it is important to model the change in student user behavior related to entry tendencies. This behavior exhibits various patterns. For example, some student users continue to apply to employment examinations of companies in the same industry category, whereas others change their entry tendency over time by reevaluating their aptitude during their job-hunting activities. Such time-series variation of entry tendencies appears to affect the expected finish date of job hunting. That is, we hypothesize that there may be a statistical relationship between the time-series variation of users' preferences and the finish date of job hunting. If this hypothesis is correct, a model expressing this relationship can be constructed from the data of the entry history and the finish date of job hunting.
Therefore, even in the absence of a strong statistical relation between the entry activity and the finish date of student users, it would be significant to build an analytic model representing the moderate relation and create an effective action plan for a group of student users whose jobhunting duration is likely to be prolonged.
In this study, we attempt to build an appropriate model expressing the relation between the entry tendency and the finish date from a global point of view. Specifically, we propose a method of student user clustering based on a latent class model representing the time-series variation of entry tendencies. The proposed model enables us to analyze entry patterns from the viewpoint of the time-series variation of job-hunting activities. By applying the proposed method, we can analyze the relation between the time shift of entry tendencies and the finish date of job hunting, and we can predict the finish date of job hunting for each cluster. The proposed method thus makes it possible to identify, at an early stage, the user group that is predicted to finish job hunting late. To verify the effectiveness of the proposed method, we demonstrate the data analysis and processing by using actual data from an Internet portal site. Further, we show that it is possible to analyze the characteristics of the student users assigned to each cluster.
2. PRELIMINARIES
2.1 Job-Hunting by University Students in Japan
In this section, the system of jobhunting in Japan is introduced as the basis of our study. The style of job hunting activities in Japan is unique in the world. Most Japanese students start jobhunting activities during their student life and begin working immediately after their graduation from educational institutions such as universities or graduate schools. Therefore, students have to conduct jobhunting activities while studying in their universities. The schedule for jobhunting is decided by the Federation of Economic Organizations; therefore, the job hunting information for almost all companies is published at the same time.
In Table 1, we present the jobhunting schedule of Japanese students who graduated in March 2015. In addition, the schedule of students who will graduate in March 2016 is shown. As described earlier, the basic schedule of jobhunting is determined by the Federation of Economic Organizations.
For example, students who graduated from university in or before 2015 could obtain information about companies beginning in December, two years before their graduation. Thus, the students who graduated in March 2014 could obtain the employment information of a company from December 2012, and the students who graduated in March 2015 could obtain it from December 2013. Most of the students started their job-hunting activities at that time. The Federation of Economic Organizations in Japan has postponed the job-hunting schedule for the students graduating in 2016. These students can obtain employment information from March, a year before their graduation. When job-hunting information is published, each Internet portal site begins providing services related to the employment information of companies. At the start of their job search, students participate in briefing sessions held by companies and visit seniors working for the companies in which they are interested. Then, they determine the company that they wish to join. Next, they send entry sheets (applications for employment tests) to companies to express the reason for their interest. If the company approves the entry sheet, an interview (an oral examination) for the job is held. The start date for interviews by Japanese companies is also officially decided. Students who graduated in or before 2015 could attend job interviews from April, a year before their graduation, whereas students who will graduate in 2016 can attend them from August, a year before their graduation. After passing the interview, students receive an unofficial job offer, and their job-hunting activity is complete. Many companies make the official employment offer on October 1.
In addition to the above activities related to jobhunting, many companies offer internships, and many students apply for internships as a part of jobhunting activities. Almost all companies select students according to this schedule; however, a few companies do not follow it.
In Japan, most students generally use Internet portal sites for job hunting. Numerous Internet portal sites for job hunting exist, and many student users use multiple sites simultaneously. Typically, an Internet portal site for job hunting provides a comprehensive service from the start to the finish of job-hunting activities; the stages in the service include the entry for an internship, the reservation of a briefing session, the entry for the employment examination of a company, and self-analysis and research on various industries.
The difficulty of predicting the finish date is caused by the existence of various other factors that can affect the finish date of each student user, e.g., meetings with senior company employees who graduated from the same university, attendance at information sessions held by companies, company search activities on the Internet, and the daily lifestyle of the student users, including the support activities at their university. In addition, many student users do not use only one specific Internet portal site; they use multiple portal sites simultaneously. Internet portal sites for job hunting cannot obtain all such information related to student user activities. Therefore, it may not be possible to accurately predict the finish dates of students' job-hunting activities with only the action history on an Internet portal site for job hunting. However, it is natural to assume that there exists a relation between the time shift of the entry-pattern tendency and the finish date of student user job-hunting activity.
2.2 The Latent Class Models
In this study, we quantify and analyze entry histories of student users by using a latent class model (Gibson, 1959;Lazarsfeld and Henry, 1968;Goodman, 1974;Hofmann and Puzicha, 1999;Hofmann, 2001;Magidson and Vermunt, 2002;Hofmann, 2004;Hagenaars and McCutcheon, 2009;Collins and Lanza, 2013). Various latent class models have been proposed. In this section, we discuss the significance of using a latent class model and introduce some variation.
2.2.1 The Effectiveness of Latent Class Models
Latent class models assume the existence of unobservable latent variables behind the observable training data. For example, when the latent class model is applied to purchase history data, it is possible to express differences of purchase probabilities for each item between users according to unobservable latent factors that express user preference for items (Matsuzaki et al., 2015;Fujiwara et al., 2017). Further, when the latent class model is applied to document data, it is possible to observe that each document and each word arise from unobservable latent topics that documents potentially retain (Hofmann, 1999b).
2.2.2 Previous Studies of Latent Class Models
In recent times, various types of latent class models have been proposed and widely used (Greene and Hensher, 2003;Hofmann, 1999a;Si and Jin, 2003;Jin et al., 2006;Zhang, 2004). For example, Zhang (2004) proposed the hierarchical latent class models. Dillon and Mulani (1984) presented a probabilistic latent class model for assessing interjudge reliability. In the field of marketing science, latent classes are convenient and effective for representing customer segments and many applied models have been studied (Madden and Dillon, 1982;Swait, 1994;Train, 2009). Langseth and Nielsen proposed a latent model based on a linear Gaussian Bayesian network (Langseth and Nielsen, 2012).
Especially in the field of recommender systems, various types of latent class models have been proposed, for example, Gaussian PLSA (Hofmann, 2003), FMM (Si and Jin, 2003; Suzuki et al., 2014), and the Joint Mixture Model (JMM) and Decoupled Model (DM) (Jin et al., 2006). However, many of them focus on predicting the user's evaluation value for each item, which is called a rating. In this paper, we focus on representing the relation between the finish date of job hunting and the time-series variation of entry tendencies. This requires a model that can represent the time-series variation of entry tendencies, and the aspect model is the most reasonable choice. Many other models cannot be applied for this purpose because they focus on rating prediction.
The aspect model is one of the well-known latent class models (Hofmann and Puzicha, 1999; Goto et al., 2014, 2015) and can be used as a clustering method or a dimension-reduction technique. This model, which originated in the field of information retrieval, focuses on the co-occurrence of documents and words. It is assumed that the document and the word are generated stochastically from the latent class. The aspect model is widely used in various fields and has been extended in accordance with the purpose. For example, with the progress of web services, the aspect model has been applied to recommendation systems for many electronic commerce (EC) sites. Such a system recommends items that match user preferences. In the model-based approach to recommender systems, some models that utilize the user's rating have been developed as variations of the aspect model. When such a model is applied to purchase data with user ratings for items, the evaluation values of the users can be predicted for each item. In this model, the co-occurrence of three variables (the user, the item, and the rating) is the point of focus, and multinomial distributions are assumed for the probabilistic model of ratings. Based on the assumption that a rating follows a normal distribution, this model was extended to a model called Gaussian Probabilistic Latent Semantic Analysis (gPLSA) (Hofmann, 2003). In the gPLSA model, each latent class generates its own normal distribution of the evaluation value. The gPLSA model differs from the above model with respect to the number of parameters estimated: owing to the assumption of a normal distribution, the number of estimated parameters decreases. Therefore, gPLSA is easy to use with respect to parameter estimation and result analysis. In this field, accurately predicting evaluation values with these models has become an important topic.
Various other types of latent class models exist. The flexible mixture model (FMM) considers different latent classes for users and items (Si and Jin, 2003;Suzuki et al., 2014). Another latent class model considers browsing history and other factors (Fujiwara et al., 2014;Goto et al., 2015). Based on the application and purpose, analysts use the appropriate model. As described above, either of the following two approaches is actively taken: 1) increasing the number of variables to be handled in order to enhance the expression ability of the model (Yamagami et al., 2015); 2) applying suitable distributions to each variable by considering the data characteristic (Yang et al., 2015). These approaches are typical extensions of the latent class model.
In this study, we introduce the aspect model to analyze the relation between the entry tendency of student users and the finish date of job hunting. We propose a method that quantifies the time-series variation of entry tendencies by utilizing the aspect model. In the next subsection, we define the aspect model in detail.
2.2.3 Aspect Model
In this study, we apply the aspect model (Hofmann, 1999a, 1999b) in the finish date prediction in order to quantify the entry tendencies of student users. The aspect model is a statistical model that assumes the existence of a discrete latent class between the student users and the companies. The users and the companies are divided into clusters (latent classes) stochastically. Here, the event that the student user y_{j} applies to the company x_{i} is denoted by (y_{j}, x_{i}). A set of companies is defined as $\text{X}=\{{x}_{i}:1\le i\le I\},$ a set of student users is defined as $\text{Y}=\{{y}_{j}:1\le j\le J\},$ and a set of latent classes is defined as $\text{Z}=\{{z}_{k}:1\le k\le K\}.$ In this case, the aspect model can be stochastically represented by equation (1).
Here, P(x_{i}, y_{j}) is the joint probability of the company x_{i} and the student user y_{j}, P(z_{k}) is the probability of the latent class z_{k}, $P({x}_{i}\mid {z}_{k})$ is the conditional probability of the company x_{i} given the latent class z_{k}, and $P({y}_{j}\mid {z}_{k})$ is the conditional probability of the student user y_{j} given the latent class z_{k}. In equation (1), the parameters P(z_{k}), $P({x}_{i}\mid {z}_{k})$, and $P({y}_{j}\mid {z}_{k})$ are estimated by the expectation-maximization (EM) algorithm (Dempster et al., 1977).
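As an illustration of this estimation step, the following is a minimal sketch of EM for the aspect model. It assumes that equation (1) takes the standard form $P({x}_{i},{y}_{j})=\sum_{k}P({z}_{k})P({x}_{i}\mid {z}_{k})P({y}_{j}\mid {z}_{k})$ (Hofmann, 1999a), which matches the parameters defined above. The function name `fit_aspect_model`, the count-matrix input, the random initialization, and the fixed number of iterations are assumptions made for this example and are not taken from the study.

```python
import random


def fit_aspect_model(counts, K, n_iter=50, seed=0):
    """Estimate P(z), P(x|z), P(y|z) by EM (illustrative sketch).

    counts[i][j] is the number of entries of user y_j to company x_i
    (hypothetical input layout). Returns (p_z, p_x_z, p_y_z) with
    p_x_z[k][i] = P(x_i | z_k) and p_y_z[k][j] = P(y_j | z_k).
    """
    rng = random.Random(seed)
    I, J = len(counts), len(counts[0])

    def random_distribution(n):
        values = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(values)
        return [v / total for v in values]

    p_z = [1.0 / K] * K
    p_x_z = [random_distribution(I) for _ in range(K)]
    p_y_z = [random_distribution(J) for _ in range(K)]

    for _ in range(n_iter):
        acc_z = [0.0] * K
        acc_x = [[0.0] * I for _ in range(K)]
        acc_y = [[0.0] * J for _ in range(K)]
        for i in range(I):
            for j in range(J):
                n = counts[i][j]
                if n == 0:
                    continue
                # E-step: posterior P(z_k | x_i, y_j) for this cell.
                joint = [p_z[k] * p_x_z[k][i] * p_y_z[k][j] for k in range(K)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for k in range(K):
                    r = n * joint[k] / total
                    acc_z[k] += r
                    acc_x[k][i] += r
                    acc_y[k][j] += r
        # M-step: renormalize the expected counts.
        grand_total = sum(acc_z)
        for k in range(K):
            if acc_z[k] > 0.0:
                p_x_z[k] = [v / acc_z[k] for v in acc_x[k]]
                p_y_z[k] = [v / acc_z[k] for v in acc_y[k]]
            p_z[k] = acc_z[k] / grand_total
    return p_z, p_x_z, p_y_z
```

With real portal data the count matrix is large and sparse, so a practical implementation would iterate only over observed (company, user) pairs and monitor the log-likelihood for convergence rather than use a fixed iteration count.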
2.2.4 Use of Latent Class Model for Job-Hunting Data
By applying the aspect model to the job-hunting activity of student users, we can estimate their entry probability for each company.
In particular, the application of a latent class model to student user entries appears to be effective because we can typically assume that different types of student users coexist. A latent class represents a group of student users with similar characteristics and cannot be observed directly in the data. Additionally, the use of latent class models has various advantages. Latent class models facilitate the analysis of the data stored on an Internet portal site because assuming latent classes reduces the dimension of sparse high-dimensional data. Assuming latent classes is equivalent to simultaneously clustering student users and companies into as many groups as there are latent classes. This model enables us to clarify the relation between student users and companies from the viewpoint of entry activities. In this study, we attempt to utilize the aspect model to quantify the time shift of entry tendencies.
This study differs from extensions such as the models shown in section 2.2.2. We show that it is possible to perform a more effective quantification by using the estimated parameters of the aspect model. The ideas in this study can be utilized not only for the aspect model but also for various latent class models. This feature is one of the advantages of this study with respect to scalability. Figure 1
3. PROPOSED METHOD
In order to find groups of student users whose job hunting duration is likely to be prolonged and to analyze the statistical characteristics of each group, we construct a model that expresses the relation between the timeshift pattern of entry tendencies and the finish dates of student users by considering the time series. Simultaneously, we can analyze the pattern of entry tendencies by utilizing this model. Here, owing to the large number of companies, it is necessary to aggregate the entry data of applied companies based on student users and to treat similar companies collectively from the statistical viewpoint. Therefore, we attempt to quantify the time shift of entry tendencies by using the aspect model, which is one of the effective latent class models. Then, student user clustering can be performed by considering the time shift of entry tendencies. We can use the belonging probabilities for the latent classes to quantify the entry tendencies and to easily calculate them by performing timeseries division.
The proposed method consists of the quantification of the entry tendency of student users by using the aspect model, student user clustering by using the k-means method, estimation of the cluster of each prediction target user based on a similarity calculation, and prediction of the finish date of each cluster. The proposed clustering method is performed as follows:
[Steps in proposed method]

Step 1. Learning the aspect model by using a set of training data.

Step 2. Quantifying the entry tendencies by considering the time series of student users in the training data based on the learned aspect model.

Step 3. Clustering the student users in the training data by applying the k-means method to the time-series data of student user entry tendencies.

Step 4. Quantifying the entry tendencies by considering the time series for the prediction target users based on the learned aspect model.

Step 5. Estimating the best cluster for the prediction target users based on the similarities between each cluster and quantified entry tendencies of prediction target users.

Step 6. Predicting the finish date for the prediction target users by using the average values in each cluster.
Here, the set of student users in the training data plays a role in the construction of the aspect model (the estimation of aspect model parameters). In addition, these student users also play a role in the formation of clusters used to predict the finish dates of the prediction target users. By using the training data as described above, we can estimate the cluster for each prediction target user.
3.1 Quantification of Entry Tendencies by the Aspect Model
During job hunting, each student user applies to the employment examinations of companies that he/she prefers. It is assumed that the student user preferences for companies can be represented by belonging probabilities for the latent classes. In order to consider the time series of the entry tendencies, we calculate the entry tendencies divided into arbitrary T periods, allowing duplication. Letting ${{P}^{\prime}}_{t}({z}_{k}\mid {y}_{j})$ be the belonging probability of the student user y_{j} to the latent class z_{k} at period t, the entry tendencies of student users are defined by equation (2). This equation represents the entry tendency of student user y_{j} at period t.
Here, t (t = 1, 2, …, T) denotes the index of the time period, and N_{jt} represents the total number of entries made by student user y_{j} at period t. η_{ijt} is an indicator variable; it takes 1 if student user y_{j} enters company x_{i} at period t, and 0 otherwise. Thus, the variable η_{ijt} carries the information of the company x_{i}, the student user y_{j}, and the period t. Further, $\widehat{P}({z}_{k}\mid {x}_{i})$, which represents the belonging probability for the latent class z_{k} for each company, is estimated by using equation (4).
Here, $\widehat{P}({z}_{k})$ and $\widehat{P}({x}_{i}\mid {z}_{k})$ denote the estimates of P(z_{k}) and $P({x}_{i}\mid {z}_{k})$, respectively. The parameters of equation (4), i.e., $\widehat{P}({z}_{k})$ and $\widehat{P}({x}_{i}\mid {z}_{k})$, are estimated by the EM algorithm (Dempster et al., 1977; McLachlan and Krishnan, 1997). Each student user has a stochastic preference for each latent class, and the preferences of each student user sum to 1 over the latent classes. This probability represents the entry tendency of the student user. We focus on the changes of the belonging probabilities to latent classes with period t. The entry tendency can therefore be calculated for arbitrarily divided periods. In this model, the entry tendencies are calculated by using a hierarchical structure in order to consider the time series effectively. If the belonging probabilities of a student user to the latent classes change drastically with t, it means that his/her preferences for companies have also changed. The method to calculate the entry tendencies is shown in Figure 2.
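The calculation described in this subsection can be sketched as follows. Because equations (2) and (4) are not reproduced in this text, the sketch assumes that equation (4) is Bayes' rule, $\widehat{P}({z}_{k}\mid {x}_{i})=\widehat{P}({z}_{k})\widehat{P}({x}_{i}\mid {z}_{k})/\sum_{k^{\prime}}\widehat{P}({z}_{k^{\prime}})\widehat{P}({x}_{i}\mid {z}_{k^{\prime}})$, and that equation (2) averages these company posteriors over the N_{jt} entries of a user in period t, which is consistent with the statement that the tendencies sum to 1. The function names, the `(company_index, period)` input layout, and the use of `None` for a period without entries are assumptions made for this example.

```python
def company_posteriors(p_z, p_x_z):
    """Return post[i][k] = P(z_k | x_i), assuming Bayes' rule for equation (4).

    p_z[k] = P(z_k) and p_x_z[k][i] = P(x_i | z_k) are the estimated
    aspect-model parameters.
    """
    K, I = len(p_z), len(p_x_z[0])
    post = []
    for i in range(I):
        joint = [p_z[k] * p_x_z[k][i] for k in range(K)]
        total = sum(joint)
        post.append([v / total for v in joint])
    return post


def entry_tendency(entries, post, T):
    """Entry tendency of one user for periods 1..T (assumed form of equation (2)).

    entries is a list of (company_index, period) pairs; an entry may be listed
    under several periods if the periods overlap. A period with no entries has
    N_jt = 0, which is left undefined here and returned as None (an assumption).
    """
    K = len(post[0])
    sums = [[0.0] * K for _ in range(T)]
    n_entries = [0] * T
    for company, period in entries:
        n_entries[period - 1] += 1
        for k in range(K):
            sums[period - 1][k] += post[company][k]
    return [
        [v / n_entries[t] for v in sums[t]] if n_entries[t] else None
        for t in range(T)
    ]
```

Each returned vector is an average of probability vectors, so it sums to 1 whenever the period contains at least one entry.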
3.2 Learning Student User Clustering by K-Means Method
In order to create clusters for the analysis of the relation between the entry tendency of student users and the finish date of job hunting, we perform clustering for all the student users in the training data set. For clustering, the elements of each student user consist of the entry tendencies that are calculated by equation (2) for T periods in order to consider the time series. The feature vector w_{j} of student user y_{j} in the training data set is expressed by equation (5).
where s_{jt} is a K-dimensional vector representing the entry tendency of student user y_{j} at period t. s_{jt} is represented by equation (6).
In this study, we applied the k-means method as the clustering method. The k-means method is one of the basic and effective clustering tools. The number of clusters is denoted by C; when l is defined as a cluster number variable, the representative vector c_{l} (l = 1, 2, …, C) for each cluster is obtained by equation (7).
where D_{l} represents the number of student users belonging to cluster l.
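A minimal sketch of this clustering step is given below. It assumes that equation (5) concatenates the T period vectors s_{j1}, …, s_{jT} into one vector of length T × K and that the representative vector of equation (7) is the mean of the feature vectors of the D_{l} users in cluster l, which is the usual k-means centroid. The function names, the random initialization from data points, and the requirement that every period has at least one entry are assumptions of this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def feature_vector(tendency):
    """Concatenate T period vectors of length K into one T*K vector.

    Assumes every period has a defined tendency vector.
    """
    return [p for period_vector in tendency for p in period_vector]


def kmeans(vectors, C, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, C)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each user.
        new_labels = [
            min(range(C), key=lambda l: sq_dist(v, centroids[l]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid is the mean of its D_l members.
        for l in range(C):
            members = [v for v, lab in zip(vectors, labels) if lab == l]
            if members:
                centroids[l] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because k-means depends on its initialization, running it from several seeds and keeping the solution with the smallest within-cluster distance is a common precaution.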
3.3 Prediction of Target Users Clustering by Similarity Calculation
We also calculate the entry tendency of each period for the prediction target users by using equation (2). When we use equation (2) for the prediction target users, the belonging probability of company x_{i} for the latent class z_{k}, $\widehat{P}({z}_{k}\mid {x}_{i})$, can be taken directly from the probabilities that were estimated in the learning phase. The reason is that the list of companies to which student users can apply on an Internet portal site does not change every year. Here, the set of prediction target users is defined as ${Y}^{\prime}=\{{{y}^{\prime}}_{m}:1\le m\le M\}$, and the feature vector ${{w}^{\prime}}_{m}$ is calculated for each prediction target user by equation (9).
Similarly, ${{s}^{\prime}}_{mt}$ is a K-dimensional vector representing the entry tendency of student user ${{y}^{\prime}}_{m}$ at period t. Then, in order to determine the cluster to which a prediction target user belongs, we calculate the similarity between the feature vector ${{w}^{\prime}}_{m}$ and the representative vector c_{l} of each cluster formed in the learning phase by using the Euclidean distance. If $\widehat{c}$ is the estimated cluster to which the prediction target user belongs, $\widehat{c}$ is obtained by equation (11).
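The assignment of a prediction target user can be sketched as follows, assuming that equation (11) selects the cluster whose representative vector has the smallest Euclidean distance to ${{w}^{\prime}}_{m}$. The function name `assign_cluster` is hypothetical, and cluster indices are zero-based in this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance; it has the same minimizer as the distance itself."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def assign_cluster(w_target, centroids):
    """Index of the representative vector closest to the target feature vector."""
    return min(range(len(centroids)), key=lambda l: sq_dist(w_target, centroids[l]))
```

Ties are broken in favor of the lowest cluster index, which is a property of `min` and not something specified in the text.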
3.4 Finish Date Prediction for a Prediction Target User According to the Belonging Cluster
For each cluster created in the learning phase, a predicted value of the finish date is attached to the cluster. We calculate the average of the finish date for all the student users belonging to the same cluster in the training data set. The average values become the predicted values in the prediction of target user finish dates. Therefore, in the proposed method, the prediction of the finish date of jobhunting for the prediction target users is fixed for a given cluster.
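A short sketch of this prediction rule follows. It assumes that finish dates are encoded as the number of days since the start of job hunting, an encoding chosen for this example only; the function name and the use of `None` for an empty cluster are likewise assumptions.

```python
def cluster_mean_finish(labels, finish_days, C):
    """Average finish date of the training users in each of the C clusters.

    labels[j] is the cluster index of training user y_j and finish_days[j]
    is the finish date of that user in days (hypothetical encoding).
    """
    sums = [0.0] * C
    sizes = [0] * C
    for label, days in zip(labels, finish_days):
        sums[label] += days
        sizes[label] += 1
    return [sums[l] / sizes[l] if sizes[l] else None for l in range(C)]
```

The predicted finish date of a prediction target user is then the entry of this list at the index of his/her estimated cluster.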
4. DATA ANALYSIS
In this section, we show the effectiveness of the proposed method. For the evaluation, we use the mean absolute error between the predicted value and the actual value as the measure of prediction accuracy. Based on this criterion, we conduct a prediction experiment for the finish date of job hunting based on clustering by using actual data from an Internet portal site. If the prediction accuracy improves, the estimated model can be judged to be a better representation of the relation between the entry tendency and the finish date.
In this section, we describe two experiments: one using all the entry data and another using a part of the entry data of university student users stored on an Internet portal site. We present the average of 10 iterations in the result tables.
4.1 Experiment 1: Using all Entry Data of University Users
In order to evaluate the prediction accuracy of the finish date for the prediction target users for some models (some situations), we compared the prediction accuracies between the proposed method that is based on student user clustering by entry tendencies considering time series and the prediction model that does not consider time series. In this experiment, we use all the entry data stored on an Internet portal site.
4.1.1 Data Set
In this experiment, we used all the data of the student users who graduated from universities in 2013 as the training data set. The training data consist of approximately 6,600,000 entry-data values and 140,000 student users. After learning the model from the training data, the finish dates are predicted for the prediction target users who graduated from universities in 2014. Thus, we use all the data of the student users who graduated from universities in 2014 as the test data set to evaluate the prediction accuracy. In Japan, university student users who graduated in 2013 and 2014 started job hunting on December 1 in 2011 and 2012, respectively. The test data (the prediction target data) consist of approximately 4,900,000 entry-data values and 110,000 student users. It is desirable to predict the finish date in advance; therefore, the day for predicting the finish date for the prediction target users is set to the last day of the fourth month, i.e., the end of March. This setting is based on the assumption that, at the end of March, we identify a group of student users whose job-hunting duration is likely to be prolonged and support them from April onward. Most university students do not complete their job-hunting activities by April. Many Japanese students find a job by August or September; therefore, if we can determine in March the student users who are not likely to finish their job-hunting activities by October, several measures to support these student users can be implemented from April, and these measures can be expected to be effective.
4.1.2 Experimental Condition
The number of latent classes K is set to 5, the number of clusters C is set to 20, and the number of time periods T is set to 1 (not considering time series) and 3. The details of the time periods are shown in Table 1. The numbers of latent classes K and clusters C are parameters that an analyst should determine; this is a statistical model selection problem. In this paper, the number of latent classes was set experimentally to an appropriate value. In the experiments in this section, it was determined from the viewpoint of the balance between interpretability and fit to the learning data. The predictive performance was good and the characteristics of companies and student users were well extracted when the number of latent classes was K = 5 and the number of clusters was C = 20.
As a comparison model, we use the prediction model that is based on the aspect model without the time series consideration. In this comparison model, the number of latent classes K is set to 20. The latent class (cluster) to which a prediction target user belongs is estimated by using the calculation in equation (2): the prediction target user is assigned to the latent class for which he/she has the highest belonging probability. The finish date predicted for each latent class in the comparison model is calculated by using equation (12).
In equation (12), F(z_{k}) denotes the estimated finish date of the latent class z_{k}, and U(y_{j}) represents the finish date of student user y_{j}.
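The assignment rule of the comparison model can be sketched as below. Equation (12) is not reproduced in this text, so the sketch does not attempt to compute F(z_{k}); it takes the per-class finish dates as a given list and only illustrates the selection of the latent class with the highest belonging probability. Both function names and the encoding of finish dates in days are assumptions of this example.

```python
def most_probable_class(tendency):
    """Index k of the latent class with the highest belonging probability."""
    return max(range(len(tendency)), key=lambda k: tendency[k])


def predict_finish_by_class(tendency, class_finish_days):
    """Look up the finish date F(z_k) of the user's most probable latent class.

    tendency is the length-K entry tendency of one user computed with T = 1;
    class_finish_days[k] holds F(z_k), however it was obtained.
    """
    return class_finish_days[most_probable_class(tendency)]
```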
4.1.3 Results of Experiment 1
The evaluation criterion is the mean absolute error (MAE) between the predicted finish date and the corresponding correct value in the test data. Table 2 presents the results of this experiment.
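For reference, the MAE criterion can be written as a short function. The name `mean_absolute_error` and the encoding of finish dates as numbers of days are assumptions of this example.

```python
def mean_absolute_error(predicted, actual):
    """Mean of |predicted - actual| over all prediction target users."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

A smaller value means that the cluster-level predictions are, on average, closer to the actual finish dates.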
The result of the simulation experiment shows that the relation between the entry tendencies and the finish date of jobhunting is constructed successfully by considering the time series.
4.2 Experiment 2: Using Entry Data of Student Users from a Specific University Group
Experiment 1 has a problem; the model is built based on all the student users belonging to all the universities. Some studies indicate that the finish date is strongly influenced by student user attributes (such as the university to which they belong) (Hayakawa et al., 2013). Therefore, we should consider that the degree of influence of the entry tendency on the finish date varies for different student user attributes.
From the above discussion, it would be effective to construct the stratified models using only student users having the same attributes. Stratification can enable a clear modeling of the relation between the entry tendencies and the finish date. It is known that the university to which the student user belongs affects the finish date of jobhunting. Then, we apply the result of university clustering proposed by Yamagami et al. (2014). Yamagami et al. (2014) proposed a method of university clustering that uses the similarity based on the estimated cumulative distribution of the finish date of each university. In this experiment, we use one of the twenty clusters created by Yamagami et al. (2014).
4.2.1 Data Set and Experimental Condition
We use the student users belonging to the 102 universities in the selected cluster. We consider the student users who graduated from university in 2013 as the training data. The training data consist of approximately 1,500,000 entry data values and 27,000 student users. After constructing the model, the finish date is predicted for prediction target users who graduated from university in 2014. The prediction target data consist of approximately 1,200,000 entry data values and 28,000 student users. In this experiment, the entry history of the prediction target data in the first four months is used to make a prediction for the prediction target users. We consider three values for the number of periods used for the entry tendencies: T = 1, T = 3, and T = 7. The other experimental conditions are the same as those in Experiment 1 (K = 5, C = 20).
4.2.2 Results of Experiment 2
The evaluation criterion is the same as that in Experiment 1, i.e., the mean absolute error between the predicted value and the correct value in the test data set. Table 3 shows the results of this experiment.
A comparison of the results of Experiments 1 and 2 shows that the models constructed in Experiment 2 better represent the relation between the entry tendencies and the finish date of jobhunting. This result indicates that stratifying student users by university leads to a better model. Further, we observe that a better model is obtained when the time series is considered in more detail.
4.3 Discussion
In the proposed method, when two student users have similar overall entry tendencies but different time series tendencies of applying to companies, these two student users are grouped into different clusters. Because the proposed method considers the time series of student user entry tendencies, it yields a better prediction model expressing the relation between the entry tendencies and the finish date. Even if student users have similar preferences with respect to particular company groups, the timing of their applications to the employment examinations of companies influences their finish date.
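The following minimal sketch illustrates this point with two hypothetical student users. It assumes that the time-series feature of a user is the concatenation of his/her per-period latent class probability vectors and that users are compared by the squared Euclidean distance; both choices are assumptions made for illustration and are not necessarily the exact construction used in the proposed method.

```python
def overall_tendency(periods):
    """Average the per-period latent class probabilities into a single vector."""
    num_classes = len(periods[0])
    return [sum(p[k] for p in periods) / len(periods) for k in range(num_classes)]


def time_series_feature(periods):
    """Concatenate the per-period vectors so that the order of periods is kept."""
    return [value for p in periods for value in p]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two hypothetical users, T = 3 periods and two latent classes.
# User A drifts from class 0 to class 1; user B drifts in the opposite direction.
user_a = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
user_b = [[0.25, 0.75], [0.5, 0.5], [0.75, 0.25]]

# Without the time series, the two users look identical.
overall_gap = squared_distance(overall_tendency(user_a), overall_tendency(user_b))

# With the time series, they are clearly separated.
series_gap = squared_distance(time_series_feature(user_a), time_series_feature(user_b))
```

A clustering method applied to the averaged vectors would place both users in the same cluster, whereas one applied to the concatenated vectors can separate them.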
5. APPLICATION OF PROPOSED METHOD
In the proposed method, student user clusters are constructed from the timeseries data of student user entry tendencies. Therefore, a student user's preferences for latent classes become clear once his/her cluster is identified. On the other hand, the companies with a high degree of belonging to a latent class identify the characteristics of that latent class. Because the characteristics of each user cluster can be clarified by examining its preferences for latent classes, the characteristics of the users in each cluster can be grasped from the viewpoint of companies. Of course, because the belonging probabilities of each student user to latent classes can be calculated, the characteristics of each student user can also be grasped.
Here, we perform a characteristic analysis of the clusters formed by the proposed method. This analysis enables us to understand the factors that affect the finish date of jobhunting. Specifically, we can understand the relation between the entry tendencies and the finish date by analyzing the latent class probabilities expressed by the representative vector of each cluster.
For the analysis of the proposed model, we consider the following conditions: the number of periods T = 3, the number of latent classes K = 10, and the number of clusters C = 20. In this analysis, the characteristics of companies and student users were well extracted when K = 10 and C = 20. We decided these numbers from the viewpoint of interpretability, through discussion with the professional staff managing the jobhunting portal site, because extracting the characteristics of companies and student users through latent classes is very important when the proposed model is used for analytical purposes in practice. This section consists of two analyses: the analysis of the features of the latent classes, and the analysis of the probabilities of belonging to the latent classes in order to understand student user preference. In the latter subsection, we show the results for a single cluster.
5.1 Latent Class Feature Analysis
Table 4 shows the features of the latent class of the learned aspect model that is built by the proposed method.
The “Feature of company” column in Table 4 shows the main industries of the companies that have a high occurrence probability in each latent class. The third column shows the estimated probabilities of the latent classes, which indicate the size of each latent class. We observe that companies are divided into the latent classes by industry type and that there is no extremely small latent class.
5.2 Analysis of Belonging Probability for the Latent Class to Understand Student User Preference
Table 5 shows the time shift of the entry tendency for a single cluster. Here, we focus on the cluster whose predicted finish date of jobhunting is the latest among the 20 created clusters. In this table, we can observe the latent classes having a high belonging probability. By analyzing Tables 4 and 5, we can examine the reason for the latest finish date from the viewpoint of entry tendency.
Table 5 shows that the student users in the analyzed cluster prefer latent class 6, which has the highest belonging probability. Latent class 6 has a high occurrence probability of companies such as retail stores. In summary, the student users belonging to the cluster with the latest predicted finish date of jobhunting have a high preference for companies such as retail stores.
Next, Figure 3 shows the timeseries variation of belonging probability for each latent class.
The preference for latent class 6 increases until March. The changes of the belonging probabilities along the time axis are used to measure the changes of users’ preferences. It can be presumed that these student users focus on one type of industry from an early stage and do not change their entry tendencies while persisting in their jobhunting. This behavior is one of the reasons why they are unlikely to finish their jobhunting early, which prolongs their jobhunting activities. In addition, in this situation, it is difficult to resolve the mismatch between a student user and companies in the same category.
The mismatch between a student user and companies can be reduced by encouraging student users to change their entry tendency appropriately. The match between a student user and a company can be improved if student users evaluate their fit with the company and broaden their preference to include other companies.
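One simple way to quantify such changes along the time axis is sketched below. The total variation distance between the belonging probability vectors of consecutive periods is used here only as an illustrative measure; it is an assumption of this sketch, not a measure defined in the proposed method, and the numerical values are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two probability vectors of equal length."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def preference_shifts(periods):
    """Shift of the belonging probabilities between each pair of consecutive periods."""
    return [total_variation(prev, curr) for prev, curr in zip(periods, periods[1:])]


# Hypothetical representative vectors of one cluster over T = 3 periods and two
# latent classes: the preference concentrates early and then stops changing.
cluster_periods = [[0.5, 0.5], [0.75, 0.25], [0.75, 0.25]]

shifts = preference_shifts(cluster_periods)
```

A sequence of shifts that drops to zero early corresponds to the behavior described above, in which student users fix their entry tendency at an early stage and do not revise it.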
We can apply the proposed model to analyze the relation between entry tendencies and the finish date by considering the time series, as mentioned above.
6. DISCUSSION
The number of latent classes K and the number of clusters C are parameters that must be determined. This is a statistical model selection problem. Usually, model selection criteria such as AIC, BIC, and MDL can be applied to model selection problems with linear statistical models. However, the formulas of these criteria were derived under the assumption that the maximum likelihood estimator asymptotically follows a normal distribution (the central limit theorem). The likelihood function of a latent class model is not unimodal, and the maximum likelihood estimator does not satisfy the central limit theorem. Therefore, applying model selection criteria such as AIC, BIC, and MDL is not justified in a strict sense, although these criteria can still be computed. In fact, even if a model selection criterion is applied, a practically appropriate number of latent classes is frequently not chosen, because the number of parameters grows drastically as the number of latent classes increases.
In addition, this paper does not concretely show an effective action plan for a group of student users whose jobhunting duration is likely to be prolonged. However, if the target student segment is identified and its entry tendencies can be analyzed, analysts can consider an actual action plan. It is important to identify the target student group clearly, and the proposed model makes this possible.
In this paper, the numbers of latent classes and clusters were set to values found appropriate experimentally, because it is very important to extract the characteristics of companies and student users through latent classes. A reasonable method for determining the optimal numbers of latent classes and clusters, however, remains future work.
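For reference, the standard forms of AIC and BIC are given below, together with an illustrative parameter count. The count assumes the symmetric parameterization of the aspect model, P(x, y) = \sum_k P(z_k) P(x | z_k) P(y | z_k); the symbols N (number of companies), M (number of student users), n (number of observed entries), p (number of free parameters), and \hat{L} (maximized likelihood) are introduced here for illustration only.

```latex
\begin{align}
\mathrm{AIC} &= -2 \ln \hat{L} + 2p, \\
\mathrm{BIC} &= -2 \ln \hat{L} + p \ln n, \\
p &= (K - 1) + K (N - 1) + K (M - 1).
\end{align}
```

Under this assumption, each additional latent class adds N + M - 1 free parameters, so when the number of student users is in the tens of thousands the penalty term grows quickly with K and tends to dominate the criterion.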
7. CONCLUSION AND FUTURE WORK
In this study, we quantify the entry tendencies of a student user by considering the time shift and using an estimated aspect model to predict the finish date of jobhunting for the student user. Then, we propose a clustering method that considers the time series of the entry tendencies and design a model that represents the relation between the entry tendencies and the finish date. In the simulation experiment, we use actual data from an Internet portal site to demonstrate the effectiveness of the proposed method that considers the time series of the entry tendency of student users. Our proposal is based on the hypothesis of a statistical relationship between the timeseries variation of users’ preference and the finish date of jobhunting, and the effectiveness of the proposed model in the experiments supports this hypothesis.
In addition, we focused on the analysis of the characteristics of student groups. We proposed a method to extract student groups who have difficulty in jobhunting activities, but we did not discuss what kind of support should be given to those students. In future work, we must consider concrete support methods for student users in further detail and evaluate the impact of changing the number of clusters. In addition, a recommendation system based on the proposed method can be studied as a suitable application.