1. INTRODUCTION
Assessing research performance is one of the important tasks of research organizations. Many institutions, including universities and public research institutes, have developed ways to quantify research quality in order to evaluate individual research projects or researchers.
Evaluating individual research performance is a crucial component of research assessment, and the outcomes of such evaluations can play a key role in establishing institutional research strategies, including funding plans, hiring, firing, and promotions. However, evaluating individual research performance is a complex task that cannot be achieved with a single indicator, and there is little consensus across fields on standards for objectively measuring research quality. In the past, quantity-oriented measures were used as evaluation indicators; for example, the number of papers published over a period of time was commonly used as an index. More recently, evaluation methods considering both quantity and quality have been proposed (Sahel, 2011). These methods combine the number of publications, the number of citations, influence indicators, and other measures into a single quantitative indicator. The citation count of an individual paper, however, is affected by how long ago the paper was published. For this reason, the Impact Factor has been widely used as a journal evaluation metric. The Impact Factor of a journal is defined as the average number of citations received in a given year by the articles published in that journal during the preceding two years, and a researcher's performance is often assessed through the Impact Factors of the journals in which he or she publishes.
Impact Factor data are provided by the Journal Citation Reports (JCR) database from Thomson Reuters, which uses the Impact Factor to indicate how competitive each journal is within its field.
However, the Impact Factor is not a perfect indicator for evaluating journals, and several drawbacks have been raised (Seglen, 1997; Vanclay, 2012). First, the Impact Factor can be manipulated by publishing review papers, because review papers are cited more than other article types (Simon, 2008). The biggest problem, however, is that it is hard to compare journals belonging to different subject categories, because the average Impact Factor differs between categories. This is the result of deviations between research areas arising from the different natures of their academic environments. For instance, social science researchers often prefer to publish books rather than journal articles, while computer scientists prefer to present their results in conference proceedings (Chen and Konstan, 2010). In addition, the Impact Factor can be affected by imbalances in the number of researchers and by characteristics of each field's publication system, such as its review process. Therefore, while comparing researchers or research performance within the same field using the Impact Factor may be unproblematic, comparisons across fields can cause problems. In particular, Impact Factors of journals in fields such as "Medical Science" and "Biology" are higher than in other fields because citation is more common there; as a result, low-ranked journals in "Medical Science" and "Biology" sometimes have higher Impact Factors than high-ranked journals in "Mathematics".
This drawback of the Impact Factor is not a problem when evaluating researchers in the same field. However, when evaluating employees of universities or research institutes, it is common to apply the same criteria to researchers from different fields, and Impact Factor based assessments can then discriminate against particular disciplines. Despite this problem, researchers in Korea are mostly evaluated using the Impact Factor. As mentioned above, the Impact Factor is therefore not a fair evaluation measure, since researchers in particular fields of study can be disadvantaged.
In previous studies, adjusted Impact Factors that normalize the Impact Factor across fields have been proposed (Pyo et al., 2016). There are many ways to normalize the Impact Factor, but when journals belong to multiple categories, their relative rankings can be changed by the adjusted Impact Factor.
Our study introduces a new, robust journal evaluation metric that normalizes the differences in Impact Factor among categories while preserving the relative rankings of individual journals. To achieve this, we use an optimization algorithm. Applying the proposed method to recent journal Impact Factor data, we confirm that the error decreases.
The rest of this paper is organized as follows. In Section 2, we review well-known journal evaluation metrics. In Section 3, we introduce the methodology of our research. Sections 4 and 5 present the experimental results and conclusions.
2. REVIEW OF WELL-KNOWN METRICS
To alleviate the problem that Impact Factors differ between subject categories, previous studies used revised Impact Factors obtained by dividing each Impact Factor by a representative value (Pyo et al., 2016). A representative value is computed for each subject category. Each journal indexed in the JCR is assigned to at least one subject category indicating a general area of science (SCI) or the social sciences (SSCI); there are about 230 subject categories in SCI and SSCI. Pyo et al. (2016) used the article-weighted mean Impact Factor (AIF), the average Impact Factor of the top 20%, 30%, 50%, and 75% ranked journals, and the overall average Impact Factor as representative values for the revised Impact Factors. The AIF, one of these representatives, indicates the average number of citations of articles in the subject category. Dividing the Impact Factor by a representative value has the effect of normalizing the IF within each category.
These revised Impact Factors, however, are problematic when a journal belongs to two or more subject categories, since it is then unclear which category's representative value should be used. To deal with this problem, the previous research proposed two methods, one using an average and one using a maximum of the adjusted Impact Factors.
The suggested adjusted IF reflects the information of each category and is defined as the average of the impact factors divided by the AIFs of the subject categories to which the journal belongs. Let $j^{k} \in J$ be the kth journal, $k \in \{1, 2, \dots, n\}$, in alphabetical order, and let $IF^{k}$ be the impact factor of journal $j^{k}$. Let $c_{a} \in C$ be the ath subject category, $a \in \{1, 2, \dots, m\}$, in alphabetical order, and let $AIF_{c_{a}}$ be the aggregate impact factor of category $c_{a}$. $j^{k}_{c_{a}, c_{b}}$ denotes the kth journal when it belongs to subject categories $c_{a}$ and $c_{b}$, and $C_{k}$ denotes the set of subject categories containing the kth journal. The adjusted IF of a journal is the average of its impact factor divided by the aggregate impact factor of each subject category it belongs to, where $card(C_{k})$ is the cardinality of $C_{k}$.
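From this prose, the adjusted IF can be written compactly as follows (a reconstruction; the exact display in Pyo et al. (2016) may differ, and the label $\mathrm{AdjIF}$ is ours, used to avoid confusion with the aggregate $\mathrm{AIF}_{c_a}$ in the denominator):

```latex
\mathrm{AdjIF}(j^{k}) \;=\; \frac{1}{\operatorname{card}(C_{k})}
  \sum_{c_{a} \in C_{k}} \frac{\mathrm{IF}^{k}}{\mathrm{AIF}_{c_{a}}}
```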
Pyo et al. (2016) also suggested using the mean of the top-quantile impact factors of the journals listed in a subject category as the representative value. The first such evaluation metric, QAVGIF, is defined as follows.
Specifically, let $Quan_{q}(c_{a})$ be the top q% quantile of the impact factors of the journals in category $c_{a}$, and let $AVG[Quan_{q}(c_{a})]$ be the average impact factor of the journals in the top q% quantile of category $c_{a}$.
The second evaluation metric, QMAXIF, uses the maximum value instead of the average.
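Based on these definitions, the two metrics can be written as follows (reconstructed from the prose; the originals in Pyo et al. (2016) may be typeset differently):

```latex
\mathrm{QAVGIF}(j^{k}) = \frac{1}{\operatorname{card}(C_{k})}
  \sum_{c_{a}\in C_{k}} \frac{\mathrm{IF}^{k}}{\mathrm{AVG}\left[\mathrm{Quan}_{q}(c_{a})\right]},
\qquad
\mathrm{QMAXIF}(j^{k}) = \max_{c_{a}\in C_{k}}
  \frac{\mathrm{IF}^{k}}{\mathrm{AVG}\left[\mathrm{Quan}_{q}(c_{a})\right]}
```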
Although the above methods normalize the Impact Factors of journals belonging to more than one category, the existing rankings may be reversed. The previous research quantified this problem as an Error Rate; each method has an Error Rate of six to ten percent.
To reduce this error, we use the F min search algorithm to find the linear combination of the 11 revised Impact Factors with the minimal Error Rate.
3. PROPOSED METHOD
In this research, we focus on finding a new measure that can replace the original impact factor while preserving the rankings of individual journals as much as possible. To this end, we search for the best linear combination of our features, which are derived by manipulating the original impact factor. The new factor is therefore defined as the linear combination
$\sum_{i=1}^{m} a_{i} x_{i},$
where $A = \{a_{1}, \dots, a_{m}\}$ is the weight set, $a_{i}$ is the weight of the ith factor, and $x_{i}$ is the ith revised Impact Factor. In our situation we have 11 revised Impact Factors, so m = 11.
To estimate the weights of the new Impact Factor, an objective function is required. We therefore define r(A) as the number of pairs of journals in the same subject category whose relative ranking under the new factor differs from their ranking under the original Impact Factor: starting from r(A) = 0, r(A) is incremented by one for every such reversed pair.
Our goal is to find the weight set A that minimizes r(A). Because the objective function counts how many times the ranking of a pair of journals in the same field has changed, it is not differentiable, and analytical or numerical gradient-based methods are difficult to apply. We therefore used the F min search method with different initial values to find the best coefficients of our new factor.
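As a sketch of this setup (synthetic data; scipy's Nelder-Mead implementation plays the role of F min search here):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m = 11                            # number of revised Impact Factors
n_journals = 50
X = rng.random((n_journals, m))   # synthetic revised IFs, one row per journal
orig_if = X[:, 0]                 # stand-in for the original Impact Factor

def r(A):
    """Count journal pairs whose order under the combined score X @ A
    differs from their order under the original Impact Factor."""
    s = X @ A
    errors = 0
    for i in range(n_journals):
        for j in range(i + 1, n_journals):
            if (orig_if[i] - orig_if[j]) * (s[i] - s[j]) < 0:
                errors += 1
    return errors

# Nelder-Mead handles this non-differentiable, integer-valued objective.
res = minimize(r, x0=np.ones(m), method="Nelder-Mead",
               options={"maxiter": 2000})
print(res.fun)
```

The best vertex of the simplex only ever improves, so the final error count is never worse than that of the initial weights.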
3.1 F Min Search
F min search is a simplex search method (also called the Nelder-Mead method): a direct search method that does not use numerical or analytical gradients to compute a solution. The algorithm is therefore applicable to nonlinear optimization problems whose derivatives may be unknown. It is a heuristic search method and can converge to non-stationary points (McKinnon, 1999). In our case the objective function is not differentiable, so F min search is an appropriate choice for this optimization problem.
To understand the algorithm, the concept of a simplex is fundamental. For an n-dimensional vector x, a simplex is a polytope with n+1 vertices in n dimensions: a triangle in the plane, a tetrahedron in three-dimensional space. The general idea of F min search is simple. In each search step, we pick new points around or inside the current simplex, evaluate the function at those points, and compare the values with those at the vertices. If there is an improvement, one of the vertices is replaced by one of the newly picked points, generating a new simplex. This iteration is repeated until the diameter of the simplex becomes smaller than a chosen tolerance. Details of the algorithm are described in the next section.
3.2 Algorithm of F Min Search
The following is the typical F min search algorithm. We minimize the error rate r(A), where $A \in \mathbb{R}^{n}$, n = 11, and $A_{i}$, i = 1, ..., n+1, denotes the points in the current simplex.

Step 1. Order the points in the simplex by function value, from the lowest $r(A_{1})$ to the highest $r(A_{n+1})$. Remove $A_{n+1}$, which has the worst function value (because we are minimizing), and add a new point to the simplex [or replace the n points other than $A_{1}$, as in Step 7].

Step 2. Generate the reflected point
$A_{r} = 2m - A_{n+1},$ where $m = \frac{1}{n}\sum_{i=1}^{n} A_{i},$
and calculate $r(A_{r})$. If $r(A_{1}) \le r(A_{r}) < r(A_{n})$, accept $A_{r}$ and terminate the iteration. (Reflect)

Step 3. If $r(A_{r}) < r(A_{1})$, calculate the expansion point
$A_{e} = m + 2(m - A_{n+1}),$
and calculate $r(A_{e})$. If $r(A_{e}) < r(A_{r})$, accept $A_{e}$; otherwise accept $A_{r}$. Terminate the iteration. (Expand)

Step 4. If $r(A_{r}) \ge r(A_{n})$, perform a contraction between m and the better of $A_{n+1}$ and $A_{r}$.

Step 5. If $r(A_{r}) < r(A_{n+1})$ (i.e., $A_{r}$ is better than $A_{n+1}$), calculate
$A_{c} = m + (A_{r} - m)/2$
and calculate $r(A_{c})$. If $r(A_{c}) < r(A_{r})$, accept $A_{c}$ and terminate the iteration. (Contract outside) Otherwise, continue with Step 7.

Step 6. If $r(A_{r}) \ge r(A_{n+1})$, calculate
$A_{cc} = m + (A_{n+1} - m)/2.$
If $r(A_{cc}) < r(A_{n+1})$, accept $A_{cc}$ and terminate the iteration. (Contract inside) Otherwise, continue with Step 7.

Step 7. Calculate the n points
$V_{i} = A_{1} + (A_{i} - A_{1})/2$
and calculate $r(V_{i})$, i = 2, ..., n+1. The simplex for the next iteration consists of $A_{1}, V_{2}, \dots, V_{n+1}$. (Shrink)
Figure 1 shows an example of the F min search procedure. The bold outline is the original simplex, and the iteration continues until the simplex reaches the stopping criterion.
When applying the algorithm, the choice of the initial simplex is important: if it is too small, the search risks getting stuck at a local optimum.
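The steps above can be sketched in Python (a minimal, illustrative implementation, not the authors' code; the initial-simplex perturbation mimics MATLAB's fminsearch default):

```python
import numpy as np

def fmin_search(f, x0, tol=1e-8, max_iter=500):
    """Minimal Nelder-Mead ('F min search') following Steps 1-7 above."""
    n = len(x0)
    simplex = [np.asarray(x0, dtype=float)]
    for i in range(n):
        v = np.array(x0, dtype=float)
        if v[i] != 0:
            v[i] *= 1.05          # MATLAB-style 5% perturbation
        else:
            v[i] = 0.00025
        simplex.append(v)
    for _ in range(max_iter):
        # Step 1: order vertices by function value.
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if max(np.linalg.norm(v - best) for v in simplex) < tol:
            break
        m = np.mean(simplex[:-1], axis=0)     # centroid of the best n points
        # Step 2: reflect.
        Ar = 2 * m - worst
        if f(best) <= f(Ar) < f(simplex[-2]):
            simplex[-1] = Ar
            continue
        # Step 3: expand.
        if f(Ar) < f(best):
            Ae = m + 2 * (m - worst)
            simplex[-1] = Ae if f(Ae) < f(Ar) else Ar
            continue
        # Steps 4-6: contract outside or inside.
        if f(Ar) < f(worst):
            Ac = m + (Ar - m) / 2             # contract outside
            if f(Ac) < f(Ar):
                simplex[-1] = Ac
                continue
        else:
            Acc = m + (worst - m) / 2         # contract inside
            if f(Acc) < f(worst):
                simplex[-1] = Acc
                continue
        # Step 7: shrink toward the best vertex.
        simplex = [best] + [best + (v - best) / 2 for v in simplex[1:]]
    return min(simplex, key=f)

# Usage: minimize a simple quadratic whose minimum is at [1, 2].
x = fmin_search(lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2, [0.0, 0.0])
print(x)  # should be close to [1, 2]
```

On a smooth convex test function the simplex expands toward the minimum and then contracts around it; on the non-differentiable r(A), the same direct-search steps apply unchanged since no gradients are used.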
4. EXPERIMENTAL RESULTS
We applied the proposed method to real data to evaluate its performance. For the empirical analysis, we gathered the 2013 Impact Factors of 11,619 journals, available from http://www.webofknowledge.com. Impact Factors are calculated yearly for academic journals listed in the Journal Citation Reports (JCR).
Table 1 shows the number of subject categories and journals in the 2013 JCR. The JCR classifies journals into SCI and SSCI according to field. Some journals in psychiatry, psychology, and healthcare administration are included in both SCI and SSCI, and some journals belong to two or more categories depending on their subject.
Table 2 shows statistics of the well-known journal evaluation metrics proposed in Pyo et al. (2016), computed over the mean values of each subject category. Most revised Impact Factors show less variability than the original IF; the smaller standard deviations mean that the variance between subject categories has been reduced.
We found that the optimal value depends on the initial value of the experiment. Since our optimization problem is not convex, the search may return a local optimum as if it were the global optimum. We therefore experimented with various initial values. Table 3 shows the mean and standard deviation of the coefficients obtained from 11 randomly selected initial values.
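One common mitigation, consistent with the multi-initial-value experiments here, is a multi-start scheme; a sketch (the objective is illustrative, not the paper's r(A)):

```python
import numpy as np
from scipy.optimize import minimize

def multistart_nelder_mead(objective, dim, n_starts=11, seed=0):
    """Run Nelder-Mead from several random initial points and keep the
    best result; a simple guard against local minima."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(-1.0, 1.0, size=dim)
        res = minimize(objective, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best

# A one-dimensional multimodal example with two local minima.
f = lambda v: np.sin(3 * v[0]) + (v[0] - 0.5) ** 2
res = multistart_nelder_mead(f, dim=1)
print(res.x, res.fun)
```

Individual runs may settle in different basins, but keeping the best of several starts reduces the chance of reporting a poor local optimum as the result.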
According to Table 3, only a few of the 11 coefficients are statistically significant. We therefore analyzed the run with the best performance rather than the effects of the individual coefficients; the best result is shown in Table 4.
Our objective is to find the optimal adjusted Impact Factor that minimizes the error rate while reducing the difference between categories. We therefore evaluate the proposed method in two respects: the error rate and the variance between subject categories.
Table 5 shows the error rates of the various adjusted Impact Factors. Our suggested adjusted Impact Factor has a lower error rate than the previous ones: the minimum error rate among the existing adjusted Impact Factors is 5.88%, while that of our suggested adjusted Impact Factor is 5.04%.
In Figure 2, we compare the ordering error counts of the 12 metrics, i.e., the number of journal pairs whose relative rank is reversed compared with the original ranking. Our suggested Impact Factor yields 54,130 error counts, about 10,000 fewer than the minimum error count of the existing adjusted Impact Factors. Considering the number of papers published in these journals, this is a remarkable reduction; in terms of pairs of papers, the difference amounts to about 1,600,000 combinations.
Secondly, we examine how much the new adjusted Impact Factor reduces the variability across categories. Table 6 shows statistics of the proposed adjusted Impact Factor; as shown there, its standard deviation is the smallest among the compared metrics. The coefficient of variation (CV), defined as the ratio of the standard deviation to the mean, also shows that the proposed method has the smallest variation (10.5134%) in comparison with the other methods.
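For reference, a CV of this kind might be computed as follows (a generic sketch, not the authors' code; whether the paper uses the population or sample standard deviation is not stated):

```python
import statistics

def coefficient_of_variation(values):
    """CV = (population standard deviation / mean) * 100, in percent."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Illustrative category-mean values (made-up numbers):
print(round(coefficient_of_variation([1.0, 1.1, 0.9, 1.05]), 2))  # → 7.3
```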
Figure 3 compares the distribution of the average Impact Factor with that of the average adjusted Impact Factor by category, showing that our suggested adjusted Impact Factor reduces the variance of the Impact Factor between categories.
Figure 4 compares the average Impact Factor and the average adjusted Impact Factor in categories related to 'Biology' and 'Engineering'. We found that our adjusted Impact Factor is a more reasonable journal evaluation indicator with respect to the variance across categories.
5. CONCLUSIONS
It is important to evaluate the research performance of researchers, and the Impact Factor has been widely used as an evaluation index. In this study, we proposed a method that finds the optimal combination of modified Impact Factors to address the imbalance between research fields while minimizing the number of changes in Impact Factor rankings. As a result, both the deviation by subject category and the error rate were reduced compared with the existing methods, so the number of journals and papers adversely affected by ranking changes decreased significantly. Using the F min search method to find the optimal value, our model achieved an error rate of 5.04%, while the previous models showed 6.79% on average. Since the F min search algorithm is sensitive to the initial value, there is room to improve the results by studying how to set good initial values in further research. Future work should also explore other algorithms and objective functions to find the optimal journal evaluation measure.
Although limitations remain in evaluating research performance using the Impact Factor, the proposed adjusted Impact Factor is expected to be helpful in evaluating researchers from different disciplines.