Lev Landau, a giant of Russian physics, kept a handwritten list in his notebook, ranking physicists on a logarithmic scale of achievement and grading them into “leagues” [Reference Hey and Walters24]. According to Landau, Isaac Newton and Albert Einstein belonged to the highest rank, above anyone else: he gave Newton the rank 0 and Einstein a 0.5. The first league, a rank of 1, contains the founding fathers of quantum mechanics, scientists like Niels Bohr, Werner Heisenberg, Paul Dirac, and Erwin Schrödinger. Landau originally gave himself a modest 2.5, which he eventually elevated to 2 after discovering superfluidity, an achievement for which he was awarded the Nobel Prize. Landau’s classification system wasn’t limited to famous scientists, but included everyday physicists, who are given a rank of 5. In his 1988 talk “My Life with Landau: Homage of a 4 1/2 to a 2,” David Mermin, who coauthored the legendary textbook Solid State Physics, rated himself a “struggling 4.5” [Reference Mermin, Gotsman, Ne’eman and Voronel25].
When scientists leave league 5 behind and start approaching the likes of Landau and other founders of a discipline, it’s obvious that their research has impact and relevance. Yet for the rest of us, things are somewhat blurry. How do we quantify the cumulative impact of an individual’s research output? The challenge we face in answering this question is rooted in the fact that an individual’s scientific performance is not just about how many papers one publishes, but a convolution of productivity and impact, requiring us to balance the two aspects in a judicious manner.
Of the many metrics developed to evaluate and compare scientists, one stands out in its frequency of use: the h-index, proposed by Jorge E. Hirsch in 2005 [Reference Hirsch26]. What is the h-index, and how do we calculate it? Why is it so effective in gauging scientific careers? Does it predict the future productivity and impact of a scientist? What are its limitations? And how do we overcome these limitations? Answering these questions is the aim of this chapter.
A scientist has index h if h of her papers have at least h citations each and each of the remaining papers has no more than h citations [Reference Hirsch26]. For example, if a scientist has an h-index of 20 (h = 20), it means that she has 20 papers with at least 20 citations each, and the rest of her papers each have no more than 20 citations. To measure h, we sort an individual’s publications by their citation counts, going from the most cited paper to the least cited. We can then plot the number of citations of each paper against its rank, resulting in a monotonically decreasing curve. The h-index is the largest rank at which a paper’s citation count is still at least as large as its rank. Fig. 2.1 uses the careers of Albert Einstein and Peter Higgs as case studies showing how to calculate the h-index.
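The sorting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `h_index` and the citation counts in `example` are hypothetical and chosen only to make the steps visible.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Sort from the most cited paper to the least cited one.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the paper at this rank still has >= rank citations
        else:
            break         # the curve has dropped below the diagonal
    return h


# Hypothetical citation counts for a seven-paper career.
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))  # 3
```

In this toy career the paper with 25 citations counts no more toward h than the one with 5, which is why a single runaway success cannot inflate the index.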
Therefore, if we define

$$ m \equiv \frac{1}{1/c + 1/n}, \tag{2.2} $$

we can rewrite (2.1) as

$$ h = m\,t, \tag{2.3} $$

indicating that a scientist’s h-index increases approximately linearly with time t, measured in years since the start of her career. Obviously, researchers don’t publish exactly the same number of papers every year (see Chapter 1), and citations to a paper follow varied temporal trajectories (as we will cover in Chapter 19). Yet, despite the model’s simplicity, the linear relationship predicted by (2.3) generally holds up well for scientists with long scientific careers [Reference Hirsch26]. This linear relationship (2.3) has two important implications:
(1) If a scientist’s h-index increases roughly linearly with time, then its speed of growth is an important indicator of her eminence. In other words, the differences between individuals can be characterized by the slope, m. As (2.2) shows, m is a function of both n and c. So, if a scientist has higher productivity (a larger n), or if her papers collect more citations (higher c), she has a higher m. And the higher the m, the more eminent is the scientist.
(2) Based on typical values of m, the linear relationship (2.3) also offers a guideline for how a typical career should evolve. For example, Hirsch suggested in 2005 that for a physicist at major research universities, h ≈ 12 might be a typical value for achieving tenure (i.e., the advancement to associate professor) and that h ≈ 18 might put a faculty member into consideration for a full professorship. Fellowship in the American Physical Society might typically occur around h ≈ 15–20, and membership in the US National Academy of Sciences may require h ≈ 45 or higher.
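To make the linear growth concrete, the sketch below simulates a deliberately simple career and compares the resulting h-index with the straight line m·t. It rests on assumptions that should be read as such: the scientist publishes exactly n papers per year, every paper gains exactly c citations per year, and the slope takes the form m = 1/(1/c + 1/n). The function names and the values n = 3, c = 6 are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def simulated_h(n, c, years):
    """h-index after `years` years in the assumed constant-rate model."""
    # A paper published in year y has had (years - y) years to collect citations.
    citations = [c * (years - y) for y in range(1, years + 1) for _ in range(n)]
    return h_index(citations)


def slope(n, c):
    """Assumed form of the slope m as a function of n and c."""
    return 1.0 / (1.0 / c + 1.0 / n)


# Hypothetical scientist: 3 papers per year, 6 citations per paper per year.
for years in (10, 20, 30):
    print(years, simulated_h(3, 6, years), slope(3, 6) * years)
```

The simulated values stay within a couple of units of the line m·t; the small gaps come from rounding to whole papers and whole years, not from a failure of the linear trend.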
Since its introduction, the h-index has catalyzed a profusion of metrics and greatly popularized the idea of using objective indicators to quantify nebulous notions of scientific quality, impact or prestige [Reference Van Noorden27]. As a testament to its impact, Hirsch’s paper, published in 2005, had been cited more than 8,000 times as of the beginning of 2019, according to Google Scholar. It even prompted behavioral changes – some ethically questionable – with scientists adding self-citations for papers on the edge of their h-index, in hopes of boosting it [Reference Van Raan28–Reference Purvis30]. Given its prevalence, we must ask: can the h-index predict the future impact of a career?
The h-index for scientists is analogous to the Eddington number for cyclists, named after Sir Arthur Eddington (1882–1944), an English astronomer, physicist, and mathematician, famous for his work on the theory of relativity. As a cycling enthusiast, Eddington devised a measure of a cyclist’s long-distance riding achievements. The Eddington number, E, is the largest number such that you have cycled at least E miles on at least E days of your life. Hence an Eddington number of 70 would mean that the person in question has cycled at least 70 miles a day on 70 occasions. Achieving a high Eddington number is difficult, since jumping from, say, 70 to 75 may require more than 5 new long-distance rides. That’s because any rides shorter than 75 miles will no longer be included. Those hoping to increase their Eddington number are forced to plan ahead. It might be easy to achieve an E of 15 by doing 15 trips of 15 miles – but turning that E = 15 into an E = 16 could force a cyclist to start over, since an E number of 16 only counts trips of 16 miles or more. Arthur Eddington, who reached an E = 87 by the time he died in 1944, clearly understood that if he wanted to achieve a high E number, he had to start banking long rides early on.
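The "start over" effect can be checked with a short sketch. The helper names `eddington_number` and `rides_needed` are hypothetical, and the ride log is the 15-trips-of-15-miles example from the text.

```python
def eddington_number(daily_miles):
    """Largest E such that at least E days have rides of at least E miles."""
    ranked = sorted(daily_miles, reverse=True)
    e = 0
    for rank, miles in enumerate(ranked, start=1):
        if miles >= rank:
            e = rank
        else:
            break
    return e


def rides_needed(daily_miles, target):
    """How many additional rides of at least `target` miles reach E = target."""
    qualifying = sum(1 for miles in daily_miles if miles >= target)
    return max(0, target - qualifying)


rides = [15] * 15                 # fifteen trips of exactly 15 miles
print(eddington_number(rides))    # 15
print(rides_needed(rides, 16))    # 16: none of the old rides count anymore
```

The same logic applies to the h-index: papers sitting exactly at h citations stop counting the moment the target moves to h + 1.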
(1) Total number of papers.
Advantage: Measures the productivity of an individual.
Disadvantage: Ignores the impact of papers.
(2) Total number of citations (C).
Advantage: Measures a scientist’s total impact.
Disadvantage: It can be affected by a small number of big hits, which may not be representative of the individual’s overall career, especially when these big hits were coauthored with others. It also gives undue weight to highly cited reviews as opposed to original research contributions.
(3) Average number of citations per paper.
Advantage: Allows us to compare scientists of different ages.
Disadvantage: Outcomes can be skewed by highly cited papers.
(4) The number of “significant” papers, those with more than a chosen number of citations.
Advantage: Eliminates the disadvantages of (1), (2), and (3), and measures broad and sustained impact.
Disadvantage: The definition of “significant” introduces an arbitrary parameter, which favors some scientists or disfavors others.
(5) The number of citations acquired by each of the q most-cited papers (for example, q = 5).
Advantage: Overcomes many of the disadvantages discussed above.
Disadvantage: Does not provide a single number to characterize a given career, making it more difficult to compare scientists to each other. Further, the choice of q is arbitrary, favoring some scientists while handicapping others.
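The trade-offs among these alternatives are easier to see side by side. The sketch below computes paper count, total citations, citations per paper, the number of papers above a citation threshold, and the top-q list for two invented careers with nearly identical totals. Everything here is hypothetical: the function names, the threshold of 50 citations, q = 5, and the two citation lists.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def career_metrics(citations, threshold=50, q=5):
    """Summarize a citation list with the metrics discussed in the text."""
    ranked = sorted(citations, reverse=True)
    n_papers = len(ranked)
    total = sum(ranked)
    return {
        "papers": n_papers,
        "total_citations": total,
        "citations_per_paper": total / n_papers if n_papers else 0.0,
        # Papers with more than `threshold` citations (arbitrary cutoff).
        "significant_papers": sum(1 for c in ranked if c > threshold),
        "top_q": ranked[:q],
        "h_index": h_index(ranked),
    }


steady = [30] * 30              # thirty papers with 30 citations each
one_hit = [870] + [1] * 29      # one big hit plus 29 barely cited papers

print(career_metrics(steady))
print(career_metrics(one_hit))
```

Both careers have 30 papers and roughly 900 citations, so metrics (1) to (3) barely separate them, while the threshold count flips depending on where the cutoff is placed. The h-index is 30 for the first career and 1 for the second.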
The key advantage of the h-index is that it sidesteps all of the disadvantages of the metrics listed above. But is it more effective at gauging the impact of an individual’s work? When it comes to evaluating the predictive power of metrics, two questions are often the most relevant.
Q1: Given the value of a metric at a certain time t1, how well does it predict the value of itself or of another metric at a future time t2?
This question is especially interesting for hiring decisions. For example, if one consideration regarding a faculty hire is the likelihood that the candidate will become a member of the National Academy of Sciences 20 years down the line, then it would be useful to rank the candidates by their projected cumulative achievement after 20 years. Hirsch tested Q1 by selecting a sample of condensed matter physicists and looking at their publication records during the first 12 years of their careers and in the subsequent 12 years [Reference Hirsch31]. More specifically, he calculated four different metrics for each individual based on their career records in the first 12 years: the h-index (Fig. 2.2a), the total number of citations (Fig. 2.2b), the total number of publications (Fig. 2.2c), and the average number of citations per paper (Fig. 2.2d). He then asked: if we want to select the candidates who will have the most total citations by year 24, which of the four indicators gives us the best chance? By measuring the correlation coefficient between the cumulative citations at the future time t2 and each of the four metrics calculated at time t1, he found that the h-index and the number of citations at time t1 are the best predictors (Fig. 2.2).
Q2: Given the value of a metric at a certain time t1, how well does it predict the scientific achievement produced only in the subsequent period, between t1 and t2?
To answer Q2, we need to use indicators obtained at t1 to predict scientific achievement occurring only in the subsequent period, thereby omitting all citations to work performed prior to t1. Hirsch repeated a similar prediction task for the four metrics, but this time used each of them to predict the total citations accrued by papers published only in the next 12 years. Naturally, this is a more difficult task, but an important one for allocating research resources. Hirsch found that the h-index again emerges as the best predictor of achievement attained purely in the future time frame [Reference Hirsch31].
These findings indicate that two individuals with similar h are comparable in terms of their overall scientific achievement, even if their total number of papers or citations are quite different. Conversely, two individuals of the same scientific age can have a similar number of total papers or citation counts but very different h values. In this case, the researcher with the higher h is typically viewed by the community as the more accomplished. Together, these results highlight the key strength of the h-index: When evaluating scientists, it gives an easy but relatively accurate estimate of an individual’s overall scientific achievements. Yet at the same time, we must also ask: What are the limitations of the h-index?
I thought about it first in mid 2003, over the next weeks I computed the h-index of everybody I knew and found that it usually agreed with the impression I had of the scientist. Shared it with colleagues in my department, several found it interesting.
Mid June 2005 I wrote up a short draft paper, sent it to 4 colleagues here. One skimmed over it, liked it and made some suggestions, one liked some of it and was nonplussed by some of it, two didn’t respond. So I wasn’t sure what to do with it.
Mid July 2005 I got out of the blue an email from Manuel Cardona in Stuttgart saying he had heard about the index from Dick Zallen at Virginia Tech who had heard about it from one of my colleagues at UCSD (didn’t say who but I can guess). At that point I decided to clean up the draft and post it in arXiv, which I did August 3, 2005, still was not sure what to do with it. Quickly got a lot of positive (and some negative) feedback, sent it to PNAS August 15.
2.3 Limitations of the h-Index
The main street of College Hill in Easton, Pennsylvania – the home of Lafayette College – is named after James McKeen Cattell. As an American psychologist, Cattell played an instrumental role in establishing psychology as a legitimate science, advocacy that prompted the New York Times to call him “the dean of American science” in his obituary.
While many have thought of developing new metrics to systematically evaluate their fellow researchers, Cattell was the first to popularize the idea of ranking scientists. He wrote in his 1910 book, American Men of Science: A Biographical Directory [Reference Cattell32]: “It is surely time for scientific men to apply scientific method to determine the circumstances that promote or hinder the advancement of science.” So, today’s obsession with measuring impact using increasingly sophisticated yardsticks is by no means a modern phenomenon. Scientists have been sizing up their colleagues since the beginning of the discipline itself. A century after Cattell’s book, the need and the rationale for a reliable toolset to evaluate scientists have not changed [Reference Lane33].
As the h-index has become a frequently used metric of scientific achievement, we must be mindful of its limitations. For example, although a high h is a somewhat reliable indicator of high accomplishment, the converse is not necessarily true [Reference Hirsch31]: an author with a relatively low h can achieve exceptional scientific impact with a few seminal papers, as in the case of Peter Higgs (Fig. 2.1b). Conversely, a scientist with a high h achieved mostly through papers with many coauthors would be treated overly kindly by his or her h. Furthermore, there is considerable variation in citation distributions even within a given subfield, and subfields where large collaborations are typical (e.g., high-energy experimental physics) will exhibit larger h values, suggesting that one should think about how to normalize h to more effectively compare and evaluate different scientists.
Next we discuss a few frequently mentioned limitations of the h-index, along with variants that can – at least to a certain degree – remedy them.
Highly cited papers. The main advantage of the h-index is that its value is not boosted by a single runaway success. Yet this also means that it neglects the most impactful work of a researcher. Indeed, once a paper’s citations get above h, its relative importance becomes invisible to the h-index. And herein lies the problem – not only do outlier papers frequently define careers, they arguably are what define science itself. Many remedies have been proposed to correct for this [Reference Alonso, Cabrerizo and Herrera-Viedma34–Reference Kosmulski39], including the g-index (the highest number g of papers that together received $g^2$ or more citations [Reference Egghe40, Reference Egghe41]) and the o-index (the geometric mean of the number of citations gleaned by a scientist’s most cited paper, $c^{*}$, and her h-index: $o = \sqrt{c^{*} h}$ [Reference Dorogovtsev and Mendes42]). Other measures proposed to correct this bias include the a-index [Reference Burrell36, Reference Jin, Liang and Rousseau38]; the h(2)-index [Reference Kosmulski39]; the hg-index [Reference Alonso, Cabrerizo and Herrera-Viedma34]; the q2-index [Reference Cabrerizo, Alonso and Herrera-Viedma37]; and more [Reference Alonso, Cabrerizo and Herrera-Viedma35].
Inter-field differences. Molecular biologists tend to get cited more often than physicists who, in turn, are cited more often than mathematicians. Hence biologists typically have a higher h-index than physicists, and physicists tend to have a higher h-index than mathematicians. To compare scientists across different fields, we must account for the field-dependent nature of citations [Reference Radicchi, Fortunato and Castellano43]. This can be achieved by the hg-index, which rescales the rank of each paper n by the average number of papers written by authors in the same year and discipline, n0 [Reference Radicchi, Fortunato and Castellano43], or by the hs-index, which normalizes the h-index by the average h of the authors in the same discipline [Reference Kaur, Radicchi and Menczer44].
Time dependence. As we discussed in Section 2.2, the h-index is time dependent. When comparing scientists in different career stages, one can use the m quotient (2.2) [Reference Hirsch26], or the contemporary h-index [Reference Sidiropoulos, Katsaros and Manolopoulos45].
Collaboration effects. Perhaps the greatest shortcoming of the h-index is its inability to discriminate between authors that have very different coauthorship patterns [Reference Hirsch46–Reference Schreiber48]. Consider two scientists with similar h indices. The first one is usually the intellectual leader of his/her papers, mostly coauthored with junior researchers, whereas the second one is mostly a junior author on papers coauthored with eminent scientists. Or consider the case where one author always publishes alone whereas the other one routinely publishes with a large number of coauthors. As far as the h-index is concerned, all these scientists are indistinguishable. Several approaches have been proposed to account for the collaboration effect, including fractionally allocating credit in multi-authored papers [Reference Schreiber48–Reference Galam50], and counting the different roles played by each coauthor [Reference Tscharntke, Hochberg and Rand51–Reference Hu, Rousseau and Chen54] by, for example, differentiating between first and last authorships. Hirsch himself has also repeatedly acknowledged this issue [Reference Hirsch46, Reference Hirsch47], and proposed the hα-index to quantify an individual’s scientific leadership for their collaborative outcomes [Reference Hirsch47]. Among all the papers that contribute to the h-index of a scientist, only those where he or she was the most senior author (the one with the highest h-index among all the coauthors) are counted toward the hα-index. This suggests that a high h-index in conjunction with a high hα/h ratio is a hallmark of scientific leadership [Reference Hirsch47].
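The counting rule for the hα-index can be sketched as follows. This is an illustration under stated assumptions, not Hirsch's own procedure: the data layout (each paper as a pair of citation count and list of coauthor h-indices), the function name `h_and_h_alpha`, the tie rule (a coauthor with the same h does not displace the scientist), and the treatment of ties at the edge of the h-core (decided by sort order) are all hypothetical choices, as are the sample numbers.

```python
def h_and_h_alpha(papers):
    """Return (h, h_alpha) for one scientist.

    `papers` is a list of (citations, coauthor_h_indices) pairs, where
    coauthor_h_indices lists the h-index of every *other* author on the
    paper (an empty list for a single-author paper).
    """
    ranked = sorted(papers, key=lambda paper: paper[0], reverse=True)

    # Ordinary h-index from the citation counts.
    h = 0
    for rank, (cites, _) in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break

    # The h papers that contribute to the h-index (the "h-core").
    core = ranked[:h]

    # Count core papers where no coauthor has a higher h than the scientist.
    # Assumption: ties and single-author papers count in the scientist's favor.
    h_alpha = sum(
        1 for _, coauthor_hs in core
        if all(h >= other for other in coauthor_hs)
    )
    return h, h_alpha


# Hypothetical record: (citations, h-indices of the coauthors).
record = [(40, [10, 3]), (12, [2]), (9, []), (5, [1, 1]), (4, [20]), (1, [0])]
h, h_alpha = h_and_h_alpha(record)
print(h, h_alpha, h_alpha / h)  # 4 3 0.75
```

Here the most cited paper does not count toward hα because a coauthor with h = 10 outranks the scientist, so the ratio hα/h falls to 0.75.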
In addition to these variations of the h-index, there are other metrics to quantify the overall achievement of individual scientists, including the i10-index, used exclusively by Google Scholar, which computes the number of articles with at least 10 citations each; or the SARA method [Reference Radicchi, Fortunato and Markines56], which uses a diffusion algorithm that mimics the spreading of scientific credits on the citation network to quantify an individual’s scientific eminence. Despite the multitude of metrics attempting to correct the shortcomings of the h-index, to date no other bibliometric index has emerged as preferable to the h-index, cementing the status of the h-index as a widely used indicator of scientific achievement.
As we dug deeper into the h-index and the voluminous body of work motivated by it, it was easy to forget a perhaps more important point: No scientist’s career can be summarized by a single number. Any metric, no matter how good it is at achieving its stated goal, has limitations that must be recognized before it is used to draw conclusions about a person’s productivity, the quality of her research, or her scientific impact. More importantly, a scientific career is not just about discoveries and citations. Rather, scientists are involved in a much broader set of activities including teaching, mentoring, organizing scientific meetings, reviewing, and serving on editorial boards, to name a few. As we encounter more metrics for scientific eminence, it’s important to keep in mind that, while they may help us understand certain aspects of scientific output, none of them alone can capture the diverse contributions scientists make to our community and society [Reference Abbott, Cyranoski and Jones57, Reference Pavlou and Diamandis58]. Just as Einstein cautioned: “Many of the things you can count, don’t count. Many of the things you can’t count, do count.”
Therefore, we must keep in mind that the h-index is merely a proxy to quantify scientific eminence and achievement. But the problem is, in science, status truly matters, influencing the perception of quality and importance of one’s work. That’s what we will focus on in the next chapter, asking if and when status matters, and by how much.