Here we discuss how the use of artificial intelligence will change the way science is done. Deep learning algorithms can now surpass the performance of human experts, a fact that has major implications for the future of our discipline. Successful uses of AI technology all possess two ingredients for deep learning: copious training data and a clear way to classify it. When these two conditions are met, researchers working in tandem with AI technologies can organize information and solve scientific problems with impressive efficiency. The future of science will increasingly rely on human–machine partnerships, where people and computers work together, revolutionizing the scientific process. We provide an example of what this may look like. Hoping to remedy a present-day challenge in science known as the “reproducibility crisis,” researchers used deep learning to uncover patterns in papers that signal strong and weak scientific findings. By combining the insights of machines and humans, the new AI model achieves the highest predictive accuracy.
We begin by discussing the challenges of quantifying scientific impact. We introduce the h-index and explore its implications for scientists. We also detail the h-index’s strengths when compared with other metrics, and show that it bypasses all the disadvantages posed by alternative ranking systems. We then explore the h-index’s predictive power, finding that it provides an easy but relatively accurate estimate of a person’s achievements. Despite its relative accuracy, we are aware of the h-index’s limitations, which we detail here with suggestions for possible remedies.
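As a minimal sketch of the metric discussed above, assuming the standard definition (a scientist has index h if h of their papers have at least h citations each), the h-index can be computed from a list of per-paper citation counts. The function name and the example counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one scientist's papers.
example_citations = [25, 8, 5, 3, 3, 1, 0]
example_h = h_index(example_citations)
```

The sketch also makes one limitation visible: the single paper with 25 citations contributes no more to the index than a paper with 3 would.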
To describe coauthorship networks, we begin with the Erdös number, which links mathematicians to their famously prolific colleague through the papers they have collaborated on. Coauthorship networks help us capture collaborative patterns and identify important features that characterize them. We can also use them to predict how many collaborators a scientist will have in the future based on her coauthorship history. We find that collaboration networks are scale-free, following a power-law distribution. As a consequence of the Matthew effect, frequent collaborators are more likely to collaborate, becoming hubs in their networks. We then explore the small-world phenomenon evidenced in coauthorship networks, which is sometimes referred to as “six degrees of separation.” To understand how a network’s small-worldliness impacts creativity and success, we look to teams of artists collaborating on Broadway musicals, finding that teams perform best when the network they inhabit is neither too big nor too small. We end by discussing how connected components within networks provide evidence for the “invisible college.”
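The Erdös number described above is a shortest-path distance in the coauthorship network, so it can be sketched with a breadth-first search. The paper list and author labels below are made up for illustration, and the function name is an assumption rather than anything from the text.

```python
from collections import deque


def collaboration_distances(papers, source):
    """Map each author reachable from `source` to their coauthorship distance."""
    # Build the coauthorship network: two authors are linked if they share a paper.
    neighbors = {}
    for authors in papers:
        for author in authors:
            others = neighbors.setdefault(author, set())
            others.update(a for a in authors if a != author)

    # Breadth-first search visits authors in order of increasing distance.
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for coauthor in neighbors.get(current, ()):
            if coauthor not in distances:
                distances[coauthor] = distances[current] + 1
                queue.append(coauthor)
    return distances


# Hypothetical author lists, one per paper.
example_papers = [["Erdös", "A"], ["A", "B"], ["B", "C", "D"], ["E", "F"]]
example_distances = collaboration_distances(example_papers, "Erdös")
```

Authors E and F never appear in the result because they sit in a separate connected component, which is the same notion of component the abstract ties to the “invisible college.”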
We introduce the role that productivity plays in scientific success by describing Paul Erdös’ exceptional productivity. How does Erdös’ productivity measure up to other scientists? Is the exponential increase in the number of papers published due to rising productivity rates or to the growing number of scientists working in the discipline? We find that there is an increase in the productivity of individual scientists but that this increase is due to the growth of collaborative work in science. We also quantify the significant productivity differences between disciplines and individual scientists. Why do these differences exist? To answer this question, we explore Shockley’s work on the subject, beginning with his discovery that productivity follows a lognormal distribution. We outline his hurdle model of productivity, which not only explains why the productivity distribution is fat-tailed, but also provides a helpful framework for improving individual scientific output. Finally, we outline how productivity is multiplicative, but salaries are additive, a contradiction that has implications for science policy.
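The contrast between multiplicative and additive combination can be illustrated with a small simulation. This is a sketch under stated assumptions, not Shockley's own calculation: each scientist gets a handful of independent factors, and the number of hurdles (8) and the uniform range (0.5 to 1.5) are illustrative choices. Multiplying the factors gives a right-skewed distribution, since the logarithm of a product is a sum of logarithms and so tends toward a normal distribution. Averaging the same factors gives a roughly symmetric one.

```python
import random
import statistics


def simulate_output(n_scientists, n_hurdles=8, seed=0):
    """Combine the same random factors multiplicatively and additively.

    n_hurdles=8 and the uniform(0.5, 1.5) range are illustrative assumptions,
    not values taken from the text.
    """
    rng = random.Random(seed)
    multiplicative, additive = [], []
    for _ in range(n_scientists):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_hurdles)]
        product = 1.0
        for factor in factors:
            product *= factor
        multiplicative.append(product)
        additive.append(sum(factors) / n_hurdles)
    return multiplicative, additive


def mean_to_median(values):
    """A ratio well above 1 signals a right-skewed distribution."""
    return statistics.mean(values) / statistics.median(values)


products, averages = simulate_output(10_000)
skew_multiplicative = mean_to_median(products)
skew_additive = mean_to_median(averages)
```

Under these assumed settings the multiplicative ratio should come out clearly above 1 while the additive ratio stays close to 1, which is the gap between skewed productivity and evenly spread rewards that the abstract points to.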
Here we address bias and causality, beginning with the bias against failure in the existing science of science research. Because the data available to us is mostly on published papers, we necessarily disregard the role that failure plays in a scientific career. This could be framed as a survivorship bias, where the “surviving” papers are those that make it to publication. This same issue can be seen as a flaw in our current definition of impact, since our use of citation counts keeps a focus on success in the discipline. We explore the drawbacks and upsides of variants on citation counts, including altmetrics like page views. We also look at possible ways to expand the science of science to include unobservable factors, as we saw in the case of the credibility revolution in economics. Using randomized controlled trials and natural experiments, the science of science could explore causality more deeply. Given the tension between certainty and generalizability, both experimental and observational insights are important to our understanding of how science works.
While there is plenty of information available about the luminaries of science, here we discuss the relative lack of information about ordinary researchers. Luckily, because of recent advances in name disambiguation, the career histories of everyday scientists can now be analyzed, changing the way we think about scientific creativity entirely. We describe how the process of shuffling a career – moving the works a scientist publishes around randomly in time – helped us discover what we call the “random impact rule,” which dictates that, when we adjust for productivity, the highest impact work in a career can occur at any time. We also see that the probability of landmark work follows a cumulative distribution, meaning that the random impact rule holds true not just for the highest impact work in any career but also for other important works, too. While there is precedent for this rule in the literature – Simonton proposed the “constant probability of success” model in the 1970s – until recently we didn’t have the data on hand to test it. The random impact rule allows us to decouple age and creativity, instead linking periods of high productivity to creative breakthroughs.
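The shuffling procedure described above can be sketched as a simple null model: keep a scientist's set of papers fixed, randomize their order, and record where the highest-impact paper lands. The career below is hypothetical, and the function names are assumptions made for illustration.

```python
import random


def peak_position(impacts):
    """Index of the highest-impact paper in a chronologically ordered career."""
    return max(range(len(impacts)), key=lambda i: impacts[i])


def shuffled_peak_positions(impacts, n_shuffles=1000, seed=0):
    """Positions of the peak paper across randomly reordered copies of a career."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n_shuffles):
        shuffled = list(impacts)  # copy, so the real career is left untouched
        rng.shuffle(shuffled)
        positions.append(peak_position(shuffled))
    return positions


# Hypothetical citation impact of ten papers, in publication order.
example_career = [3, 12, 7, 150, 9, 4, 20, 1, 6, 11]
observed_peak = peak_position(example_career)
null_positions = shuffled_peak_positions(example_career, n_shuffles=2000)
```

In a shuffled career the peak is equally likely to fall at any position. The random impact rule corresponds to real careers being statistically indistinguishable from this null once many scientists are pooled, which a single career like the one above cannot show on its own.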
We begin by asking how far back in the literature we should go when choosing discoveries to build on. In other words, how myopic is science in the age of Google Scholar? By looking at the age distribution of citations and identifying knowledge “hot spots,” we pinpoint the unique combinations of old and relatively new knowledge that are most likely to produce new breakthroughs. In doing so, we see that the way we build on past knowledge follows clear patterns, and we explore how these patterns shape future scientific discourse. We also look at the impact that a citation’s jump–decay pattern has on the relevance of research over time, finding that all papers have an expiration date and that we can predict that date based on the jump–decay pattern.
We begin by acknowledging the sheer size of the citation index to date, and then discuss the disparity in citations that these papers receive. These differences in impact among papers can be captured by a citation distribution, which can be approximated by a power-law function. We compare power-law distributions to Gaussian distributions, illustrating the distinctions between the two and what they tell us about citation patterns. We then explore the differences in average number of citations between fields, which can make cross-disciplinary comparisons complicated. Luckily, we find that citation patterns are surprisingly universal relative to the field a paper is published in, which allows us to identify common trends in citation and impact regardless of discipline. We end with a discussion of what citations don’t capture, given that they are frequently used as a proxy for impact. We pinpoint some potential flaws in this metric, but see citation patterns as a valuable way to gauge the collective wisdom of the scientific community.
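One common way to make citations comparable across fields, assumed here rather than taken from the text, is to divide each paper's citation count by the average count in its field. The sketch below applies that rescaling to a few hypothetical papers. The field labels and numbers are invented.

```python
from collections import defaultdict


def rescale_by_field(papers):
    """Divide each paper's citations by the mean citations of its field.

    `papers` is a list of (field, citations) pairs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for field, citations in papers:
        totals[field] += citations
        counts[field] += 1
    field_means = {field: totals[field] / counts[field] for field in totals}

    rescaled = []
    for field, citations in papers:
        mean = field_means[field]
        # A field with no citations at all has no meaningful scale; report 0.0.
        rescaled.append(citations / mean if mean > 0 else 0.0)
    return rescaled


# Hypothetical papers from a low-citation and a high-citation field.
example_papers = [("math", 2), ("math", 6), ("biology", 20), ("biology", 60)]
example_rescaled = rescale_by_field(example_papers)
```

After rescaling, the 6-citation mathematics paper and the 60-citation biology paper receive the same score, since each has 1.5 times its field's average.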
Given the jump–decay citation patterns discussed in the previous chapter, are we forced to conclude that the papers we publish will be relevant for only a few years? We find that while aggregate citations follow a clear pattern, the trajectories of individual citations are remarkably variable. Yet, by analyzing individual citation histories, we are able to isolate three parameters – immediacy, longevity, and fitness – that dictate a paper’s future impact. In fact, all citation histories are governed by a single formula, a fact which speaks to the universality of the dynamics that at first seemed quite variant. We end by discussing how a paper’s ultimate impact can be predicted using one factor alone: its relative fitness. We show how papers with the same fitness will acquire the same number of citations in the long run, regardless of which journals they are published in.
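The abstract does not state the formula itself. A sketch of one form consistent with the three named parameters, assumed here, is c(t) = m [exp(λ Φ((ln t − μ) / σ)) − 1], where λ is fitness, μ is immediacy, σ is longevity, Φ is the standard normal cumulative distribution, and m is a global constant. The value m = 30 and all parameter values below are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_citations(t, fitness, immediacy, longevity, m=30.0):
    """Assumed model: citations accumulated by time t (in years) after publication.

    m=30 is an illustrative constant, not a value given in the text.
    """
    if t <= 0:
        return 0.0
    z = (math.log(t) - immediacy) / longevity
    return m * (math.exp(fitness * normal_cdf(z)) - 1.0)


def ultimate_impact(fitness, m=30.0):
    """Long-run limit of the assumed model; it depends on fitness alone."""
    return m * (math.exp(fitness) - 1.0)


# Two hypothetical papers with equal fitness but different timing parameters.
fast_paper = cumulative_citations(1e6, fitness=2.0, immediacy=1.0, longevity=0.5)
slow_paper = cumulative_citations(1e6, fitness=2.0, immediacy=2.5, longevity=1.2)
```

In this form immediacy and longevity only shape when citations arrive. As t grows, Φ approaches 1 and both papers converge to the same total, which matches the abstract's claim that papers with equal fitness end up with the same number of citations.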
We end the book by inviting our readership to broaden the science of science for the benefit of all. By thinking beyond disciplinary barriers and considering the benefits of the science of science in its entirety, we hope that future research will increase the depth and advancement of science.
Here we provide an overview of Part I, introducing the main themes we will address as they relate to the science of careers. We ask what mechanisms drive productivity and impact, how creativity is distributed over the course of a career, and whether a scientist’s highest impact work can tell us anything about the other work they produce.
Here we explore the mechanisms and drivers behind the impact disparity discussed in the previous chapter, focusing on what factors create high-impact papers and what conditions contribute to the lognormal distribution citations follow. We show how a rich-get-richer phenomenon similar to preferential attachment, growth, and fitness all contribute to the impact of a paper. We describe a fitness model that can effectively represent these dynamics, providing insight into how impact is created in science.
We ask if it’s possible to accelerate the advancement of science by applying the science of science to the frontiers of knowledge. Using a robot scientist as an example, we show how it is now possible to close the loop by building machines that can create scientific knowledge. We discuss the implications of this on the future of the discipline. Another way to more efficiently advance science is to generate more fruitful hypotheses. We discuss the Swanson hypothesis, which provides a window into how to home in on valuable discoveries, allowing for the forecasting of fruitful areas of research. We then explore how the frontiers of science can be traced, allowing scientists to more thoughtfully choose topics that will accelerate collective discovery. Finally, we address some challenges posed by this issue, including the “file drawer problem,” which could be mitigated by a more systemic approach to sharing negative results with colleagues in the discipline. We suggest several ways to incentivize and reward impactful science so that we can efficiently reap its benefits.
We begin by showing that age-specific patterns affect the allocation of funding in science. We then ask if there are age-specific patterns that dictate when a scientist does her best work, and show that there are universal trends in the age distribution of great innovation. We offer possible explanations as to why these patterns occur. One explanation, which helps explain why scientists typically reach peak performance in middle age, is the “burden of knowledge” theory. Yet this explanation doesn’t account for the discipline-specific trends in age at peak performance that complicate the picture, which may be accounted for by the type of work produced. Research shows that there are two kinds of innovators – conceptual and experimental – and that each has a different peak. Experimental innovators, who accumulate knowledge through experience, tend to peak later. Conceptual innovators, who apply abstract principles, tend to peak earlier. We end by discussing Planck’s principle, which posits that young and old scientists have differing affinities for accepting new ideas.
Here, we focus on two factors that contribute to a paper’s fitness: novelty and publicity. By measuring the novelty of the ideas shared in a paper, we can explore the link between the originality of the research and its impact. Since new ideas are typically synthesized from existing knowledge, we can assess the novelty of an idea by looking at the number of domains from which researchers sourced their ideas and how expected or unexpected the combination of domains is. Evidence shows that rare combinations in scientific publications or inventions are associated with high impact. Yet novel ideas are riskier than conventional ones, frequently resulting in failure. Research indicates that scientists tend to be biased against novelty, making unconventional work more difficult to get off the ground. In order to mitigate risk while maximizing novelty, scientists must balance novelty with conventionality. We then look at the role that publicity plays in amplifying a paper’s impact. We find that publicity, whether good or bad, always boosts a paper’s citation counts, indicating that, even in science, it’s better to receive negative attention than no attention at all.
We begin with an anecdote about the largest team in scientific history, and then discuss the shift toward larger teams more generally. We show that the team size distribution has changed its fundamental shape since the 1950s, shifting from a Poisson distribution to a power-law distribution as teams have grown larger. These two mathematical shapes represent different modes in which teams form. A Poisson distribution leads to the creation of small “core” teams. A power-law distribution results in “extended” teams, accumulating new members in proportion to the productivity of their existing members. These two modes allow us to create an accurate model of team formation, providing us with insight about how team size affects its survival, longevity, and creation of knowledge. We can then assess some of the benefits and drawbacks of large teams, and explore the different kinds of science large and small teams produce. We show how to quantify the disruption of an idea by creating a disruption index, and explain how levels of disruption reflect team size. We end by discussing the implications of the shift to larger teams in science, making a case for preserving smaller teams.
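The abstract does not define the disruption index. A sketch under one common definition, assumed here, compares later papers that cite the focal paper alone (n_i) with those that cite both the focal paper and its references (n_j), normalized by all later papers that cite either, including those that cite only the references (n_k): D = (n_i − n_j) / (n_i + n_j + n_k). The paper identifiers below are hypothetical.

```python
def disruption_index(focal, focal_references, later_papers):
    """Disruption index of `focal` under the assumed definition above.

    `later_papers` maps each later paper's id to the set of ids it cites.
    Values range from -1 (fully consolidating) to 1 (fully disruptive).
    """
    references = set(focal_references)
    n_i = n_j = n_k = 0
    for cited in later_papers.values():
        cites_focal = focal in cited
        cites_references = bool(references & set(cited))
        if cites_focal and not cites_references:
            n_i += 1  # builds on the focal paper while ignoring its sources
        elif cites_focal and cites_references:
            n_j += 1  # treats the focal paper as part of an existing line of work
        elif cites_references:
            n_k += 1  # bypasses the focal paper entirely
    total = n_i + n_j + n_k
    if total == 0:
        raise ValueError("no later paper cites the focal paper or its references")
    return (n_i - n_j) / total


# Hypothetical citation data: focal paper "F" cites "r1" and "r2".
example_later = {
    "p1": {"F"},
    "p2": {"F", "r1"},
    "p3": {"r1"},
    "p4": {"F"},
    "p5": {"x"},  # unrelated paper, ignored by the index
}
example_d = disruption_index("F", {"r1", "r2"}, example_later)
```

Here n_i = 2, n_j = 1, and n_k = 1, giving D = 0.25, a mildly disruptive score.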
Using a story of credit allocation gone wrong, we introduce some of the challenges that come with assigning credit to collaborative work in science, especially given the historical emphasis on individual achievement in our field. We explore traditional methods for indicating ownership of scientific work, particularly the ordering of authors on a paper. While this method for understanding who should get the lion’s share of the credit for a discovery is usually effective, it is complicated by discipline-specific variations in the order of authorship. We also look at how alphabetical ordering of authorship in some fields further complicates the picture, how “guest authors” and “ghost authors” reflect flaws in the credit allocation system, and how bias affects the process adversely. We end with a discussion of the alarming collaboration penalty women economists experience, which illustrates the mishaps that can and do occur as a result of the existing system.