Materials science in the artificial intelligence age: high-throughput library generation, machine learning, and a pathway from correlations to the underpinning physics


The use of statistical/machine learning (ML) approaches to materials science is experiencing explosive growth. Here, we review recent work focusing on the generation and application of libraries from both experiment and theoretical tools. The library data enables classical correlative ML and also opens the pathway for exploration of underlying causative physical behaviors. We highlight key advances facilitated by this approach and illustrate how modeling, macroscopic experiments, and imaging can be combined to accelerate the understanding and development of new materials systems. These developments point toward a data-driven future wherein knowledge can be aggregated and synthesized, accelerating the advancement of materials science.


The use of statistical and machine learning (ML) algorithms (broadly characterized as “Artificial Intelligence (AI)” herein) within the materials science community has experienced a resurgence in recent years.[1] However, AI applications to materials science have ebbed and flowed over the past few decades.[2–7] For instance, Volume 700 of the Materials Research Society's Symposium Proceedings was entitled “Combinatorial and Artificial Intelligence Methods in Materials Science” more than 15 years ago,[8] and it expounds on many of the same topics as those of interest at present, with examples including high-throughput (HT) screening, application of neural networks to accelerate particle simulations, and use of genetic algorithms to find ground states. One may ask what makes this resurgence different, and whether the current trends are sustainable. In some ways, this mirrors the rises and falls of the field of AI itself, which has had several bursts of intense progress followed by “AI winters.”[9,10] Initial interest was sparked in 1956,[11] when the term was first coined, and although interest and funding were available, computational power was simply too limited. A rekindling began in the late 1980s, as more algorithms (such as backpropagation for neural networks[12] or the kernel method for classification[13]) came into use. The recent spike has been driven in large part by the success of deep learning (DL),[14] together with the parallel rise of graphics processing units and general computational power.[15,16] The question becomes whether the current, dramatic progress in AI can translate to the materials science community. The key enabling component of any AI application is the availability of large volumes of structured, labeled data, which we term “libraries” in this prospective. The available library data both enables classical correlative ML and opens a pathway for exploration of the underlying causative physical behaviors.
We argue in this prospective that, when done in the appropriate manner, AI can be transformative not only in that it can allow for acceleration of scientific discoveries but also that it can change the way materials science is conducted.

The recent acceleration in the adoption of AI/ML-based approaches in materials science can be traced back to a few key factors. Perhaps most pertinent is the Materials Genome Initiative, which was launched in 2011 with the objective of transforming manufacturing by accelerating materials discovery and deployment.[17] This required the advancement of HT approaches to both experiments and calculations, and the formation of online, accessible repositories to facilitate learning. Such databases have by now become largely mainstream, with successful examples including Automatic Flow for Materials Discovery (AFLOWLIB),[18] Joint Automated Repository for Various Integrated Simulations (JARVIS-density functional theory (DFT)),[19] Polymer Genome,[20] Citrination,[21] and Materials Innovation Network,[22] which host hundreds of thousands of datapoints from both calculations and experiments. The timing of the initiative coincided with a rapid increase in ML across commercial spaces, largely driven by the sudden and dramatic improvement in computer vision courtesy of deep neural networks, and by the availability of free packages in R or Python (e.g., scikit-learn[23]) for applying common ML methods to acquired datasets. This availability of tools, combined with access to computational resources (e.g., through cloud-based services or internally at large institutions), also contributed. It can be argued that one of the main driving forces within the materials science community was an acknowledgement that many grand challenges, such as the materials design inverse problem, were not going to be solved with conventional approaches. Moreover, the quantities of data being acquired, particularly at user facilities such as synchrotrons or microscopy centers, were growing exponentially, rendering traditional analysis methods that relied heavily on human input unworkable.
In the face of the data avalanche, it was perhaps inevitable that scientists would turn to the methods provided via data science and ML.[24–26] Note that in this prospective, commercial software is identified to specify procedures. Such identification does not imply a recommendation by the National Institute of Standards and Technology.

Thus, the question becomes: how can these newly found computational capabilities and “big” data be leveraged to gain new insights and predictions for materials? There are already some answers. For example, the torrent of data from first principles simulations has been used for HT screening of candidate materials, with notable successes.[27–29] Naturally, one asks what insights can be gained from similar databases based not on theory but on experimental data, e.g., atomically resolved structures along with their functional properties. Of course, microstructures have long been optimized in alloy design.[21,30] Having libraries (equivalently, databases) of these structures, with explicit records of their processing history, can be extremely beneficial not just for alloys but for many other material systems, including soft matter.[31] These databases can be used, e.g., to exploit existing knowledge of similar systems to accelerate the synthesis optimization process, to train models to automatically classify structures and defects, and to identify materials that exhibit similar behaviors, potentially allowing underlying causal relationships to be established.

In this prospective, we focus on the key areas of library generation of material structures and properties, through both simulations/theory and imaging. HT approaches enable both simulation and experimental databases to be compiled, with the data used to build models that enable property prediction, determine feature importance, and guide experimental design. In contrast, imaging provides the necessary view of microstates enabling the development of statistical mechanical models that incorporate both simulations and macroscopic characterization to improve predictions and determine underlying driving forces. Combining the available experimental and theoretical libraries in a physics-based framework can accelerate materials discoveries and lead to lasting transformations of the way materials science research is approached worldwide.

Databases, libraries, and integration

This prospective will focus on theory-led initiatives for database generation (and subsequent ML to predict properties and accelerate material discovery) and contrast them with the equally pressing need for their experimental counterparts. While the theory libraries are well ahead, substantial progress in materials science will rely on experimental validation of theoretical predictions and tight feedback between data-driven models, first principles and thermodynamic modeling, and experimental outcomes. It is also important to note that theoretical databases and libraries operate with an idealized representation, in which all inputs and outputs are known; hence the processes of interest include data compression, determination of reduced descriptors, and integration into analysis workflows. However, the validity and precision of theoretical models are always evolving. In comparison, experimental data are characterized by a large number of latent or unknown degrees of freedom that may or may not be relevant to specific phenomena.

Experimental libraries can be created from combinatorial experiments to rapidly map the composition space, and complemented with atomic and functional imaging to generate libraries that map local structure to functionality. The broad vision is summarized in Fig. 1. The success of any of these individual areas on its own will be limited: experimentally, the search space is much too large to explore exhaustively; computationally, the prediction of certain properties or of the role of defects in, e.g., correlated systems remains extremely challenging, and models still need experimental validation. From the imaging standpoint, much work remains to be done in automating the generation of atomic-scale defect libraries, although computer vision and DL-based approaches are showing tremendous promise.[32,33] These data, from theory and experiment and across length scales, can then be combined either directly in data-driven models (ML) or through more formal methods that account for uncertainty, such as Bayesian methods. This can also be achieved using statistical mechanical models that are refined and fit to theoretical and experimental data at multiple length scales, allowing the driving forces for materials behavior to be understood and enabling feedback to experiment and first principles theory.
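In its simplest form, the Bayesian combination of theory and experiment mentioned above reduces to precision weighting. The sketch below is illustrative only: it assumes that a computed value and a measured value of the same quantity are independent Gaussian estimates (real DFT errors are often systematic, which violates this), the numbers are hypothetical, and `fuse_gaussian` is a made-up helper name.

```python
import math

def fuse_gaussian(estimates):
    """Combine independent Gaussian estimates given as (mean, std) pairs.

    The product of Gaussian likelihoods is Gaussian; its precision
    (1 / variance) is the sum of the input precisions, and its mean is the
    precision-weighted average of the input means.
    """
    precisions = [1.0 / (std * std) for _, std in estimates]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical values (arbitrary units): a computed prediction with a broad
# uncertainty and a measurement with a tighter one.
computed = (1.10, 0.30)
measured = (1.40, 0.10)
mean, std = fuse_gaussian([computed, measured])
print(f"fused estimate: {mean:.3f} +/- {std:.3f}")
```

The fused mean lands close to the more precise measurement, and the fused uncertainty is smaller than that of either input, which is the basic payoff of combining sources.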

Figure 1. Progress in materials science requires understanding the driving forces governing phenomena, so that materials can be both discovered and optimized for applications. Fundamentally, accessing the knowledge space to accelerate this cycle requires the availability of data from simulations and experiments for materials synthesized under different conditions. Imaging provides a window into local configurations and a critical link for understanding the driving forces of observed behavior. ML tools enable the generation of these databases and facilitate rapid prediction of properties from data-driven models. Similarly, the data can be synthesized together in a Bayesian formulation, or using statistical mechanical models, to agglomerate all available sources of information and produce more accurate predictions. Ideally, the knowledge gained will be transferable, enabling more efficient design cycles for similar material systems. These tools all require community efforts to make code, data, and workflows available, which is critical to realizing this new future.

Our roadmap for this prospective is as follows. We begin with an overview of databases of theoretical calculations, which in many ways catalyzed this field and are the most well-established in this area. We then branch from HT computations to HT experiments that can be used to generate experimental realizations rapidly. These are beneficial for exploring macroscopic structure–property relationships. Complementing the macroscopic studies is the need for local imaging libraries, which relate the local atomic or mesoscopic structure to the local functional property. We discuss recent works addressing this issue, which has been less well explored but is critical for understanding disordered systems with strong localization. Finally, we explain how these libraries can be utilized in concert and incorporated into a statistical mechanical framework for predictive modeling with quantified uncertainty. We end with a discussion of the challenges at the individual, group, and department levels, and describe our outlook for materials science under this new paradigm.

Theory-based library generation

Whereas for most of human history materials discovery was largely Edisonian in approach, in the modern era materials design can be facilitated by first principles (and other) simulations that rapidly explore different candidates in silico. Computational methods are usually classified in terms of length scale, going from quantum atomistic to continuum; however, irrespective of their scale, all are constrained by simulation size (length and time), accuracy, and transferability. For instance, quantum-based methods, such as DFT, have been phenomenally successful in discovering new materials with important technological applications, such as those used in solid-state batteries,[34,35] dopants for effective strengthening of alloys,[36] and 2D materials.[37,38] These methods have also aided in explaining physical phenomena such as diffusion mechanisms,[39] experimental spectra,[40] etc. More recently, DFT[41]-based HT approaches have led to the creation of large open-source material property databases such as MaterialsProject,[42] AFLOWLIB,[18] Open Quantum Materials Database (OQMD),[43] Automated Interactive Infrastructure and Database for Computational Science (AiiDA),[44] JARVIS-DFT,[45] Organic Materials Database (OMDB),[46] QM9,[47] etc. However, DFT is heavily limited in simulation size, to something on the order of a few hundred atoms. Empirical potentials[48] help overcome the size issue, as they can simulate millions of atoms. However, they require rigorous potential fitting to simulate reasonable behavior.[49,50] Larger-scale methods, such as the finite element and phase-field methods, are limited by their dependence on critical inputs from experimental data and atomistic simulations.[51] Fortunately, ML for materials has evolved into a promising alternative for solving some of the computational materials science problems mentioned above.[52]

There are four main components in successfully applying ML to materials: (i) acquiring large enough datasets, (ii) designing feature vectors that can appropriately describe the material, (iii) implementing a validation strategy for the models, and (iv) interpreting the ML model where applicable. The first step (i) is facilitated by the generation of the large datasets mentioned above. Step (ii) is more complicated: while the databases provide a consistent set of target data, conveying core materials science knowledge to computers requires generating feature vectors for all the materials in the databases. Chemical descriptors based on elemental properties (for instance, the average of electronegativity and ionization potentials in a compound) have been successfully applied in fields such as alloy formation[53] and have led to various computational discoveries.[53] Nevertheless, this approach is not appropriate when modeling different structure prototypes with the same composition, because ignoring structural information makes it impossible to differentiate between them. Structural descriptors have recently been proposed based on the Coulomb matrix,[54,55] partial radial distribution function,[56] Voronoi tessellation,[57] Fourier series,[58] and several others.[59] Features such as classical force-field inspired descriptors (CFID)[60] and fragment descriptors[61] allow structural and chemical descriptors to be combined in one unified framework. These generally yield a fixed-size descriptor for every sample in the dataset. For example, MagPie[53] gives 145 features, while CFID[60] gives 1557 descriptors.
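To make the composition-only descriptor concrete, the sketch below computes fraction-weighted statistics of a single elemental property, Pauling electronegativity, in the general style of elemental-property featurizers. It is not the MagPie feature set: the property table covers only three elements, the chosen statistics (mean, min, max, range, mean absolute deviation) are an assumption, and `composition_features` is a hypothetical helper. The last lines show the limitation noted above: two TiO2 polymorphs receive identical vectors.

```python
# Pauling electronegativities for a few elements (illustrative subset only).
PAULING_EN = {"O": 3.44, "Ti": 1.54, "Sr": 0.95}

def composition_features(composition, table=PAULING_EN):
    """Fraction-weighted statistics of one elemental property.

    `composition` maps element symbols to amounts, e.g. {"Ti": 1, "O": 2}.
    Only stoichiometry enters; atomic positions are never consulted.
    """
    total = sum(composition.values())
    fractions = {el: n / total for el, n in composition.items()}
    values = [table[el] for el in composition]
    mean = sum(fractions[el] * table[el] for el in composition)
    mean_abs_dev = sum(fractions[el] * abs(table[el] - mean) for el in composition)
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean_abs_dev": mean_abs_dev,
    }

# Rutile and anatase are both TiO2, so any composition-only descriptor
# assigns them the same vector and cannot separate the two structures.
rutile = composition_features({"Ti": 1, "O": 2})
anatase = composition_features({"Ti": 1, "O": 2})
print(rutile)
print(rutile == anatase)
```

Separating such polymorphs requires adding structural information, which is what the Coulomb-matrix, radial-distribution, Voronoi, and CFID-type descriptors are designed to capture.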

A conceptually different way to obtain feature vectors is to generate them automatically using approaches such as convolutional neural networks,[62] SchNet,[63] and Crystal Graph Convolutional Neural Networks (CGCNN),[64] which extract the important features themselves by taking advantage of a deep neural network architecture. Most of these methods are applied to specific classes of materials, such as crystalline inorganic solids, molecules, or proteins, because of the presence or absence of periodicity in one or more crystallographic directions, but features such as the Coulomb matrix,[54] CFID,[60] SchNet,[65] MegNet,[66] and GCNN[62,67] hold a generalized appeal for all classes of materials. Conveniently, some of these feature generators are available in general ML-framework codes such as Matminer.[68] A comprehensive set of feature vector types, their applications, and corresponding resource links is provided in Table I. The validation strategy consists of reporting accuracy metrics such as the mean absolute error, root mean square error, and R². In addition, learning curves and cross-validation plots are standard ways of testing ML models from the data science perspective. Beyond these common data-science metrics, physics-inspired validation strategies, such as integrating evolutionary approaches with ML to map a generalized energy landscape[60,83] or testing energy–volume curves beyond the training set,[76] have recently drawn much attention.
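Step (iii) can be illustrated with a minimal k-fold cross-validation that pools out-of-fold predictions and reports MAE, RMSE, and R². To keep the sketch dependency-free, a one-feature least-squares line stands in for the regressor and the data are synthetic. In practice one would typically use a library such as scikit-learn; the helper names here are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; a stand-in for any regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def regression_metrics(y_true, y_pred):
    """Mean absolute error, root mean square error, and R^2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return mae, rmse, 1.0 - ss_res / ss_tot

def k_fold_cv(xs, ys, k=5, seed=0):
    """Shuffle indices, hold out each of k folds once, score pooled predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])  # prediction for a point unseen in training
    return regression_metrics(ys, preds)

# Synthetic data: a linear trend plus small alternating deviations.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + 0.1 * (-1) ** i for i, x in enumerate(xs)]
mae, rmse, r2 = k_fold_cv(xs, ys)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.4f}")
```

Scoring only held-out predictions is the point of the exercise: the same metrics computed on the training data would overstate accuracy.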

Table I. Examples of AI-based material-property predictions for different types of materials.

The types of materials consist of (A) 3D inorganic crystalline solids, (B) stable 3D inorganic crystalline solids, (C) 2D materials/surfaces, (D) molecules, (E) 3D organic crystals, and (F) crystalline polymers.

Correlation-based ML models perform well at interpolation but poorly at extrapolation. Combined with the non-differentiability of chemical space, this limits the application of classical ML in materials science. An alternative is offered by physics-inspired ML, in which extrapolation and interpolation are performed along manifolds corresponding to physically possible atomic configurations that satisfy basic physical laws and constraints. However, although there has been a great deal of work on developing databases and feature vectors, strategies for physics-based ML models[87] still require much detailed work. Additionally, the interpretability of a model can be vitally important from a scientific understanding perspective. This need has motivated a relatively new area in AI, so-called “Explainable AI.”[88] The explainability and interpretability of a model mainly depend on the type of ML algorithm. For example, in Refs. [60,61], feature importance plots revealed some of the important descriptors that guide the model. In models such as graph networks, elemental embeddings can reveal chemical similarity.[64,66]
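The interpolation/extrapolation gap is easy to reproduce with a purely correlative model. The sketch assumes a hypothetical underlying dependence y = x² that is sampled only on [0, 1], and uses k-nearest-neighbor regression as the model. Inside the sampled range the prediction is close; outside it the model can only echo the edge of its training data.

```python
def knn_predict(train_x, train_y, x, k=2):
    """Average the targets of the k training points nearest to x (1D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def true_property(x):
    """Hypothetical underlying law, never shown to the model directly."""
    return x * x

train_x = [i / 10 for i in range(11)]  # sampled only on [0, 1]
train_y = [true_property(x) for x in train_x]

for x in (0.55, 3.0):  # one query inside the training range, one far outside
    pred = knn_predict(train_x, train_y, x)
    print(f"x={x}: predicted {pred:.3f}, true {true_property(x):.3f}")
```

k-nearest neighbors can never predict beyond the largest target it has seen; other correlative regressors may fail outside their training domain in less obvious ways, which is the motivation for building physical constraints into the model.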

Presently, ML models are primarily used for screening materials. This is because an ML model for a physical quantity allows one to estimate that quantity much faster than computing it. This makes it possible to probe a much larger space of materials than is feasible with actual calculations, and it holds for any type of computational methodology (DFT, phase field, and continuum modeling, for instance). Once ML has identified the subspace of materials that likely have the desired property, those, and only those, are probed using the computational technique of choice, DFT, for instance. In this sense, just as traditional methods such as DFT have been used as a screening tool for experiments, ML can act as a screening tool for DFT methods (standard DFT as well as its hybrid-functional or higher-order corrections). Examples of such screening applications include drug discovery,[89] finding new binary compounds,[90] new perovskites,[91] full-Heusler compounds,[92] ternary oxides,[93] hard materials,[69] inorganic solid electrolytes,[94] high photo-conversion efficiency materials,[95] 2D materials,[60] and superconducting materials.[96] Some materials-science ML tools (GBML,[69] AFLOW-ML,[61] JARVIS-ML,[60] and OMDB,[70] for instance) allow web-based prediction of static properties to further accelerate material screening.
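The screening funnel described above can be sketched in a few lines: a cheap surrogate ranks every candidate, and only a shortlist is passed to the expensive method. Both property functions below are hypothetical stand-ins (a real surrogate would be a trained model and the expensive step a DFT calculation), and the 5% cutoff is an arbitrary choice.

```python
def screen(candidates, surrogate, expensive, keep_fraction=0.05):
    """Rank every candidate with a cheap surrogate, then evaluate only the
    top `keep_fraction` of them with the expensive method."""
    ranked = sorted(candidates, key=surrogate, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return {c: expensive(c) for c in ranked[:n_keep]}

expensive_calls = []

def expensive_property(c):
    """Hypothetical stand-in for a costly calculation (e.g., DFT)."""
    expensive_calls.append(c)
    return -(c - 700) ** 2

def surrogate_property(c):
    """Hypothetical stand-in for a trained ML model: right trend, slightly off."""
    return -(c - 690) ** 2

results = screen(list(range(1000)), surrogate_property, expensive_property)
best = max(results, key=results.get)
print(f"{len(expensive_calls)} expensive evaluations out of 1000; best = {best}")
```

Even though the surrogate's optimum is offset from the true one, the shortlist still contains the best candidate, while the expensive method runs on only 5% of the space. A surrogate that is wrong in trend, not just offset, would of course discard good candidates.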

A second major application of ML techniques in materials science is the development of interatomic potentials, also known as force fields, to simulate the dynamics of a system or to run Monte Carlo simulations. In this instance, ML is used to determine the parameters of the phenomenological expression for the energy, from which all other properties are then derived. Finding the right parameters (i.e., fitting the potential) is usually computationally expensive because very large multi-dimensional configurational and parameter spaces need to be probed simultaneously while respecting all relevant physical constraints. Examples of atomistic potentials developed using ML include the atomistic machine learning package (AMP),[75] physically informed artificial neural networks (PINNs),[76] Gaussian approximation potentials (GAPs),[77] Agni,[78] and the spectral neighbor analysis potential (SNAP).[79] These potentials have been shown to surpass conventional interatomic potentials in both accuracy and versatility.[97] These models have mainly been developed for elemental solids, such as Ta, Mo, W, and Al, or for a few binaries, such as Ni-Mo. Developing force fields for multicomponent systems remains limited by the exponential increase in the number of ML parameters. However, unlike in conventional fitting, these parameters can be optimized in a relatively systematic way. Importantly, a standard force-field evaluation workflow, like JARVIS-forcefield (JARVIS-FF),[49,50] still needs to be developed for such ML-based force fields to understand their generalizability. In fact, verification and validation of these ML-based models is a critical challenge for the field.
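As a toy version of determining the parameters of a phenomenological energy expression, the sketch below fits a Lennard-Jones pair potential to reference energies. Written as E(r) = A/r¹² − B/r⁶, the model is linear in A and B, so ordinary least squares through the 2×2 normal equations recovers them. The usual parameters then follow as ε = B²/(4A) and σ = (A/B)^(1/6). This is far simpler than AMP, GAP, or SNAP (SNAP is also linear in its fitted coefficients, but over many-body descriptors). The reference data are synthetic values in reduced units standing in for DFT energies.

```python
def fit_lennard_jones(distances, energies):
    """Least-squares fit of E(r) = A / r**12 - B / r**6.

    The model is linear in (A, B), so solving the 2x2 normal equations gives
    the best fit; epsilon and sigma then follow from A = 4*eps*sigma**12
    and B = 4*eps*sigma**6.
    """
    f_a = [r ** -12 for r in distances]       # basis function multiplying A
    f_b = [-(r ** -6) for r in distances]     # basis function multiplying B
    s_aa = sum(a * a for a in f_a)
    s_ab = sum(a * b for a, b in zip(f_a, f_b))
    s_bb = sum(b * b for b in f_b)
    t_a = sum(a * e for a, e in zip(f_a, energies))
    t_b = sum(b * e for b, e in zip(f_b, energies))
    det = s_aa * s_bb - s_ab * s_ab
    A = (t_a * s_bb - s_ab * t_b) / det       # Cramer's rule
    B = (s_aa * t_b - s_ab * t_a) / det
    epsilon = B * B / (4.0 * A)
    sigma = (A / B) ** (1.0 / 6.0)
    return epsilon, sigma

def lj_energy(r, epsilon, sigma):
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

# Synthetic "reference" energies in reduced units, standing in for DFT data.
distances = [0.95 + 0.05 * i for i in range(31)]
energies = [lj_energy(r, epsilon=1.0, sigma=1.0) for r in distances]
print(fit_lennard_jones(distances, energies))
```

Because the data were generated from the same functional form, the fit recovers ε = σ = 1. With real reference data, the residuals would measure how inadequate a two-parameter pair form is, which is exactly the gap that flexible ML potentials aim to close.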

Combinatorial libraries and high-throughput experimentation

Complementing the theoretical libraries listed above requires experimental libraries that map structures, processing, and compositions to functionality. Numerous outlets now exist, including Polymer Genome,[20] Citrination,[21] Dark Reactions,[98] Materials Data Facility,[99] and Materials Innovation Network.[22] This again needs to be accomplished at different length scales: microscopically, to better understand the links between microstructure or atomic configurations and macroscopic properties, and through macroscopic experiments that explore large regions of the composition space to rapidly map functional phase diagrams. The latter is made possible by high-throughput experimentation (HTE).

HTE and AI tools have been linked since HTE was rediscovered in the early 1990s. The origins of HTE can be traced back to the discovery of the Haber-Bosch catalyst in the early 20th century[100] and to the Hanak multi-sample concept.[101] In both cases, the investigators realized that the search for new materials with outstanding properties and new mechanisms required a broader search through composition-processing-structure-property space than could be afforded by conventional one-sample-at-a-time techniques. At the time, automation and computational resources were limited, so a liberal use of “elbow grease” was required both for performing the experiments and for analyzing the data. It took several decades, until the publication of the landmark HTE paper by Xiang et al.[102] and the ready availability of personal computers, for this methodology to gain significant traction within the materials community. There have been a number of recent reviews[103–106] on the topic, and today HTE is largely considered a mature field, with significant efforts (and discoveries) spanning many fields including catalysis,[107] dielectric materials,[104] and polymers.[108]

The creation and deployment of HTE workflows necessarily leads to a bottleneck centered on the need to interpret large numbers (sometimes thousands) of materials data points, correlated in composition, processing, and microstructure, from a single experiment.[109,110] By the early 2000s, a single HTE library containing hundreds of individual samples could be made and measured for a range of characteristics within a week, but the subsequent knowledge extraction of composition, structure, properties of interest, and figure of merit often took weeks to months. There were several early international efforts to standardize data formats and create data analysis and interpretation tools for large-scale data sets.[111] These efforts touched on using AI to enable experimental planning[112,113] and data analysis and visualization.[114–117]

An unexpectedly difficult exemplar for the field is the construction of non-equilibrium phase maps through the collection of spectral data as a function of composition and processing, so-called “phase mapping.” A great deal of effort has been expended in working with computer scientists to better understand how to effectively correlate diffraction spectra of limited range to the phase composition of a given sample. The problem is further exacerbated by peak shifts due to alloying, the presence of non-equilibrium phases, and distortion of peak intensities due to the preferred orientation of crystallites (texturing). The overwhelming majority of this work has focused on unsupervised techniques such as hierarchical clustering,[118] wavelet transformations,[119] non-negative matrix factorization,[120] and constraint programming paired with kernel methods.[121] Comparatively little work has been devoted to supervised or semi-supervised techniques.[122,123] A recent review article is available for the interested reader.[124] Fully unsupervised techniques face challenges not only from noisy, limited-range experimental data, but also from highly non-linear scaling of the computational resources with the number of observations in the dataset. More recent work in the field has sought to impose locality (e.g., that neighboring compositions are likely to include the same phases) when creating the phase map, through the use of segmentation techniques[125] or by attempting to deconvolve peak shifts through the application of convolutional non-negative matrix factorization.[126] A common theme of all these efforts has been the importance of translating materials science problems into more general problems that are of interest to computer scientists. These new approaches appear to operate rapidly enough to permit on-the-fly analysis of diffraction data as it is being taken.[127]
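For readers unfamiliar with the factorization approaches cited above, the sketch below applies plain non-negative matrix factorization (Lee-Seung multiplicative updates) to synthetic diffraction-like patterns along a hypothetical two-phase composition line. Rows of `H` play the role of basis patterns and rows of `W` the per-sample phase weights. It deliberately ignores peak shifts and texture, which are exactly what the convolutional and constraint-based variants address; the peak positions and sample count are made up.

```python
import numpy as np

def nmf(X, n_components, n_iter=500, seed=0, eps=1e-12):
    """Factor a non-negative matrix X (samples x channels) as W @ H.

    Uses the Lee-Seung multiplicative updates for the Frobenius-norm
    objective; `eps` only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_components))
    H = rng.random((n_components, X.shape[1]))
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors

# Hypothetical diffraction-like patterns: two "phases", each a set of
# Gaussian peaks on a 2-theta grid, mixed linearly along a composition line.
two_theta = np.linspace(20.0, 60.0, 400)

def peak(center, width=0.3):
    return np.exp(-0.5 * ((two_theta - center) / width) ** 2)

phase_a = peak(28.0) + 0.5 * peak(47.0)
phase_b = peak(33.0) + 0.7 * peak(56.0)
fraction_a = np.linspace(0.0, 1.0, 21)[:, None]            # 21 samples
X = fraction_a * phase_a + (1.0 - fraction_a) * phase_b     # shape (21, 400)

W, H, errors = nmf(X, n_components=2)
print(f"reconstruction error: first {errors[0]:.3f}, last {errors[-1]:.4f}")
```

The non-negativity constraint is what makes the factors interpretable as patterns and phase fractions. If the peaks of one phase shifted with composition, plain NMF would need extra components to absorb the shift, which is the failure mode the convolutional variant is meant to avoid.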

Once knowledge extraction catches up with HTE synthesis and characterization, the limit on the rate of new materials discovery becomes that of decision making, i.e., which materials to pursue next given the knowledge of the materials discovered so far (and the processing conditions needed to make them). HTE groups have long worked with theoreticians to identify interesting materials to pursue.[128–130] More recently, in an effort to decrease the turnaround time, several HTE groups have turned to AI for hypothesis or lead generation.[96,131,132] One example of such an AI platform is the Materials-Agnostic Platform for Informatics and Exploration developed at Northwestern, which transforms compositional data into a set of chemical descriptors that can be used to train an ML model targeting a particular property, such as the band gap or an alloy's glass-forming ability.[53] One additional benefit of HTE experiments is that they produce negative and positive results simultaneously at no additional cost. Thus, models can use both the negative and the positive results from HTE experiments to be less biased than models based on traditional materials discovery campaigns.

A recent example illustrated the power of combining HTE with ML models by demonstrating a nearly 1000× acceleration in the rate of discovery of novel amorphous alloys.[132] Amorphous alloys are a particularly apt system for ML prediction, as traditional computational approaches like DFT are not particularly effective for them. Several interesting new phenomena were observed in this study. The most notable was that the formation of amorphous alloys via physical vapor deposition was more strongly correlated with the presence of complex ordered intermetallic structures than with the traditionally invoked presence of deep eutectics. Moreover, the predictions of stability can be coupled with predictions of physical properties (e.g., modulus) and then used to guide the discovery of novel high-modulus metallic glasses, as in Fig. 2.

Figure 2. Illustrating the glass-forming ability of a novel Co-V-Zr alloy (left) and its predicted elastic modulus (right).

More recently, the pairing of supervised learning with active learning[133–137] (the ML implementation of optimal experiment design) has been used to address the dual challenges of hypothesis generation and testing. First, a supervised learning method is selected that provides uncertainty quantification along with its prediction estimates. The output estimate and uncertainty are then exploited by active learning to identify the next experiment that will most rapidly optimize a given objective, e.g., home in on a material that maximizes or minimizes a functional property. Bayesian optimization, the subset of active learning methods focused on local function optimization, has been used by a number of groups to accelerate the discovery of advanced materials. In these projects, ML identifies the material synthesis and fabrication parameter values to investigate next. These values are then used to guide experimentalists in synthesis and characterization, and the resulting data are fed back into the ML model to select the subsequent experiment. Accelerated materials discovery has been demonstrated for shape memory alloys with low thermal hysteresis,[87] piezoelectrics with high piezoelectric coupling coefficients,[133] and high-temperature superconductors.[138]
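The closed loop described above can be sketched with a Gaussian-process surrogate, which supplies both a prediction and an uncertainty, and an upper-confidence-bound acquisition rule. The objective, grid, kernel length scale, and `kappa` below are hypothetical choices for illustration. The cited studies use their own surrogates and acquisition functions (e.g., expected improvement), and in a real campaign each `objective` call would be a synthesis-plus-measurement step.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between 1D point arrays a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Mean and std of a zero-mean, unit-variance GP conditioned on the data."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def bayesian_optimize(objective, grid, x_init, n_iter=10, kappa=2.0):
    """Sequentially pick untested grid points by an upper confidence bound."""
    x_obs = list(x_init)
    y_obs = [objective(x) for x in x_obs]
    for _ in range(n_iter):
        y = np.array(y_obs)
        y_scaled = (y - y.mean()) / (y.std() + 1e-12)   # standardize targets
        candidates = np.array([g for g in grid if g not in x_obs])
        mu, sd = gp_posterior(np.array(x_obs), y_scaled, candidates)
        x_next = float(candidates[np.argmax(mu + kappa * sd)])
        x_obs.append(x_next)                  # "run the experiment"
        y_obs.append(objective(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best], x_obs

def figure_of_merit(x):
    """Hypothetical objective over a normalized processing parameter x."""
    return -(x - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 101)
best_x, best_y, history = bayesian_optimize(figure_of_merit, grid, x_init=[0.0, 1.0])
print(f"best x = {best_x:.2f} after {len(history)} evaluations")
```

The `kappa` term trades exploitation (high predicted mean) against exploration (high uncertainty). Larger values spread experiments more evenly across the space, and smaller values concentrate them near the current best.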

Advising systems were the stepping stone to the next level of HTE: autonomous systems,[139] in which ML is placed in control of the full experiment cycle through direct interfacing with material synthesis and characterization tools. Rather than exploring a predefined grid, it is beneficial to explore the materials space in a more informed manner. Autonomous systems hold great potential, not just for accelerating the experimental cycle by reducing laborious tasks, but also for potentially reducing the amount of prior knowledge and expertise required in synthesis, characterization, and data analysis. The Autonomous Research System is one such system, capable of optimizing carbon nanotube growth parameters.[140] ChemOS is another, capable of exploring chemical mixtures to achieve a desired optical spectrum.[141] These systems seek the material that optimizes some given properties, a challenge of local optimization. Autonomous systems can also be used for global optimization challenges, e.g., to maximize the knowledge gained from a sequence of experiments, as demonstrated by a set of systems capable of autonomously determining non-equilibrium phase maps across composition and temperature space.[142,143] A similar fusion in chemistry may be the merging of chemical robotics systems[144] with reaction network models such as Chematica.[145]

Significant challenges remain before autonomous systems become commonplace. One key challenge is the integration of uncertainty from data collection through ML predictions and experiment design. Additionally, many application areas have a wealth of knowledge stored in the literature that can be exploited to accelerate materials exploration and optimization; extracting this knowledge and making it searchable is another key challenge. Furthermore, researchers are investigating methods for incorporating prior knowledge of materials physics into ML frameworks to ensure that predictions are physically realizable. Physical research systems are also susceptible to multiple modes of failure that produce anomalous data, so anomaly detection and mitigation are also required. Integration of physical synthesis and characterization instruments into autonomous platforms is currently restricted by disparate communication protocols and a lack of scriptable interfaces. Accordingly, there is also a need for a data and software platform capable of managing and incorporating diverse data types and communication protocols.
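One simple screen for anomalous readings is the modified z-score. It scores each value by its distance from the median in units of the median absolute deviation (MAD); the constants 0.6745 and 3.5 follow the common Iglewicz–Hoaglin convention. This sketch assumes a batch of nominally equivalent scalar measurements, such as repeated readings of the same quantity. It is not a substitute for instrument-specific failure diagnostics.

```python
import numpy as np

def flag_anomalies(values, threshold=3.5):
    """Boolean mask of anomalous readings using the modified z-score
    0.6745 * (x - median) / MAD, which, unlike mean/std, is not inflated
    by the very outliers it is meant to catch."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0.0:
        return values != median           # degenerate case: flag any deviation
    return np.abs(0.6745 * (values - median) / mad) > threshold

readings = [1.02, 0.98, 1.01, 0.99, 1.00, 5.3, 1.03]   # e.g. repeated measurements
mask = flag_anomalies(readings)
print(mask)   # only the 5.3 reading is flagged
```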

Local structure libraries and functional imaging

The combinatorial libraries above allow rapid scanning of compositional space. However, for many materials of interest, such as manganites, filamentary superconductors, relaxor ferroelectrics, and multiferroic oxides, the responses are highly inhomogeneous. Due to strong correlations and competing orders, the local atomic and mesoscopic structures, the distribution and type of defects, and their dynamics are all critically linked to the functionality of these disordered materials. Furthermore, both to understand the driving forces behind their functions and to optimize them for applications, libraries of local atomic-scale structures and ordering are required to complement the macroscopic libraries generated through traditional HTE. Local imaging studies can also provide additional evidence for the structure–property relationships of interest. Below, we review some advances in how libraries of atomic-scale defects can be generated using a DL approach,[24] as well as advances in functional imaging that enable high-throughput local characterization.

Libraries of local structures

Perhaps the most important, and at this point least available, libraries are those of atomic-scale structures (configurations) and defects, even in commonly studied materials such as graphene and other 2D materials. This contrasts with, for instance, libraries of alloy microstructures, which have been available for years.[146] From the statistical physics perspective, access to these microstates should, in principle, enable predictions of the system's properties as the thermodynamic conditions are varied. In practice, atomic-scale imaging has become widespread and near routine only over the past decade, due in large part to the proliferation of aberration-corrected scanning transmission electron microscopy. Nonetheless, even when atomic-scale images are acquired, it is still difficult to manually identify the atomic configurations and classify the types of defects. Indeed, most of the existing “classical” methods of analyzing microscopy data are slow, inefficient, and require frequent manual input. Recently, it was demonstrated that deep neural networks[14] (the basis of DL) can be trained to perform fast, automated identification of atomic/molecular type and position, as well as to spot point-like structural irregularities (atomic defects) in the lattice, in static and dynamic scanning transmission electron and scanning tunneling microscopy (STEM and STM) data with varying levels of noise.[33,147,148] The DL approach, and ML more generally, allows one to generalize from the available labeled images (the training set) to make accurate image-level and/or pixel-level classifications of previously unseen data. The training data may come from theoretical simulations, such as the multislice algorithm[149] for electron microscopy, or from (semi-)manual labeling of experimental images by, or under the supervision of, domain experts.

Fully convolutional neural networks (FCNNs),[150] which are trained to output pixel-wise classification maps giving the probability that each pixel in the input image belongs to a certain type of atom and/or atomic defect, have been shown to be well suited to the analysis of atomically resolved experimental images. Ziatdinov et al.[147] demonstrated that an FCNN trained on simulated STEM data of graphene can accurately identify atoms and certain atomic defects in noisy experimental STEM data from a graphene monolayer, including atoms in regions of the lattice with topological reconstructions that were not part of the training set. These models can also be highly transferable: for example, a model trained on graphene can perform well on other 2D materials with a similar structure, usually without any further training. This is particularly important when generating libraries, as retraining a model for every system would impede rapid progress.
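Downstream of the network, the pixel-wise probability map still has to be turned into atomic coordinates. Here is a minimal sketch of one simple way to do this. It assumes a single output channel with values in [0, 1]; the FCNN itself is not shown. The map is thresholded, above-threshold pixels are grouped into 4-connected blobs, and the probability-weighted centroid of each blob is taken. The 0.5 threshold and the toy Gaussian "network output" are illustrative choices.

```python
from collections import deque
import numpy as np

def atom_centers(prob_map, threshold=0.5):
    """Turn a pixel-wise probability map (one network output channel) into
    probability-weighted (row, col) centroids of 4-connected above-threshold blobs."""
    mask = prob_map > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    n_rows, n_cols = mask.shape
    centers = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue, blob = deque([(r0, c0)]), []
            visited[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                blob.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= rr < n_rows and 0 <= cc < n_cols and mask[rr, cc] and not visited[rr, cc]:
                        visited[rr, cc] = True
                        queue.append((rr, cc))
            idx = np.array(blob)
            weights = prob_map[idx[:, 0], idx[:, 1]]
            centers.append(tuple(np.average(idx, axis=0, weights=weights)))
    return centers

# Toy "network output": two Gaussian blobs standing in for predicted atoms.
yy, xx = np.mgrid[0:40, 0:40]
prob = (np.exp(-((yy - 10) ** 2 + (xx - 12) ** 2) / 8.0)
        + np.exp(-((yy - 30) ** 2 + (xx - 25) ** 2) / 8.0))
centers = atom_centers(prob)
print(centers)
```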

Furthermore, for quasi-2D atomic systems, the FCNN output can be mapped onto a graph representing the classical chemical bonding picture, which enables a transition from classification based on image pixels to classification based on specific chemical parameters of atomic defects, such as bond lengths and bond angles. In such a graph representation, the nodes represent the FCNN-predicted atoms of different types, while the edges represent bonds between atoms and are constructed using known chemistry constraints, including the maximum and minimum allowed bond lengths between the corresponding pairs of atoms. This FCNN-graph approach was applied to the analysis of experimental STEM data from monolayer graphene with Si impurities, allowing the construction of a library of Si-C atomic defect complexes.[151] FCNNs can also aid studies of solid-state reactions at the atomic level observed in dynamic STEM experiments.[152] In this case, an FCNN is used in combination with a Gaussian mixture model to extract atomic coordinates and trajectories, and to create a library of structural descriptors from noisy experimental STEM movies. The associated transition probabilities are then analyzed via a Markov approach to gain insight into the atomistic mechanisms of beam-induced transformations. This was demonstrated for transition probabilities associated with the coupling between Mo substitutions and S vacancies in WS2,[152] and between different Si-C configurations at the edge and in the bulk of graphene.[153]
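The graph step and the Markov analysis of the resulting defect labels can be sketched as follows, with several stated assumptions. The bond-length windows in `BOND_LIMITS` are hypothetical illustrative values, not the constraints used in ref. 151. Coordinates are assumed to be in ångströms, and each dopant is labeled only by the species in its first coordination sphere. `transition_matrix` gives the maximum-likelihood first-order Markov estimate from a time-ordered sequence of labels for a single defect. This is a simplification of the analysis in refs. 152 and 153.

```python
import itertools
from collections import Counter
import numpy as np

# Hypothetical min/max bond lengths (angstrom) per sorted species pair; illustrative only.
BOND_LIMITS = {("C", "C"): (1.2, 1.7), ("C", "Si"): (1.6, 2.1), ("Si", "Si"): (2.0, 2.6)}

def build_bond_graph(coords, species, limits=BOND_LIMITS):
    """Adjacency list: atoms are nodes; i and j are bonded when their distance
    lies inside the allowed window for their pair of species."""
    coords = np.asarray(coords, dtype=float)
    graph = {i: [] for i in range(len(coords))}
    for i, j in itertools.combinations(range(len(coords)), 2):
        window = limits.get(tuple(sorted((species[i], species[j]))))
        if window and window[0] <= np.linalg.norm(coords[i] - coords[j]) <= window[1]:
            graph[i].append(j)
            graph[j].append(i)
    return graph

def classify_dopants(graph, species, dopant="Si"):
    """Label each dopant by the species in its first coordination sphere, e.g. 'Si-C3'."""
    labels = {}
    for i, s in enumerate(species):
        if s == dopant:
            counts = Counter(species[j] for j in graph[i])
            labels[i] = dopant + "-" + "".join(f"{el}{n}" for el, n in sorted(counts.items()))
    return labels

def transition_matrix(sequence, states):
    """Maximum-likelihood first-order Markov transition probabilities from a
    time-ordered sequence of configuration labels (one label per frame)."""
    index = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(sequence[:-1], sequence[1:]):
        counts[index[a], index[b]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# One Si atom with three C neighbours at ~1.75 angstrom (toy coordinates).
species = ["Si", "C", "C", "C"]
coords = [(0.0, 0.0), (1.75, 0.0), (-0.875, 1.5155), (-0.875, -1.5155)]
labels = classify_dopants(build_bond_graph(coords, species), species)
P = transition_matrix(["Si-C3", "Si-C3", "Si-C4", "Si-C3", "Si-C3"], ["Si-C3", "Si-C4"])
print(labels, P)
```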

While learning the structural properties of atomic defects is important in itself, it is also critical to understand how the observed structural peculiarities affect electronic and magnetic functionalities at the nanoscale. Experimentally, this requires performing both structural imaging (STEM) and functional imaging (STM, in the case of electronic properties) on the same sample. The goal is then to identify the same atomic structures and defects in the STEM and STM experiments and to correlate the observed structural properties with the measured electronic properties, namely, the local density of electronic states at and around the structure of interest. This was recently demonstrated[151] via a combined experimental–theoretical approach, in which atomic defects identified via DL in STEM structural imaging of graphene with Si dopants were then located, through their DFT-calculated electronic fingerprints, in STM measurements of the local electronic density of states on the same sample. This work, summarized in Fig. 3, shows a realistic path toward comprehensive libraries of structure–property relationships of atomic defects based on experimental observations from multiple atomically resolved probes. Such libraries can significantly aid future theoretical calculations by confining the region of chemical space that needs to be explored, i.e., by focusing effort on the experimentally observed atomic defect structures instead of all those that are possible in principle.

Figure 3. Creating local imaging libraries. (a) STEM imaging of Si impurities in a graphene monolayer. (b) Categorization of the defects in (a), based on the number and type of chemical species in their first coordination sphere, via a DL-based approach. (c) The extracted 2D atomic coordinates of these defects are then used as input to DFT calculations to obtain a fully relaxed 3D structure and calculate electronic properties (in this case, the local density of electronic states for the bands below (E_V) and above (E_C) the Fermi level). (d) The DFT-calculated data can then be used to search for specific types of defects in the STM data from the same sample, which measures the local density of states. The search can be performed manually (if the number of STM images is small) or automatically, by training a new ML classifier to categorize the STM data. Image adapted from Ziatdinov et al.[151]
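Under strong simplifying assumptions, the automated search in Fig. 3(d) can be approximated by template matching. Each STM patch is compared with simulated local-density-of-states images for the candidate defect types, and the template with the highest zero-mean normalized cross-correlation is chosen. This is a stand-in for the manual search or trained classifier described in the caption. The templates here are toy analytic images in place of DFT-simulated maps. The patches are assumed to be already centered on a defect and the same size as the templates.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def assign_defect_type(patch, templates):
    """Return the best-matching template name and the score for every template."""
    scores = {name: ncc(patch, tmpl) for name, tmpl in templates.items()}
    return max(scores, key=scores.get), scores

# Toy templates standing in for DFT-simulated LDOS maps of two defect types.
yy, xx = np.mgrid[-10:11, -10:11]
r = np.hypot(yy, xx)
templates = {
    "defect_A": np.exp(-r ** 2 / (2 * 3.0 ** 2)),           # central bright spot
    "defect_B": np.exp(-(r - 6.0) ** 2 / (2 * 1.5 ** 2)),   # bright ring
}
rng = np.random.default_rng(0)
patch = 0.8 * templates["defect_A"] + 0.2 + rng.normal(0.0, 0.05, r.shape)
best, scores = assign_defect_type(patch, templates)
print(best, scores)
```

Because the score is insensitive to overall offset and scale, the uncalibrated STM contrast need not match the simulated intensities exactly. Only the spatial pattern has to agree.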

Current challenges include improving the infrastructure for cross-platform measurements (sample transfer, automated location of the same nanoscale regions on different platforms), as well as the absence of a standard data format for storing and processing these libraries that is accepted and used by the entire community. There is also a need to collate data across existing platforms, so searchability, i.e., the ability to find the relevant data, is another major issue that will need to be addressed.

Functional libraries facilitated with rapid functional imaging

A similar argument can be made for functional property libraries derived from local measurements. Because the local structure varies in disordered materials, the mesoscopic functionalities must be mapped across the sample, which then facilitates learning the microstructural features associated with the observed response. Imaging techniques that can be applied for this purpose include versions of scanning probe microscopy (SPM) for mapping elastic and electromechanical properties,[154] chemical imaging via micro-Raman[155] and time-of-flight secondary ion mass spectrometry,[156] and nano x-ray methods.[157,158] Critical for these applications is physics-based data curation, i.e., the transition from the measured microscope signal to material-specific information. In certain techniques, such as piezoresponse force microscopy (PFM), the measured signal is fundamentally quantitative and, with proper calibration, can be used as a material-specific descriptor.[159,160] In other techniques, such as SPM-based elastic measurements or STM, the measured signal is a linear or non-linear convolution of the probe and material properties, and quantifying the material's behavior is a significantly more complex problem.[161] Also of interest is combining information from multiple sources, as realized in multimodal imaging. Here, once the data are converted from microscope-dependent to material-dependent quantities and the multiple information sources are spatially aligned, the joint data sets can be mined to extract structure–property relationships.[162–164]
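The spatial-alignment step can be sketched with phase correlation, one standard way to estimate a rigid translation between two images of the same field of view. This minimal version makes three assumptions. The two channels are already on the same pixel grid, they differ only by an integer, periodic (wrap-around) shift, and they share enough contrast to correlate. Real multimodal data usually need resampling, sub-pixel refinement, and sometimes non-rigid registration.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Integer (row, col) shift that realigns `moving` with `reference`, i.e.
    np.roll(moving, shift, axis=(0, 1)) ~ reference, via phase correlation."""
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12            # keep phase only
    corr = np.fft.ifft2(cross).real           # sharp peak at the relative offset
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint wrap around to negative shifts.
    return tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(1)
reference = rng.random((64, 64))                        # e.g. a topography channel
moving = np.roll(reference, (3, -5), axis=(0, 1))       # same field, shifted
shift = estimate_shift(reference, moving)
aligned = np.roll(moving, shift, axis=(0, 1))
print(shift)
```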

However, performing these experiments is time- and labor-intensive, and more automated methods of exploring the space and recognizing important areas (such as extended defects or domain junctions) are necessary. ML has been shown to be of substantial utility in reducing the labor-intensive portion. For instance, in SPM, an ML-based workflow for a bacterial classification task was originally proposed by Nikiforov et al.[4] There, the authors used the measured PFM signal to train a neural network that automatically recognized bacteria classes, as distinguished by their electromechanical (i.e., PFM) response. Beyond simple classification tasks, ML methods in SPM have also been used to extract fitting parameters from noisy hysteresis loop data,[165] to enable better functional fits,[166] and to generate phase diagrams.[167,168] These tools greatly reduce the labor involved in acquiring functional images, although much work remains.

Still, despite the increasing speed and utility of ML methods in this space, many local functional property measurements are inherently time-intensive. For example, traditional spectroscopic methods in SPM, even for seemingly straightforward properties such as the local electrical resistance R(V), where V is the voltage applied to the probe, or the electric-field-induced strain S(E), where E is the applied electric field, can take several hours with conventional atomic force microscopy methods. How can this step be made more efficient? One method is to collect low-resolution datasets instead and attempt to reconstruct the high-resolution version with data-fusion methods.[169] Recently, large efficiency gains were made in a range of SPM-based functional imaging methods via the so-called “General mode” (G-mode) platform.[170] The success of this approach lies largely in its simplicity. The G-mode platform is built on complete data acquisition from the available sensors, filtering of the data via ML or statistical methods, and subsequent analysis to extract the relevant material parameters. It has since been applied to a raft of SPM modes, including current–voltage (IV) curve acquisition,[171] PFM imaging[170] and spectroscopy,[172] and Kelvin probe force microscopy.[173]

Consider also the acquisition of local hysteresis loops in ferroelectric materials, typically accomplished via piezoresponse force spectroscopy. Fundamental ferroelectric switching is extremely fast (≈GHz), and photodetectors can easily operate at ≈4–10 MHz. However, heterodyne detection methods average the data over time, which limits acquisition to much lower rates, typically about one hysteresis loop per second. The reason is that detection and excitation are decoupled: each excitation is followed by a long (few ms) wave packet for detection. This problem can be circumvented by using a dynamic mode, in which the deflection of the cantilever is continually monitored and stored while a large excitation is applied to the tip at a rapid rate (e.g., 10 kHz) (see Fig. 4(a)). If the locally applied voltage exceeds the local coercive voltage of the ferroelectric material, polarization switching occurs at the excitation frequency. Reconstructing the signal via signal filtering methods yields the hysteresis loop, as shown in Fig. 4(b). This technique enables acquisition of hysteresis loops while scanning, ultimately resulting in a ~1000× increase in throughput, in addition to providing much better statistics on the process.

Figure 4. (a) General-mode acquisition (G-mode) differs from a standard measurement, in that the raw data is stored without pre-processing such as use of a lock-in amplifier. (b) The raw response of the cantilever deflection signal is Fourier transformed and processed using a band-pass filter and a user-defined noise floor threshold. The cleaned data is then transformed back to the time domain to reconstruct the hysteresis loops. Since many hysteresis loops are captured, the data is better represented as a 2D histogram (bottom right). This enables rapid mapping of relevant material parameters, such as the coercive voltage. This can, in principle, be stored along with global (macroscopic) characterization to populate libraries of materials behavior. Figure is adapted from Somnath et al.[172]
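The filtering step in Fig. 4(b) can be sketched as follows, with several illustrative assumptions. The sampling rate, excitation frequency, and band edges are arbitrary choices. The noise floor is set to a multiple of the median in-band spectral magnitude, standing in for the user-defined threshold mentioned in the caption. The synthetic "response" is a sum of odd harmonics of the excitation rather than real cantilever data.

```python
import numpy as np

def gmode_filter(raw, fs, f_low, f_high, noise_factor=5.0):
    """Band-pass and noise-floor filtering of a raw, lock-in-free time series.
    Bins outside [f_low, f_high] (Hz), or weaker than noise_factor times the
    median in-band magnitude, are zeroed before the inverse transform."""
    spectrum = np.fft.rfft(raw)
    freqs = np.fft.rfftfreq(len(raw), d=1.0 / fs)
    in_band = (freqs >= f_low) & (freqs <= f_high)
    floor = noise_factor * np.median(np.abs(spectrum[in_band]))
    keep = in_band & (np.abs(spectrum) > floor)
    return np.fft.irfft(np.where(keep, spectrum, 0.0), n=len(raw))

fs, f0 = 100_000.0, 1_000.0                  # illustrative sampling / excitation rates (Hz)
t = np.arange(10_000) / fs                   # 0.1 s = 100 excitation cycles
clean = (np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 3 * f0 * t)
         + 0.1 * np.sin(2 * np.pi * 5 * f0 * t))
rng = np.random.default_rng(2)
raw = clean + rng.normal(0.0, 0.5, t.size)   # detector noise
filtered = gmode_filter(raw, fs, f_low=500.0, f_high=20_000.0)
err_raw = np.sqrt(np.mean((raw - clean) ** 2))
err_filtered = np.sqrt(np.mean((filtered - clean) ** 2))
print(f"rms error: raw {err_raw:.3f}, filtered {err_filtered:.3f}")
```

Plotting the filtered response against the applied excitation, cycle by cycle, would then give the individual loops that are accumulated into the 2D histogram in the caption.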

As another example, consider obtaining resistance R(V) spectra from a single point on a sample in typical STM or atomic force microscopy. Traditionally, the waveform applied to the tip (or sample) is stepped, and a delay is added after each step to minimize the parasitic capacitance contribution to the measured current. This scheme, shown in Fig. 5(a), is remarkably effective, but it also dramatically limits the acquisition speed to ≈1–2 curves per second for realistic instrument bias waveforms. However, the current amplifiers used can operate at several kHz without hindrance, suggesting that the fundamental limits lie much higher. Indeed, if one captures an IV curve by applying a sine wave at several hundred Hz (and records the raw current from the amplifier at full bandwidth), it is possible to obtain IV traces that, although beset by a capacitance contribution, still contain the relevant information. Given that the circuit can be modeled, Bayesian inference can then be used to determine the capacitance contribution and provide the reconstructed resistance curves as a function of voltage, with uncertainty, as shown in Fig. 5(b). The reconstructed traces can then be analyzed further, for example to gauge the disorder in polarization switching within each capacitor (as in Fig. 5(c)), or to analyze the local capacitance contribution. The advantage of this method is not only that it enables functional imaging of electrical properties at rates hundreds of times faster than the current state of the art, but also that it does so with greater spectral and spatial resolution.

Figure 5. G-IV for rapid mapping of local electronic conductance. (a) Typical I–V measurements on SPM platforms use a regimen in which, after the voltage is stepped to a new value, a delay is introduced before the current is averaged, as shown in the inset. In contrast, the G-IV mode uses sinusoidal excitation at high frequency (200 Hz in this case), with results shown for a single point on a ferroelectric PbZr0.2Ti0.8O3 nanocapacitor in (b). The raw current (I_meas), the reconstructed current given the resistance–capacitance circuit model (I_rec), and the inferred current without the capacitance contribution (I_Bayes) are plotted. This method also allows the uncertainty in the inferred resistance traces to be determined, as shown in the respective plots of R(V) with the standard deviation shaded. White space indicates areas where the resistance is too high to be accurately determined. Reconstructing the current after the measurement facilitates rapid mapping of switching disorder in the nanocapacitors, with the computed disorder parameter mapped in (c). Figure is adapted from Somnath et al.[171]
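The inference in Fig. 5(b) can be sketched as Bayesian linear regression. This sketch assumes a deliberately simple circuit model, I = g1*V + g3*V**3 + C*dV/dt: a polynomial resistive current plus a constant capacitance, with known Gaussian noise and a broad Gaussian prior. The published analysis uses its own parameterization. With t in ms, V in volts, and I in nA, C comes out in pF and V/I in GΩ. A resistance is returned as NaN wherever its current is not at least twice its posterior standard deviation. This is loosely analogous to the white space in the figure.

```python
import numpy as np

def fit_giv(t, v, i_meas, noise_std, prior_precision=1e-6):
    """Bayesian linear regression for I = g1*V + g3*V**3 + C*dV/dt.
    Gaussian prior N(0, 1/prior_precision) on w = [g1, g3, C] and known
    Gaussian noise give a closed-form Gaussian posterior (mean, covariance)."""
    phi = np.column_stack([v, v ** 3, np.gradient(v, t)])
    precision = prior_precision * np.eye(3) + phi.T @ phi / noise_std ** 2
    cov = np.linalg.inv(precision)
    mean = cov @ phi.T @ i_meas / noise_std ** 2
    return mean, cov

def resistance_curve(v_grid, mean, cov, min_snr=2.0):
    """R(V) = V / I_R(V) from the capacitance-free current, with NaN wherever
    |I_R| is below min_snr posterior standard deviations (too uncertain)."""
    phi_r = np.column_stack([v_grid, v_grid ** 3, np.zeros_like(v_grid)])
    i_r = phi_r @ mean
    i_r_std = np.sqrt(np.einsum("ij,jk,ik->i", phi_r, cov, phi_r))
    r = np.full_like(v_grid, np.nan)
    ok = np.abs(i_r) > min_snr * i_r_std
    r[ok] = v_grid[ok] / i_r[ok]
    return r, i_r, i_r_std

# Synthetic 200 Hz sweep: t in ms, V in volts, I in nA (so C in pF, R in GOhm).
t = np.linspace(0.0, 10.0, 1000)
v = 2.0 * np.sin(2 * np.pi * 0.2 * t)
i_true = 0.5 * v + 0.1 * v ** 3 + 0.8 * (2.0 * 2 * np.pi * 0.2 * np.cos(2 * np.pi * 0.2 * t))
rng = np.random.default_rng(3)
i_meas = i_true + rng.normal(0.0, 0.05, t.size)

mean, cov = fit_giv(t, v, i_meas, noise_std=0.05)
v_grid = np.linspace(-2.0, 2.0, 9)
r, i_r, i_r_std = resistance_curve(v_grid, mean, cov)
print("g1, g3, C =", mean)
```

The model is linear in its unknowns, so the posterior is available in closed form and its covariance directly provides the shaded uncertainty band, with no sampling required.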

To reiterate, the idea in these experiments is to produce libraries of functionality that can be used synergistically with libraries of atomic or mesoscopic structures. One can imagine, for example, libraries of defects in 2D materials with corresponding functional maps of the opto-electronic properties of the same materials. The challenge is that many scanning probe techniques for functionality mapping are not readily amenable to high speed and require substantial calibration effort (e.g., to obtain quantitative maps of the converse piezoelectric coefficient[174]). If large-scale libraries of local functional properties are to be built, advances in instrumentation[175] or automated characterization systems will be needed.

Another major challenge in forming these libraries is the choice of data format. Although this topic is beyond the scope of this prospective, it is undoubtedly important and deserves mention. We envision that the most likely scenario is multiple databases, each specialized around the specific type of data being housed, e.g., theory calculations, crystallography, mechanical properties, imaging studies, and so forth. Regardless, in all cases it is important to have open, well-documented, and standardized data models to enable better integration.

From libraries to integrated knowledge

Integrating experiments and simulations across scales is not a simple endeavor, and no universal solution is likely. Numerous efforts have been made in this regard, including, for example, the very extensive work on microstructural modeling and optimization,[176–178] as well as efforts to combine theory and experiment to rationally design new polymers with specific functional properties.[179] One can also combine information from multiple sources within a Bayesian framework to guide experimental design and reduce the time (number of experimental or simulation iterations) needed to arrive at an optimal result under uncertainty.[134,136,180] These methods typically use an objective based on a desired property of interest. Methods such as transfer learning[181] can be useful for combining computational and experimental data when data are scarce. Similarly, data augmentation can be a useful strategy, as has recently been shown for x-ray datasets.[182]
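As an illustration of augmentation for one-dimensional x-ray diffraction patterns, here is a generic sketch; it is not the scheme of ref. 182. Each synthetic copy is rigidly shifted in 2θ, rescaled in intensity, and given additive Gaussian noise. A rigid shift is a simplification, since real peak displacements generally depend on angle. The shift, scale, and noise ranges are hypothetical and in practice should reflect plausible experimental variations.

```python
import numpy as np

def augment_pattern(two_theta, intensity, rng, max_shift=0.1,
                    scale_range=(0.8, 1.2), noise_std=0.01):
    """Return one synthetic variant of a 1D diffraction pattern: rigid shift in
    2-theta (degrees), global intensity rescaling, and additive Gaussian noise."""
    shift = rng.uniform(-max_shift, max_shift)
    shifted = np.interp(two_theta, two_theta + shift, intensity)   # peaks move by +shift
    scaled = shifted * rng.uniform(*scale_range)
    return scaled + rng.normal(0.0, noise_std, size=intensity.shape)

two_theta = np.linspace(10.0, 80.0, 3501)                  # 0.02 degree steps
pattern = np.exp(-0.5 * ((two_theta - 30.0) / 0.1) ** 2)   # toy single-peak pattern
rng = np.random.default_rng(4)
augmented = np.stack([augment_pattern(two_theta, pattern, rng) for _ in range(5)])
print(augmented.shape)
```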

There is also an alternative view, which is to consider that structure defines all properties, and that imaging and macroscopic experiments can be combined to constrain generative models based on statistical physics. The key to this pathway lies in theory–experiment matching, which should be done in a way that respects the local statistical fluctuations, since these contain information about the system's response to external perturbations. Recently, we have formulated a statistical distance framework that enables this task.[183–185] The optimization drives the model to produce data statistically indistinguishable from the experiment, taking into account the inherent sampling uncertainty. The resulting model then allows prediction of behavior beyond the measured thermodynamic conditions.
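The histogram comparison at the heart of this framework can be sketched under simplifying assumptions. The lattice is a binary square lattice with periodic boundaries (0 and 1 for two species). Local configurations are defined by the central site and the number of "1" sites among its four nearest neighbors. The metric is Wootters' statistical distance, s = arccos(Σ_i sqrt(p_i q_i)), one standard definition of a distance between probability distributions. This sketch omits the treatment of sampling uncertainty described above and is not the authors' code.

```python
import numpy as np

def config_histogram(occupation):
    """Normalised histogram of local configurations on a periodic square lattice.
    occupation: 2D array of 0/1 (e.g. 0 = Se, 1 = Te). Class index is
    5 * (central species) + (number of '1' sites among the 4 nearest neighbours),
    giving 10 classes in total."""
    occ = np.asarray(occupation, dtype=int)
    n_ones = sum(np.roll(occ, step, axis=ax) for step in (1, -1) for ax in (0, 1))
    counts = np.bincount((5 * occ + n_ones).ravel(), minlength=10)
    return counts / counts.sum()

def statistical_distance(p, q):
    """Wootters' statistical distance: the angle arccos(sum_i sqrt(p_i q_i))."""
    overlap = np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))
    return float(np.arccos(np.clip(overlap, 0.0, 1.0)))

rng = np.random.default_rng(5)
shape, x_te = (64, 64), 0.55
random_a = (rng.random(shape) < x_te).astype(int)
random_b = (rng.random(shape) < x_te).astype(int)
separated = np.zeros(shape, dtype=int)
separated[:, : int(x_te * shape[1])] = 1     # two large single-species domains
p_a, p_b, p_sep = (config_histogram(x) for x in (random_a, random_b, separated))
print(statistical_distance(p_a, p_b), statistical_distance(p_a, p_sep))
```

Two independent random arrangements at the same composition are close in this metric, whereas a phase-separated arrangement with the same composition is far away. It is this sensitivity to local correlations, not just composition, that lets the distance constrain interaction parameters.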

For example, consider a material that has been characterized macroscopically, so that its composition and crystal structure are known. If atomically resolved imaging data are available, the next step is to identify the atomic configurations present, i.e., in practice, the position and chemical identity of the surface atoms. Chemical identification of atomic elements in STM images can be complicated, but first-principles calculations can help guide the classification: deep convolutional neural networks trained on images simulated from DFT can then be run on the experimental images to perform automated atomic identification. From here, local crystallography[186] tools can be used to map the local distortions present and to determine the configurations of the nearest and next-nearest neighbors (and higher, if need be) of each atom in the image, producing an atomic configurations histogram. This histogram can then be used to constrain a simple statistical mechanical model (e.g., a lattice model) with easily interpretable parameters in the form of effective interaction energies (which can also be guided by first-principles theory). The model is optimized by minimizing the statistical distance (see Fig. 6(a)) between the configuration histograms computed from experiment and from the model. As an example, this concept has recently been used to explore phase separation in an FeSe0.45Te0.55 superconductor, for which atomically resolved imaging was available. The image is shown in Fig. 6(b), with red (Te) and blue (Se) atoms determined via a threshold chosen to preserve the nominal composition of the sample. A simple lattice model that considered the interactions between the Te and Se atoms was set up and optimized using the statistical distance minimization approach. As the histograms of atomic configurations in Fig. 6(d) show, the model closely matches the observed statistics from the experiment. This optimized model can then be used to sample configurations at different temperatures, as shown in Fig. 6(e).
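Sampling at other temperatures, as in Fig. 6(e), can be sketched with composition-conserving Metropolis Monte Carlo using swaps of two randomly chosen sites (Kawasaki-type moves). The single nearest-neighbor coupling `J` on a periodic square lattice is a hypothetical stand-in for the optimized effective interactions. Temperatures are in reduced units of J/k_B. In the workflow described above, J would come from the statistical distance minimization rather than being set by hand.

```python
import math
import numpy as np

def local_energy(spins, r, c, J):
    """-J * s_rc * (sum of the 4 nearest-neighbour spins), periodic boundaries."""
    n_r, n_c = spins.shape
    nb = (spins[(r + 1) % n_r, c] + spins[(r - 1) % n_r, c]
          + spins[r, (c + 1) % n_c] + spins[r, (c - 1) % n_c])
    return -J * spins[r, c] * nb

def kawasaki_sample(occupation, J, T, sweeps, rng):
    """Composition-conserving Metropolis sampling with swaps of two random sites.
    occupation: 0/1 array (e.g. 0 = Se, 1 = Te); T in reduced units of J/k_B."""
    spins = 2 * np.asarray(occupation, dtype=int) - 1        # 0/1 -> -1/+1
    n_r, n_c = spins.shape
    steps = sweeps * spins.size
    rows = rng.integers(n_r, size=(steps, 2)).tolist()
    cols = rng.integers(n_c, size=(steps, 2)).tolist()
    uniforms = rng.random(steps).tolist()
    for (r1, r2), (c1, c2), u in zip(rows, cols, uniforms):
        if spins[r1, c1] == spins[r2, c2]:
            continue                                          # swap would change nothing
        before = local_energy(spins, r1, c1, J) + local_energy(spins, r2, c2, J)
        spins[r1, c1], spins[r2, c2] = spins[r2, c2], spins[r1, c1]
        d_e = local_energy(spins, r1, c1, J) + local_energy(spins, r2, c2, J) - before
        if d_e > 0 and u >= math.exp(-d_e / T):
            spins[r1, c1], spins[r2, c2] = spins[r2, c2], spins[r1, c1]   # reject: undo
    return (spins + 1) // 2

def like_fraction(occ):
    """Fraction of nearest-neighbour bonds joining identical species."""
    return float(np.mean([(occ == np.roll(occ, 1, axis=a)).mean() for a in (0, 1)]))

rng = np.random.default_rng(6)
occ0 = (rng.random((24, 24)) < 0.55).astype(int)    # ~55% "Te", random start
cold = kawasaki_sample(occ0, J=1.0, T=0.5, sweeps=40, rng=rng)
hot = kawasaki_sample(occ0, J=1.0, T=10.0, sweeps=40, rng=rng)
print(like_fraction(occ0), like_fraction(cold), like_fraction(hot))
```

Swaps conserve the number of each species, matching the fixed nominal composition of the imaged sample. The energy change is evaluated from the local environments of the two swapped sites only. A bond between them, if present, is unchanged by the swap and cancels in the difference.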

Figure 6. Statistical distance framework. (a) The statistical distance between a model P and a target Q is defined as a distance in the probability space of local configurations. This metric enables estimation of the ability to distinguish samples arising from thermodynamic systems (under equilibrium considerations). (b) STM image of the FeSe0.45Te0.55 system with Se atoms (dark contrast) and Te atoms (bright contrast). The structure of the unit cell is shown in (c). (d) Atomic configurations histogram from both the data and the optimized model, in blue and teal, as well as from a model with no interactions (i.e., random), in red. Once the generative model is optimized, it can be sampled at different temperatures, as in (e). Note that reduced T units are used. Reprinted (adapted) with permission from Vlcek et al.[185] Copyright (2017) American Chemical Society.

It is important here to highlight the key points of this approach. The main idea is that by knowing the atomic configurations, we can learn the underpinning physics, as these configurations provide a detailed probe of the system's partition function. This can be compared with, e.g., time-based spectroscopies, where observations of fluctuations enable mapping of the full potential energy surface, as has been done for biomolecules.[187] Here, instead of fluctuations in time, we observe spatial fluctuations that are quenched within the solid. At the same time, because the models are physics-based, they are generalizable and should be predictive, enabling extrapolation rather than simply interpolation. This may be especially useful for systems where the order parameter is not easily defined, such as relaxors,[188] where the goal would be to determine how the statistics of atomic configurations (in particular, the relevant distortions) evolve through phase transitions. The combination of local structure and functional information, macroscopic characterization, and first-principles theory can therefore be used within this framework to integrate our knowledge and build predictive models that can guide materials discovery and experimental design.

Challenges remain in uncertainty quantification (how reliable the predictions are as the thermodynamic conditions diverge from those of the experiment) and in choosing the appropriate model complexity. Moreover, challenges associated with non-equilibrium systems need to be addressed. In practice, it is also difficult to determine where to retrieve the necessary data, given that it is likely to be strewn across multiple databases. Ideally, these models could be incorporated at the experimental site (e.g., at the microscope) to enable real-time predictions of sample properties and to guide the experimenter toward maximal information gain, creating efficiencies while automatically adding to the available library. However, this is still a work in progress.

Community response

Finally, it is worth mentioning that the vision laid out in this prospective requires the efforts of individuals, groups, and the wider materials community to be successful. Whilst in principle this is no different from the incremental, community-driven progress that has characterized modern science in decades past, there are distinct challenges that deserve attention. One aspect is the sharing of codes and datasets through online repositories, which should be encouraged. Creating curated datasets and well-documented codes takes time, and this should be recognized via appropriate incentives. Codes can be shared via tools such as Jupyter notebooks run on the cloud. Ensuring that data formats within individual laboratories and organizations are open, documented, and standardized requires much work, but pays off in efficiency gains over the long term. Toward this aim, a subset of the authors has created the universal spectral imaging data model (USID[189]), while the crystallography community is well versed in the Crystallographic Information File (CIF) format.[190] Logging the correct metadata with each experiment is critical, and lab notebooks can be digitized to enable searchability and indexing. Perhaps most importantly, teaching and educating the next generation of scientists to be well versed in data, in addition to ML, is essential.


The methods outlined in this prospective offer the potential to accelerate materials development via an integrated approach combining HT computation and experimentation, imaging libraries, and statistical physics-based modeling. In the future, autonomous systems that can utilize this knowledge and perform on-the-fly optimization (e.g., using reinforcement learning) may become feasible. This would result in ever-larger libraries, but also in more efficient search and optimization. Perhaps less widely acknowledged is that, given the large libraries expected to be built, learning causal laws[191] from these data becomes a realistic possibility. Indeed, this is likely to be easier in physics or materials science than in other domains, owing to the availability of models. In all cases, the availability of such databases and coupling with theoretical and ML methods offers the potential to substantially alter how materials science is approached.


The work was supported by the U.S. Department of Energy, Office of Science, Materials Sciences and Engineering Division (R.K.V. and S.V.K.). A portion of this research was conducted at and supported (M.Z.) by the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility.


1.Agrawal, A. and Choudhary, A.: Perspective: Materials informatics and big data: realization of the ‘fourth paradigm’ of science in materials science. APL Mater. 4, 053208 (2016).
2.Gakh, A.A., Gakh, E.G., Sumpter, B.G., and Noid, D.W.: Neural network-graph theory approach to the prediction of the physical properties of organic compounds. J. Chem. Inf. Comput. Sci. 34, 832 (1994).
3.Sumpter, B.G., Getino, C., and Noid, D.W.: Neural network predictions of energy transfer in macromolecules. J. Phys. Chem. 96, 2761 (1992).
4.Nikiforov, M., Reukov, V., Thompson, G., Vertegel, A., Guo, S., Kalinin, S., and Jesse, S.: Functional recognition imaging using artificial neural networks: applications to rapid cellular identification via broadband electromechanical response. Nanotechnology 20, 405708 (2009).
5.Currie, K.R. and LeClair, S.R.: Self-improving process control for molecular beam epitaxy. Int. J. Adv. Manuf. Technol. 8, 244 (1993).
6.Bensaoula, A., Malki, H.A., and Kwari, A.M.: The use of multilayer neural networks in material synthesis. IEEE Trans. Semiconduct. Manuf. 11, 421 (1998).
7.Lee, K.K., Brown, T., Dagnall, G., Bicknell-Tassius, R., Brown, A., and May, G.S.: Using neural networks to construct models of the molecular beam epitaxy process. IEEE Trans. Semiconduct. Manuf. 13, 34 (2000).
8.Takeuchi, I., Koinuma, H., Amis, E.J., Newsam, J.M., Wille, L.T., and Buelens, C.: SYMPOSIUM S: Combinatorial and artificial intelligence methods in materials science. Mater. Res. Soc. Symp. Proc. 700, 358371 (2002).
9.Bohannon, J.: Fears of an AI pioneer. Science 349, 252 (2015).
10.Sejnowski, T.J.: The Deep Learning Revolution (MIT Press, Cambridge, MA, 2018).
11.McCarthy, J., Minsky, M.L., Rochester, N., and Shannon, C.E.: A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine 27, 12 (2006).
12.LeCun, Y.: A theoretical framework for back-propagation. In Proceedings of the 1988 Connectionist Models Summer School, edited by Touretzky, D., Hinton, G., and Sejnowski, T. (Morgan Kaufmann, CMU, Pittsburgh, PA, 1988) p. 21.
13.Boser, B.E., Guyon, I.M., and Vapnik, V.N.: A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (ACM, Pittsburgh, PA, 1992) p. 144.
14.LeCun, Y., Bengio, Y., and Hinton, G.: Deep learning. Nature 521, 436 (2015).
15.Brodtkorb, A.R., Hagen, T.R., and Sætra, M.L.: Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput. 73, 4 (2013).
16.Rupp, K.: 42 Years of Microprocessor Trend Data, 2018. (accessed July 17, 2019).
17.de Pablo, J.J., Jones, B., Kovacs, C.L., Ozolins, V., and Ramirez, A.P.: The materials genome initiative, the interplay of experiment, theory and computation. Curr. Opin. Solid State Mater. Sci. 18, 99 (2014).
18.Curtarolo, S., Setyawan, W., Wang, S., Xue, J., Yang, K., Taylor, R.H., Nelson, L.J., Hart, G.L., Sanvito, S., and Buongiorno-Nardelli, M.: AFLOWLIB.ORG: a distributed materials properties repository from high-throughput ab initio calculations. Comput. Mater. Sci. 58, 227 (2012).
19.Choudhary, K.: JARVIS-DFT, 2014. (accessed July 17, 2019).
20.Kim, C., Chandrasekaran, A., Huan, T.D., Das, D., and Ramprasad, R.: Polymer genome: a data-powered polymer informatics platform for property predictions. J. Phys. Chem. C 122, 17575 (2018).
21.Citrine Informatics: Open Citrination Platform. (accessed July 17, 2019).
22.Georgia Institute of Technology: Institute for Materials: Materials Innovation Network, 2019. (accessed July 17, 2019).
23.Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., and Dubourg, V.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825 (2011).
24.Kalinin, S.V., Sumpter, B.G., and Archibald, R.K.: Big-deep-smart data in imaging for guiding materials design. Nat. Mater. 14, 973 (2015).
25.Kusiak, A.: Smart manufacturing must embrace big data. Nat. News 544, 23 (2017).
26.Bonnet, N.: Artificial intelligence and pattern recognition techniques in microscope image processing and analysis. In Advances in Imaging and Electron Physics, edited by Hawkes, P.W. (Elsevier, San Diego, CA, 2000), pp. 1.
27.Nyshadham, C., Oses, C., Hansen, J.E., Takeuchi, I., Curtarolo, S., and Hart, G.L.: A computational high-throughput search for new ternary superalloys. Acta Mater. 122, 438 (2017).
28.Isayev, O., Fourches, D., Muratov, E.N., Oses, C., Rasch, K., Tropsha, A., and Curtarolo, S.: Materials cartography: representing and mining materials space using structural and electronic fingerprints. Chem. Mater. 27, 735 (2015).
29.de Pablo, J.J., Jackson, N.E., Webb, M.A., Chen, L.-Q., Moore, J.E., Morgan, D., Jacobs, R., Pollock, T., Schlom, D.G., Toberer, E.S., Analytis, J., Dabo, I., DeLongchamp, D.M., Fiete, G.A., Grason, G.M., Hautier, G., Mo, Y., Rajan, K., Reed, E.J., Rodriguez, E., Stevanovic, V., Suntivich, J., Thornton, K., and Zhao, J.-C.: New frontiers for the materials genome initiative. npj Comput. Mater. 5, 41 (2019).
30.Adams, B.L., Kalidindi, S., and Fullwood, D.T.: Microstructure Sensitive Design for Performance Optimization (Butterworth-Heinemann, Oxford, UK, 2012).
31.Huan, T.D., Mannodi-Kanakkithodi, A., Kim, C., Sharma, V., Pilania, G., and Ramprasad, R.: A polymer dataset for accelerated property prediction and design. Sci. Data 3, 160012 (2016).
32.Ziatdinov, M., Jesse, S., Vasudevan, R.K., Sumpter, B.G., Kalinin, S.V., and Dyck, O.: Tracking atomic structure evolution during directed electron beam induced Si-atom motion in graphene via deep machine learning. (2018) arXiv preprint arXiv:1809.04785.
33.Madsen, J., Liu, P., Kling, J., Wagner, J.B., Hansen, T.W., Winther, O., and Schiøtz, J.: A deep learning approach to identify local structures in atomic-resolution transmission electron microscopy images. Adv. Theory Simul. 1 (2018).
34.Kang, B. and Ceder, G.: Battery materials for ultrafast charging and discharging. Nature 458, 190 (2009).
35.Richards, W.D., Miara, L.J., Wang, Y., Kim, J.C., and Ceder, G.: Interface stability in solid-state batteries. Chem. Mater. 28, 266 (2015).
36.Kirklin, S., Saal, J.E., Hegde, V.I., and Wolverton, C.: High-throughput computational search for strengthening precipitates in alloys. Acta Mater. 102, 125 (2016).
37.Mounet, N., Gibertini, M., Schwaller, P., Campi, D., Merkys, A., Marrazzo, A., Sohier, T., Castelli, I.E., Cepellotti, A., and Pizzi, G.: Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds. Nat. Nanotechnol. 13, 246 (2018).
38.Choudhary, K., Kalish, I., Beams, R., and Tavazza, F.: High-throughput identification and characterization of two-dimensional materials using density functional theory. Sci. Rep. 7, 5179 (2017).
39.Mo, Y., Ong, S.P., and Ceder, G.: Insights into diffusion mechanisms in P2 layered oxide materials by first-principles calculations. Chem. Mater. 26, 5208 (2014).
40.Beams, R., Cançado, L.G., Krylyuk, S., Kalish, I., Kalanyan, B., Singh, A.K., Choudhary, K., Bruma, A., Vora, P.M., and Tavazza, F.: Characterization of few-layer 1T′ MoTe2 by polarization-resolved second harmonic generation and Raman scattering. ACS Nano 10, 9626 (2016).
41.Sholl, D. and Steckel, J.A.: Density Functional Theory: A Practical Introduction (John Wiley & Sons, Hoboken, NJ, 2011).
42.Jain, A., Ong, S.P., Hautier, G., Chen, W., Richards, W.D., Dacek, S., Cholia, S., Gunter, D., Skinner, D., and Ceder, G.: Commentary: the materials project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
43.Kirklin, S., Saal, J.E., Meredig, B., Thompson, A., Doak, J.W., Aykol, M., Rühl, S., and Wolverton, C.: The open quantum materials database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 1, 15010 (2015).
44.Pizzi, G., Cepellotti, A., Sabatini, R., Marzari, N., and Kozinsky, B.: AiiDA: automated interactive infrastructure and database for computational science. Comput. Mater. Sci. 111, 218 (2016).
45.Choudhary, K., Cheon, G., Reed, E., and Tavazza, F.: Elastic properties of bulk and low-dimensional materials using van der Waals density functional. Phys. Rev. B 98, 014107 (2018).
46.Geilhufe, R.M., Olsthoorn, B., Ferella, A., Koski, T., Kahlhoefer, F., Conrad, J., and Balatsky, A.V.: Materials informatics for dark matter detection. (2018) arXiv preprint arXiv:06040.
47.Ramakrishnan, R., Dral, P.O., Rupp, M., and von Lilienfeld, O.A.: Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
48.Allen, M.P. and Tildesley, D.J.: Computer Simulation of Liquids (Oxford University Press, New York, 2017).
49.Choudhary, K., Biacchi, A.J., Ghosh, S., Hale, L., Walker, A.R.H., and Tavazza, F.: High-throughput assessment of vacancy formation and surface energies of materials using classical force-fields. J. Phys. Condens. Matter 30, 395901 (2018).
50.Choudhary, K., Congo, F.Y.P., Liang, T., Becker, C., Hennig, R.G., and Tavazza, F.: Evaluation and comparison of classical interatomic potentials through a user-friendly interactive web-interface. Sci. Data 4, 160125 (2017).
51.Ogata, S., Lidorikis, E., Shimojo, F., Nakano, A., Vashishta, P., and Kalia, R.K.: Hybrid finite-element/molecular-dynamics/electronic-density-functional approach to materials simulations on parallel computers. Comput. Phys. Commun. 138, 143 (2001).
52.Butler, K.T., Davies, D.W., Cartwright, H., Isayev, O., and Walsh, A.: Machine learning for molecular and materials science. Nature 559, 547 (2018).
53.Ward, L., Agrawal, A., Choudhary, A., and Wolverton, C.: A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2, 16028 (2016).
54.Rupp, M., Tkatchenko, A., Müller, K.-R., and von Lilienfeld, O.A.: Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
55.Faber, F., Lindmaa, A., von Lilienfeld, O.A., and Armiento, R.: Crystal structure representations for machine learning models of formation energies. Int. J. Quantum Chem. 115, 1094 (2015).
56.Schütt, K., Glawe, H., Brockherde, F., Sanna, A., Müller, K., and Gross, E.: How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Phys. Rev. B 89, 205118 (2014).
57.Ward, L., Liu, R., Krishna, A., Hegde, V.I., Agrawal, A., Choudhary, A., and Wolverton, C.: Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations. Phys. Rev. B 96, 024104 (2017).
58.Bartók, A.P., Kondor, R., and Csányi, G.: On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
59.Faber, F.A., Hutchison, L., Huang, B., Gilmer, J., Schoenholz, S.S., Dahl, G.E., Vinyals, O., Kearnes, S., Riley, P.F., and von Lilienfeld, O.A.: Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 13, 5255 (2017).
60.Choudhary, K., DeCost, B., and Tavazza, F.: Machine learning with force-field inspired descriptors for materials: fast screening and mapping energy landscape. (2018) arXiv preprint arXiv:07325.
61.Isayev, O., Oses, C., Toher, C., Gossett, E., Curtarolo, S., and Tropsha, A.: Universal fragment descriptors for predicting properties of inorganic crystals. Nat. Commun. 8, 15679 (2017).
62.Kearnes, S., McCloskey, K., Berndl, M., Pande, V., and Riley, P.: Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595 (2016).
63.Schütt, K., Kindermans, P.-J., Sauceda Felix, H.E., Chmiela, S., Tkatchenko, A., and Müller, K.-R.: SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems; 2017; p. 991.
64.Xie, T. and Grossman, J.C.: Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
65.Schütt, K.T., Sauceda, H.E., Kindermans, P.-J., Tkatchenko, A., and Müller, K.-R.: Schnet—a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
66.Chen, C., Ye, W., Zuo, Y., Zheng, C., and Ong, S.P.: Graph networks as a universal machine learning framework for molecules and crystals. (2018) arXiv preprint arXiv:05055.
67.Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., and Dahl, G.E.: Neural message passing for quantum chemistry. (2017) arXiv preprint arXiv:01212.
68.Ward, L., Dunn, A., Faghaninia, A., Zimmermann, N.E., Bajaj, S., Wang, Q., Montoya, J., Chen, J., Bystrom, K., and Dylla, M.: Matminer: an open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60 (2018).
69.De Jong, M., Chen, W., Notestine, R., Persson, K., Ceder, G., Jain, A., Asta, M., and Gamst, A.: A statistical learning framework for materials science: application to elastic moduli of k-nary inorganic polycrystalline compounds. Sci. Rep. 6, 34256 (2016).
70.Gómez-Bombarelli, R., Wei, J.N., Duvenaud, D., Hernández-Lobato, J.M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T.D., Adams, R.P., and Aspuru-Guzik, A.: Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268 (2018).
71.Olsthoorn, B., Geilhufe, R.M., Borysov, S.S., and Balatsky, A.V.: Band gap prediction for large organic crystal structures with machine learning. (2018) arXiv preprint arXiv:12814.
72.Mannodi-Kanakkithodi, A., Pilania, G., Huan, T.D., Lookman, T., and Ramprasad, R.: Machine learning strategy for accelerated design of polymer dielectrics. Sci. Rep. 6, 20952 (2016).
73.Collins, C.R., Gordon, G.J., von Lilienfeld, O.A., and Yaron, D.J.: Constant size descriptors for accurate machine learning models of molecular properties. J. Chem. Phys. 148, 241718 (2018).
74.Christensen, A., Faber, F., Huang, B., Bratholm, L., Tkatchenko, A., Müller, K., and von Lilienfeld, O.: QML: A Python Toolkit for Quantum Machine Learning, 2017. (accessed July 17, 2019).
75.Khorshidi, A. and Peterson, A.A.: Amp: a modular approach to machine learning in atomistic simulations. Comput. Phys. Commun. 207, 310 (2016).
76.Pun, G., Batra, R., Ramprasad, R., and Mishin, Y.: Physically-informed artificial neural networks for atomistic modeling of materials. (2018) arXiv preprint arXiv:01696.
77.Bartók, A.P. and Csányi, G.: Gaussian approximation potentials: a brief tutorial introduction. Int. J. Quantum Chem. 115, 1051 (2015).
78.Huan, T.D., Batra, R., Chapman, J., Krishnan, S., Chen, L., and Ramprasad, R.: A universal strategy for the creation of machine learning-based atomistic force fields. npj Comput. Mater. 3, 37 (2017).
79.Thompson, A.P., Swiler, L.P., Trott, C.R., Foiles, S.M., and Tucker, G.J.: Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials. J. Comput. Phys. 285, 316 (2015).
80.Kolb, B., Lentz, L.C., and Kolpak, A.M.: Discovering charge density functionals and structure-property relationships with PROPhet: a general framework for coupling machine learning and first-principles methods. Sci. Rep. 7, 1192 (2017).
81.Yao, K., Herr, J.E., Toth, D.W., Mckintyre, R., and Parkhill, J.: The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Chem. Sci. 9, 2261 (2018).
82.Smith, J.S., Isayev, O., and Roitberg, A.E.: ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192 (2017).
83.Artrith, N., Urban, A., and Ceder, G.: Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species. Phys. Rev. B 96, 014112 (2017).
84.Wang, H., Zhang, L., Han, J., and E, W.: DeePMD-kit: a deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Commun. 228, 178 (2018).
85.Chmiela, S., Tkatchenko, A., Sauceda, H.E., Poltavsky, I., Schütt, K.T., and Müller, K.-R.: Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
86.Mardt, A., Pasquali, L., Wu, H., and Noé, F.: VAMPnets for deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).
87.Xue, D., Balachandran, P.V., Hogden, J., Theiler, J., Xue, D., and Lookman, T.: Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 7, 11241 (2016).
88.Gunning, D. and Aha, D.: DARPA’s Explainable Artificial Intelligence (XAI) Program. AI Magazine 40, 44 (2019).
89.Mayr, A., Klambauer, G., Unterthiner, T., Steijaert, M., Wegner, J.K., Ceulemans, H., Clevert, D.-A., and Hochreiter, S.: Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem. Sci. 9, 5541 (2018).
90.Curtarolo, S., Morgan, D., Persson, K., Rodgers, J., and Ceder, G.: Predicting crystal structures with data mining of quantum calculations. Phys. Rev. Lett. 91, 135503 (2003).
91.Pilania, G., Balachandran, P.V., Kim, C., and Lookman, T.: Finding new perovskite halides via machine learning. Front. Mater. 3, 19 (2016).
92.Oliynyk, A.O., Antono, E., Sparks, T.D., Ghadbeigi, L., Gaultois, M.W., Meredig, B., and Mar, A.: High-throughput machine-learning-driven synthesis of full-Heusler compounds. Chem. Mater. 28, 7324 (2016).
93.Hautier, G., Fischer, C.C., Jain, A., Mueller, T., and Ceder, G.: Finding nature's missing ternary oxide compounds using machine learning and density functional theory. Chem. Mater. 22, 3762 (2010).
94.Ahmad, Z., Xie, T., Maheshwari, C., Grossman, J.C., and Viswanathan, V.: Machine learning enabled computational screening of inorganic solid electrolytes for suppression of dendrite formation in lithium metal anodes. ACS Cent. Sci. 4, 996 (2018).
95.Pyzer-Knapp, E.O., Li, K., and Aspuru-Guzik, A.: Learning from the Harvard clean energy project: the use of neural networks to accelerate materials discovery. Adv. Funct. Mater. 25, 6495 (2015).
96.Stanev, V., Oses, C., Kusne, A.G., Rodriguez, E., Paglione, J., Curtarolo, S., and Takeuchi, I.: Machine learning modeling of superconducting critical temperature. npj Comput. Mater. 4, 29 (2018).
97.Botu, V., Batra, R., Chapman, J., and Ramprasad, R.: Machine learning force fields: construction, validation, and outlook. J. Phys. Chem. C 121, 511 (2016).
98.Kalinin, S.V., Rodriguez, B.J., Budai, J.D., Jesse, S., Morozovska, A., Bokov, A.A., and Ye, Z.-G.: Direct evidence of mesoscopic dynamic heterogeneities at the surfaces of ergodic ferroelectric relaxors. Phys. Rev. B 81, 064107 (2010).
99.Blaiszik, B., Chard, K., Pruyne, J., Ananthakrishnan, R., Tuecke, S., and Foster, I.: The materials data facility: data services to advance materials science research. JOM 68, 2045 (2016).
100.Sheppard, D.: Robert Le Rossignol, 1884–1976: engineer of the ‘Haber’ process. Notes Rec. R. Soc. 71, 263 (2017).
101.Hanak, J.J.: The ‘multiple-sample concept’ in materials research: synthesis, compositional analysis and testing of entire multicomponent systems. J. Mater. Sci. 5, 964 (1970).
102.Xiang, X.-D., Sun, X., Briceno, G., Lou, Y., Wang, K.-A., Chang, H., Wallace-Freedman, W.G., Chen, S.-W., and Schultz, P.G.: A combinatorial approach to materials discovery. Science 268, 1738 (1995).
103.Barber, Z. and Blamire, M.: High throughput thin film materials science. Mater. Sci. Technol. 24, 757 (2008).
104.Green, M.L., Choi, C., Hattrick-Simpers, J., Joshi, A., Takeuchi, I., Barron, S., Campo, E., Chiang, T., Empedocles, S., and Gregoire, J.: Fulfilling the promise of the materials genome initiative with high-throughput experimental methodologies. Appl. Phys. Rev. 4, 011105 (2017).
105.Maier, W.F., Stoewe, K., and Sieg, S.: Combinatorial and high-throughput materials science. Angew. Chem. 46, 6016 (2007).
106.Green, M.L., Takeuchi, I., and Hattrick-Simpers, J.R.: Applications of high throughput (combinatorial) methodologies to electronic, magnetic, optical, and energy-related materials. J. Appl. Phys. 113, 231101 (2013).
107.Dubois, J.-L., Duquenne, C., Holderich, W., and Kervennal, J.: Process for Dehydrating Glycerol to Acrolein (Google Patents, 2010).
108.Arriola, D.J., Carnahan, E.M., Hustad, P.D., Kuhlman, R.L., and Wenzel, T.T.: Catalytic production of olefin block copolymers via chain shuttling polymerization. Science 312, 714 (2006).
109.Meguro, S., Ohnishi, T., Lippmaa, M., and Koinuma, H.: Elements of informatics for combinatorial solid-state materials science. Meas. Sci. Technol. 16, 309 (2004).
110.Takeuchi, I., Lippmaa, M., and Matsumoto, Y.: Combinatorial experimentation and materials informatics. MRS Bull. 31, 999 (2006).
111.Koinuma, H.: Combinatorial materials research projects in Japan. Appl. Surf. Sci. 189, 179 (2002).
112.Smotkin, E.S. and Diaz-Morales, R.R.: New electrocatalysts by combinatorial methods. Ann. Rev. Mater. Res. 33, 557 (2003).
113.Watanabe, Y., Umegaki, T., Hashimoto, M., Omata, K., and Yamada, M.: Optimization of Cu oxide catalysts for methanol synthesis by combinatorial tools using 96 well microplates, artificial neural network and genetic algorithm. Catal. Today 89, 455 (2004).
114.Dell'Anna, R., Lazzeri, P., Canteri, R., Long, C.J., Hattrick-Simpers, J., Takeuchi, I., and Anderle, M.: Data analysis in combinatorial experiments: applying supervised principal component technique to investigate the relationship between ToF-SIMS Spectra and the composition distribution of ternary metallic alloy thin films. QSAR Comb. Sci. 27, 171 (2008).
115.Takeuchi, I., Long, C., Famodu, O., Murakami, M., Hattrick-Simpers, J., Rubloff, G., Stukowski, M., and Rajan, K.: Data management and visualization of x-ray diffraction spectra from thin film ternary composition spreads. Rev. Sci. Instrum. 76, 062223 (2005).
116.Yamada, Y. and Kobayashi, T.: Utilization of combinatorial method and high throughput experimentation for development of heterogeneous catalysts. J. Jpn. Petrol. Inst. 49, 157 (2006).
117.Rodemerck, U., Baerns, M., Holena, M., and Wolf, D.: Application of a genetic algorithm and a neural network for the discovery and optimization of new solid catalytic materials. Appl. Surf. Sci. 223, 168 (2004).
118.Long, C., Hattrick-Simpers, J., Murakami, M., Srivastava, R., Takeuchi, I., Karen, V.L., and Li, X.: Rapid structural mapping of ternary metallic alloy systems using the combinatorial approach and cluster analysis. Rev. Sci. Instrum. 78, 072217 (2007).
119.Gregoire, J.M., Dale, D., and Van Dover, R.B.: A wavelet transform algorithm for peak detection and application to powder x-ray diffraction data. Rev. Sci. Instrum. 82, 015105 (2011).
120.Long, C., Bunker, D., Li, X., Karen, V., and Takeuchi, I.: Rapid identification of structural phases in combinatorial thin-film libraries using x-ray diffraction and non-negative matrix factorization. Rev. Sci. Instrum. 80, 103902 (2009).
121.Le Bras, R., Damoulas, T., Gregoire, J.M., Sabharwal, A., Gomes, C.P., and Van Dover, R.B.: Constraint reasoning and kernel clustering for pattern decomposition with scaling. In International Conference on Principles and Practice of Constraint Programming, Perugia, Italy (Springer, 2011), pp. 508.
122.Bunn, J.K., Han, S., Zhang, Y., Tong, Y., Hu, J., and Hattrick-Simpers, J.R.: Generalized machine learning technique for automatic phase attribution in time variant high-throughput experimental studies. J. Mater. Res. 30, 879 (2015).
123.Bunn, J.K., Hu, J., and Hattrick-Simpers, J.R.: Semi-supervised approach to phase identification from combinatorial sample diffraction patterns. JOM 68, 2116 (2016).
124.Hattrick-Simpers, J.R., Gregoire, J.M., and Kusne, A.G.: Perspective: composition–structure–property mapping in high-throughput experiments: turning data into knowledge. APL Mater. 4, 053211 (2016).
125.Kusne, A.G., Keller, D., Anderson, A., Zaban, A., and Takeuchi, I.: High-throughput determination of structural phase diagram and constituent phases using GRENDEL. Nanotechnology 26, 444002 (2015).
126.Suram, S.K., Xue, Y., Bai, J., Le Bras, R., Rappazzo, B., Bernstein, R., Bjorck, J., Zhou, L., van Dover, R.B., and Gomes, C.P.: Automated phase mapping with AgileFD and its application to light absorber discovery in the V–Mn–Nb oxide system. ACS Comb. Sci. 19, 37 (2016).
127.Kusne, A.G., Gao, T., Mehta, A., Ke, L., Nguyen, M.C., Ho, K.-M., Antropov, V., Wang, C.-Z., Kramer, M.J., Long, C., and Takeuchi, I.: On-the-fly machine-learning for high-throughput experiments: search for rare-earth-free permanent magnets. Sci. Rep. 4, 6367 (2014).
128.Cui, J., Chu, Y.S., Famodu, O.O., Furuya, Y., Hattrick-Simpers, J., James, R.D., Ludwig, A., Thienhaus, S., Wuttig, M., and Zhang, Z.: Combinatorial search of thermoelastic shape-memory alloys with extremely small hysteresis width. Nat. Mater. 5, 286 (2006).
129.Zakutayev, A., Stevanovic, V., and Lany, S.: Non-equilibrium alloying controls optoelectronic properties in Cu2O thin films for photovoltaic absorber applications. Appl. Phys. Lett. 106, 123903 (2015).
130.Yan, Q., Yu, J., Suram, S.K., Zhou, L., Shinde, A., Newhouse, P.F., Chen, W., Li, G., Persson, K.A., and Gregoire, J.M.: Solar fuels photoanode materials discovery by integrating high-throughput theory and experiment. Proc. Natl. Acad. Sci. USA 114, 3040 (2017).
131.Hattrick-Simpers, J.R., Choudhary, K., and Corgnale, C.: A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials. Mol. Syst. Des. Eng. 3, 509 (2018).
132.Ren, F., Ward, L., Williams, T., Laws, K.J., Wolverton, C., Hattrick-Simpers, J., and Mehta, A.: Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments. Sci. Adv. 4, eaaq1566 (2018).
133.Yuan, R., Liu, Z., Balachandran, P.V., Xue, D., Zhou, Y., Ding, X., Sun, J., Xue, D., and Lookman, T.: Accelerated discovery of large electrostrains in BaTiO3-based piezoelectrics using active learning. Adv. Mater. 30, 1702884 (2018).
134.Bassman, L., Rajak, P., Kalia, R.K., Nakano, A., Sha, F., Sun, J., Singh, D.J., Aykol, M., Huck, P., and Persson, K.: Active learning for accelerated design of layered materials. npj Comput. Mater. 4, 74 (2018).
135.Podryabinkin, E.V., Tikhonov, E.V., Shapeev, A.V., and Oganov, A.R.: Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning. Phys. Rev. B 99, 064114 (2019).
136.Talapatra, A., Boluki, S., Duong, T., Qian, X., Dougherty, E., and Arróyave, R.: Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys. Rev. Mater. 2, 113803 (2018).
137.Lookman, T., Balachandran, P.V., Xue, D., and Yuan, R.: Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. npj Comput. Mater. 5, 21 (2019).
138.Meredig, B., Antono, E., Church, C., Hutchinson, M., Ling, J., Paradiso, S., Blaiszik, B., Foster, I., Gibbons, B., and Hattrick-Simpers, J.: Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Mol. Syst. Des. Eng. 3, 819 (2018).
139.King, R.D., Rowland, J., Aubrey, W., Liakata, M., Markham, M., Soldatova, L.N., Whelan, K.E., Clare, A., Young, M., and Sparkes, A.: The robot scientist Adam. Computer 42, 46 (2009).
140.Nikolaev, P., Hooper, D., Webber, F., Rao, R., Decker, K., Krein, M., Poleski, J., Barto, R., and Maruyama, B.: Autonomy in materials research: a case study in carbon nanotube growth. npj Comput. Mater. 2, 16031 (2016).
141.Roch, L.M., Häse, F., Kreisbeck, C., Tamayo-Mendoza, T., Yunker, L.P., Hein, J.E., and Aspuru-Guzik, A.: ChemOS: orchestrating autonomous experimentation. Sci. Robot. 3, eaat5559 (2018).
142.DeCost, B. and Kusne, G.: Deep Transfer Learning for Active Optimization of Functional Materials Properties in the Data-Limited Regime (MRS Fall, Boston, MA, 2018).
143.Kusne, G., DeCost, B., Hattrick-Simpers, J., and Takeuchi, I.: Autonomous Materials Research Systems—Phase Mapping (MRS Fall, Boston, MA, 2018).
144.Caramelli, D., Salley, D., Henson, A., Camarasa, G.A., Sharabi, S., Keenan, G., and Cronin, L.: Networking chemical robots for reaction multitasking. Nat. Commun. 9, 3406 (2018).
145.Klucznik, T., Mikulak-Klucznik, B., McCormack, M.P., Lima, H., Szymkuć, S., Bhowmick, M., Molga, K., Zhou, Y., Rickershauser, L., and Gajewska, E.P.: Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522 (2018).
147.Ziatdinov, M., Dyck, O., Maksov, A., Li, X., Sang, X., Xiao, K., Unocic, R.R., Vasudevan, R., Jesse, S., and Kalinin, S.V.: Deep learning of atomically resolved scanning transmission electron microscopy images: chemical identification and tracking local transformations. ACS Nano 11, 12742 (2017).
148.Ziatdinov, M., Maksov, A., and Kalinin, S.V.: Learning surface molecular structures via machine vision. npj Comput. Mater. 3, 31 (2017).
149.Barthel, J.: Dr. Probe: a software for high-resolution STEM image simulation. Ultramicroscopy 193, 1 (2018).
150.Long, J., Shelhamer, E., and Darrell, T.: Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA (2015), pp. 3431.
151.Ziatdinov, M., Dyck, O., Sumpter, B.G., Jesse, S., Vasudevan, R.K., and Kalinin, S.V.: Building and exploring libraries of atomic defects in graphene: scanning transmission electron and scanning tunneling microscopy study. (2018) arXiv preprint arXiv:1809.04256.
152.Maksov, A., Dyck, O., Wang, K., Xiao, K., Geohegan, D.B., Sumpter, B.G., Vasudevan, R.K., Jesse, S., Kalinin, S.V., and Ziatdinov, M.: Deep learning analysis of defect and phase evolution during electron beam-induced transformations in WS2. npj Comput. Mater. 5, 12 (2019).
153.Ziatdinov, M., Dyck, O., Jesse, S., and Kalinin, S.V.: Atomic mechanisms for the Si atom dynamics in graphene: chemical transformations at the edge and in the bulk. (2019) arXiv preprint arXiv:1901.09322.
154.Yablon, D.G., Gannepalli, A., Proksch, R., Killgore, J., Hurley, D.C., Grabowski, J., and Tsou, A.H.: Quantitative viscoelastic mapping of polyolefin blends with contact resonance atomic force microscopy. Macromolecules 45, 4363 (2012).
155.Schlücker, S., Schaeberle, M.D., Huffman, S.W., and Levin, I.W.: Raman microspectroscopy: a comparison of point, line, and wide-field imaging methodologies. Anal. Chem. 75, 4312 (2003).
156.Ievlev, A.V., Maksymovych, P., Trassin, M., Seidel, J., Ramesh, R., Kalinin, S.V., and Ovchinnikova, O.S.: Chemical state evolution in ferroelectric films during tip-induced polarization and electroresistive switching. ACS Appl. Mater. Interfaces 8, 29588 (2016).
157.Hruszkewycz, S., Folkman, C., Highland, M., Holt, M., Baek, S., Streiffer, S., Baldo, P., Eom, C., and Fuoss, P.: X-ray nanodiffraction of tilted domains in a poled epitaxial BiFeO3 thin film. Appl. Phys. Lett. 99, 232903 (2011).
158.Cai, Z., Lai, B., Xiao, Y., and Xu, S.: An X-ray diffraction microscope at the Advanced Photon Source. In Journal de Physique IV (Proceedings); EDP Sciences, 2003; p. 17.
159.Kalinin, S.V., Karapetian, E., and Kachanov, M.: Nanoelectromechanics of piezoresponse force microscopy. Phys. Rev. B 70, 184101 (2004).
160.Eliseev, E.A., Kalinin, S.V., Jesse, S., Bravina, S.L., and Morozovska, A.N.: Electromechanical detection in scanning probe microscopy: tip models and materials contrast. J. Appl. Phys. 102, 014109 (2007).
161.Monig, H., Todorovic, M., Baykara, M.Z., Schwendemann, T.C., Rodrigo, L., Altman, E.I., Perez, R., and Schwarz, U.D.: Understanding scanning tunneling microscopy contrast mechanisms on metal oxides: a case study. ACS Nano 7, 10233 (2013).
162.Ievlev, A.V., Susner, M.A., McGuire, M.A., Maksymovych, P., and Kalinin, S.V.: Quantitative analysis of the local phase transitions induced by laser heating. ACS Nano 9, 12442 (2015).
163.Dönges, S.A., Khatib, O., O'Callahan, B.T., Atkin, J.M., Park, J.H., Cobden, D., and Raschke, M.B.: Ultrafast nanoimaging of the photoinduced phase transition dynamics in VO2. Nano Lett. 16, 3029 (2016).
164.Kim, Y., Strelcov, E., Hwang, I.R., Choi, T., Park, B.H., Jesse, S., and Kalinin, S.V.: Correlative multimodal probing of ionically-mediated electromechanical phenomena in simple oxides. Sci. Rep. 3, 2924 (2013).
165.Ovchinnikov, O., Jesse, S., Bintacchit, P., Trolier-McKinstry, S., and Kalinin, S.V.: Disorder identification in hysteresis data: recognition analysis of the random-bond–random-field Ising model. Phys. Rev. Lett. 103, 157203 (2009).
166.Borodinov, N., Neumayer, S., Kalinin, S.V., Ovchinnikova, O.S., Vasudevan, R.K., and Jesse, S.: Deep neural networks for understanding noisy data applied to physical property extraction in scanning probe microscopy. npj Comput. Mater. 5, 25 (2019).
167.Pradhan, D.K., Kumari, S., Strelcov, E., Pradhan, D.K., Katiyar, R.S., Kalinin, S.V., Laanait, N., and Vasudevan, R.K.: Reconstructing phase diagrams from local measurements via Gaussian processes: mapping the temperature-composition space to confidence. npj Comput. Mater. 4, 1 (2018).
168.Li, L., Yang, Y., Zhang, D., Ye, Z.-G., Jesse, S., Kalinin, S.V., and Vasudevan, R.K.: Machine learning-enabled identification of material phase transitions based on experimental data: exploring collective dynamics in ferroelectric relaxors. Sci. Adv. 4, 8672 (2018).
169.Shah, V.P., Younan, N.H., and King, R.L.: An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets. IEEE Trans. Geosci. Remote Sens. 46, 1323 (2008).
170.Somnath, S., Belianinov, A., Kalinin, S.V., and Jesse, S.: Full information acquisition in piezoresponse force microscopy. Appl. Phys. Lett. 107, 263102 (2015).
171.Somnath, S., Law, K.J., Morozovska, A., Maksymovych, P., Kim, Y., Lu, X., Alexe, M., Archibald, R., Kalinin, S.V., and Jesse, S.: Ultrafast current imaging by Bayesian inversion. Nat. Commun. 9, 513 (2018).
172.Somnath, S., Belianinov, A., Kalinin, S.V., and Jesse, S.: Rapid mapping of polarization switching through complete information acquisition. Nat. Commun. 7, 13290 (2016).
173.Collins, L., Belianinov, A., Somnath, S., Balke, N., Kalinin, S.V., and Jesse, S.: Full data acquisition in Kelvin probe force microscopy: mapping dynamic electric phenomena in real space. Sci. Rep. 6, 30557 (2016).
174.Balke, N., Jesse, S., Yu, P., Carmichael, B., Kalinin, S.V., and Tselev, A.: Quantification of surface displacements and electromechanical phenomena via dynamic atomic force microscopy. Nanotechnology 27, 425707 (2016).
175.Labuda, A. and Proksch, R.: Quantitative measurements of electromechanical response with a combined optical beam and interferometric atomic force microscope. Appl. Phys. Lett. 106, 253103 (2015).
176. Kalidindi, S.R. and De Graef, M.: Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 45, 171 (2015).
177. Fullwood, D.T., Niezgoda, S.R., and Kalidindi, S.R.: Microstructure reconstructions from 2-point statistics using phase-recovery algorithms. Acta Mater. 56, 942 (2008).
178. Kalidindi, S.R., Niezgoda, S.R., and Salem, A.A.: Microstructure informatics using higher-order statistics and efficient data-mining protocols. JOM 63, 34 (2011).
179. Sharma, V., Wang, C., Lorenzini, R.G., Ma, R., Zhu, Q., Sinkovits, D.W., Pilania, G., Oganov, A.R., Kumar, S., Sotzing, G.A., Boggs, S.A., and Ramprasad, R.: Rational design of all organic polymer dielectrics. Nat. Commun. 5, 4845 (2014).
180. Gopakumar, A.M., Balachandran, P.V., Xue, D., Gubernatis, J.E., and Lookman, T.: Multi-objective optimization for materials discovery via adaptive design. Sci. Rep. 8, 3738 (2018).
181. Hutchinson, M.L., Antono, E., Gibbons, B.M., Paradiso, S., Ling, J., and Meredig, B.: Overcoming data scarcity with transfer learning. (2017) arXiv preprint arXiv:1711.05099.
182. Oviedo, F., Ren, Z., Sun, S., Settens, C., Liu, Z., Hartono, N.T.P., Ramasamy, S., DeCost, B.L., Tian, S.I.P., Romano, G., Gilad Kusne, A., and Buonassisi, T.: Fast and interpretable classification of small x-ray diffraction datasets using data augmentation and deep neural networks. npj Comput. Mater. 5, 60 (2019).
183. Vlcek, L., Ziatdinov, M., Maksov, A., Tselev, A., Baddorf, A.P., Kalinin, S.V., and Vasudevan, R.K.: Learning from imperfections: predicting structure and thermodynamics from atomic imaging of fluctuations. ACS Nano 13, 718 (2019).
184. Vlcek, L., Vasudevan, R.K., Jesse, S., and Kalinin, S.V.: Consistent integration of experimental and ab initio data into effective physical models. J. Chem. Theory Comput. 13, 5179 (2017).
185. Vlcek, L., Maksov, A., Pan, M., Vasudevan, R.K., and Kalinin, S.V.: Knowledge extraction from atomically resolved images. ACS Nano 11, 10313 (2017).
186. Belianinov, A., He, Q., Kravchenko, M., Jesse, S., Borisevich, A., and Kalinin, S.V.: Identification of phases, symmetries and defects through local crystallography. Nat. Commun. 6, 7801 (2015).
187. Ross, D., Strychalski, E.A., Jarzynski, C., and Stavis, S.M.: Equilibrium free energies from non-equilibrium trajectories with relaxation fluctuation spectroscopy. Nat. Phys. 14, 842 (2018).
188. Kutnjak, Z., Petzelt, J., and Blinc, R.: The giant electromechanical response in ferroelectric relaxors as a critical phenomenon. Nature 441, 956 (2006).
189. Somnath, S., Smith, C.R., Laanait, N., Vasudevan, R.K., Ievlev, A., Belianinov, A., Lupini, A.R., Shankar, M., Kalinin, S.V., and Jesse, S.: USID and pycroscopy—open frameworks for storing and analyzing spectroscopic and imaging data. (2019) arXiv preprint arXiv:1903.09515.
190. Hall, S.R., Allen, F.H., and Brown, I.D.: The crystallographic information file (CIF): a new standard archive file for crystallography. Acta Crystallogr. A 47, 655 (1991).
191. Pearl, J.: Theoretical impediments to machine learning with seven sparks from the causal revolution. (2018) arXiv preprint arXiv:1801.04016.