
Embedding domain knowledge for machine learning of complex material systems

Published online by Cambridge University Press: 10 July 2019

Christopher M. Childs
Washburn Laboratory, Department of Chemistry, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
Newell R. Washburn
Department of Chemistry, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA; Department of Biomedical Engineering, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA


Machine learning (ML) has revolutionized disciplines within materials science that have been able to generate sufficiently large datasets to utilize algorithms based on statistical inference, but for many important classes of materials the datasets remain small. However, a rapidly growing number of approaches to embedding domain knowledge of materials systems are reducing data requirements and allowing broader applications of ML. Furthermore, these hybrid approaches improve the interpretability of the predictions, allowing for greater physical insights into the factors that determine material properties. This review introduces a number of these strategies, providing examples of how they were implemented in ML algorithms and discussing the materials systems to which they were applied.

Artificial Intelligence Prospectives
Copyright © The Author(s) 2019 

