Bibliography

Pablo Duboue

doi:10.1017/9781108671682.014

[1] Abadi, Martín, Agarwal, Ashish, Barham, Paul, Brevdo, Eugene, Chen, Zhifeng, Citro, Craig, Corrado, Greg S., Davis, Andy, Dean, Jeffrey, Devin, Matthieu, Ghemawat, Sanjay, Goodfellow, Ian, Harp, Andrew, Irving, Geoffrey, Isard, Michael, Jia, Yangqing, Jozefowicz, Rafal, Kaiser, Lukasz, Kudlur, Manjunath, Levenberg, Josh, Mané, Dandelion, Monga, Rajat, Moore, Sherry, Murray, Derek, Olah, Chris, Schuster, Mike, Shlens, Jonathon, Steiner, Benoit, Sutskever, Ilya, Talwar, Kunal, Tucker, Paul, Vanhoucke, Vincent, Vasudevan, Vijay, Viégas, Fernanda, Vinyals, Oriol, Warden, Pete, Wattenberg, Martin, Wicke, Martin, Yu, Yuan and Zheng, Xiaoqiang. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. Accessed: 2018-12-13.

[2] Aggarwal, Charu C. Outlier Analysis. Cambridge, MA: Springer, 2013.

[3] Aggarwal, Charu C. and Yu, Philip S. A general survey of privacy-preserving data mining models and algorithms. In Privacy-Preserving Data Mining, pages 11–52. Cambridge, MA: Springer, 2008.

[4] Agrawal, Rakesh and Srikant, Ramakrishnan. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, volume 1215, pages 487–499, 1994.

[5] Aho, Alfred V., Sethi, Ravi and Ullman, Jeffrey D. Compilers: Principles, Techniques, and Tools. Boston, MA: Addison-Wesley, 1986.

[6] Akmajian, Adrian, Farmer, Ann K., Bickmore, Lee, Demers, Richard A. and Harnish, Robert M. Linguistics: An Introduction to Language and Communication. Cambridge, MA: MIT Press, 2017.

[7] Alpaydin, E. Introduction to Machine Learning. Cambridge, MA: MIT Press, 2010.

[8] Amaldi, Edoardo and Kann, Viggo. On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209(1-2):237–260, 1998.

[9] Amdahl, Gene M. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, pages 483–485. New York, NY: ACM, 1967.

[10] Anderson, Michael R. and Cafarella, Michael J. Input selection for fast feature engineering. In IEEE 32nd International Conference on Data Engineering (ICDE), pages 577–588. Helsinki: IEEE, 2016.

[11] Andreas, Jacob, Dragan, Anca D. and Klein, Dan. Translating neuralese. In Regina Barzilay and Min-Yen Kan, (eds.), Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 232–242. Association for Computational Linguistics, 2017.

[12] Appleby, Austin. MurmurHash 3.0. https://github.com/aappleby/smhasher, 2010. Accessed: 2018-12-13.

[13] Arthur, David and Vassilvitskii, Sergei. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2007.

[14] Assogba, Yannick. Filmstrip, an OpenCV/Python based set of scripts for extracting keyframes from video. https://github.com/tafsiri/filmstrip, 2014. Accessed: 2018-12-20.

[15] Attardi, Giuseppe. wikiextractor. https://github.com/attardi/wikiextractor, 2018. Accessed: 2018-12-12.

[16] Bagallo, Giulia and Haussler, David. Boolean feature discovery in empirical learning. Machine Learning, 5(1):71–99, 1990.

[17] Baker, Ryan S. Week 6: Behavior detection and model assessment. www.youtube.com/watch?v=5DWZoXI5z-E, 2014. Accessed: 2018-06-11.

[18] Baker, Ryan S. Advanced Excel. www.columbia.edu/∼rsb2162/FES2015/FES-AdvancedExcel-v1.pptx, 2015. Accessed: 2018-06-11.

[19] Baker, Ryan S. Data cleaning. www.columbia.edu/∼rsb2162/FES2015/FES-DataCleaning-v1.pptx, 2015. Accessed: 2018-06-11.

[20] Baker, Ryan S. Data sets. www.columbia.edu/∼rsb2162/FES2015/FES-SpecialSession1-DataSets-v2.pptx, 2015. Accessed: 2018-06-11.

[21] Baker, Ryan S. Feature adaptation. www.columbia.edu/∼rsb2162/FES2015/FES-FeatureAdaptation-v2.pptx, 2015. Accessed: 2018-06-11.

[22] Baker, Ryan S. Feature distillation. www.columbia.edu/∼rsb2162/FES2015/FES-FeatureDistillationpt2-v1.pptx, 2015. Accessed: 2018-06-10.

[23] Baker, Ryan S. Feature distillation I. www.columbia.edu/∼rsb2162/FES2015/FES-FeatureDistillation-I-v1.pptx, 2015. Accessed: 2018-06-11.

[24] Baker, Ryan S. Iterative feature refinement. www.columbia.edu/∼rsb2162/FES2015/FES-IterativeFeatureRefinement-v2.pptx, 2015. Accessed: 2018-06-11.

[25] Baker, Ryan S. Prediction modeling. www.columbia.edu/∼rsb2162/FES2015/FES-SpecialSession-PredictionModeling-v1.pptx, 2015. Accessed: 2018-06-11.

[26] Barbu, Andrei, Mei, Tao, Narayanaswamy, Siddharth, Dokania, Puneet Kumar, Zhang, Quanshi, Shukla, Nishant, Luo, Jiebo and Sukthankar, Rahul. Language and vision workshop at CVPR 2018. https://languageandvision.com/. Accessed: 2018-02-01.

[27] Barnett, V. and Lewis, T. Outliers in Statistical Data. Hoboken, NJ: Wiley, 3rd edition, 1994.

[28] Barsotti, Damián, Domínguez, Martín A. and Duboue, Pablo A. Predicting invariant nodes in large scale semantic knowledge graphs. In Information Management and Big Data – 4th Annual International Symposium, SIMBig 2017, Lima, Peru, September 4–6, 2017, Revised Selected Papers, pages 48–60, 2017.

[29] Bee de Dagum, Estela Maria. Models for Time Series. Ottawa: Information Canada, 1974.

[30] Bell, Anthony J. and Sejnowski, Terrence J. Edges are the ‘independent components’ of natural scenes. In Advances in Neural Information Processing Systems, pages 831–837, 1997.

[31] Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal and Jauvin, Christian. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155, 2003.

[32] Berners-Lee, T., Fielding, R. and Masinter, L. Uniform resource identifiers (URI): Generic syntax. RFC Editor, United States, 1998.

[33] Berners-Lee, Tim, Hendler, James and Lassila, Ora. The semantic web. Scientific American, 284(5):34–43, May 2001.

[34] Bietti, Alberto. What is feature discretization? From Quora. www.quora.com/What-is-feature-discretization, 2013. Accessed: 2019-01-27.

[35] Bikel, Daniel M. Intricacies of Collins’ parsing model. Computational Linguistics, 30(4):479–511, 2004.

[36] Bilenko, Misha. Big learning made easy – with counts! Microsoft Machine Learning Blog. https://blogs.technet.microsoft.com/machinelearning/2015/02/17/big-learning-made-easy-with-counts/, 2015. Accessed: 2019-07-11.

[37] Bird, Steven, Klein, Ewan and Loper, Edward. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Sebastopol, CA: O’Reilly Media, Inc., 2009.

[38] Blei, David M., Ng, Andrew Y. and Jordan, Michael I. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.

[39] Bollacker, Kurt, Evans, Colin, Paritosh, Praveen, Sturge, Tim and Taylor, Jamie. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD’08, pages 1247–1250, New York: ACM, 2008.

[40] Bottou, Léon. COS 424: Interacting with Data. Princeton CS class 18, Feature Engineering. www.cs.princeton.edu/courses/archive/spring10/cos424/slides/18-feat.pdf, 2010. Accessed: 2018-05-01.

[41] Bouchard-Côté, Alexandre. CS 294: Practical Machine Learning. UC Berkeley EECS class, Feature Engineering and Selection. https://people.eecs.berkeley.edu/∼jordan/courses/294-fall09/lectures/feature/slides.pdf, October 2016. Accessed: 2018-05-02.

[42] Bousquet, Olivier and Elisseeff, André. Stability and generalization. Journal of Machine Learning Research, 2(Mar):499–526, 2002.

[43] Boyd-Graber, Jordan. Digging into data - feature engineering (spoilers). www.youtube.com/watch?v=oYe03Y1WQaI, 2016. Accessed: 2018-06-11.

[44] Boyd-Graber, Jordan. Machine learning: feature engineering. www.youtube.com/watch?v=0BGAD23_mhE, 2016. Accessed: 2018-06-06.

[45] Bradski, Gary and Kaehler, Adrian. Learning OpenCV: Computer Vision with the OpenCV Library. Sebastopol, CA: O’Reilly Media, Inc., 2008.

[46] Breiman, Leo. Random forests. Machine Learning, 45(1):5–32, 2001.

[47] Breiman, Leo. Statistical modeling: The two cultures. Statistical Science, 16(3):199–215, 2001.

[48] Breiman, Leo, Friedman, Jerome H., Olshen, Richard A. and Stone, Charles J. Classification and Regression Trees. The Wadsworth Statistics/Probability series. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software, 1984.

[49] Brink, Henrik, Richards, Joseph W. and Fetherolf, Mark. Real-World Machine Learning. Shelter Island, NY: Manning, 2017.

[50] Brown, Gavin, Pocock, Adam, Zhao, Ming-Jie and Luján, Mikel. Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. Journal of Machine Learning Research, 13(Jan):27–66, 2012.

[51] Brownlee, Jason. Discover feature engineering, how to engineer features and how to get good at it. https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/, October 2014. Accessed: 2018-05-02.

[52] Buchanan, Bruce and Wilkins, David. Readings in Knowledge Acquisition and Learning. New York, NY: Morgan Kaufmann, 1993.

[53] Buckley, Chris. trec_eval: IR evaluation package. https://github.com/usnistgov/trec_eval, 2004. Accessed: 2019-11-11.

[54] Buonomano, Dean. Your Brain Is a Time Machine: The Neuroscience and Physics of Time, chapter 1. New York, NY: W. W. Norton, paperback edition, April 2018.

[55] Candel, Arno. Anomaly detection and feature engineering. www.youtube.com/watch?v=fUSbljByXak, 2014. Accessed: 2018-06-10.

[56] Canny, J. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[57] Carletta, Jean. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249–254, 1996.

[58] Cavanaugh, Joseph E. Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters, 33(2):201–208, 1997.

[59] Cavanaugh, Joseph E. and Neath, Andrew A. Generalizing the derivation of the Schwarz information criterion. Communications in Statistics-Theory and Methods, 28(1):49–66, 1999.

[60] Chan, C.-C., Batur, Celai and Srinivasan, Arvind. Determination of quantization intervals in rule based model for dynamic systems. In Decision Aiding for Complex Systems, Conference Proceedings. 1991 IEEE International Conference on Systems, Man, and Cybernetics, volume 3, pages 1719–1723. Charlottesville, VA: IEEE, 1991.

[61] Chatfield, Chris. The Analysis of Time Series: An Introduction. Boca Raton, FL: CRC Press, 2016.

[62] Chen, Jim X. The evolution of computing: AlphaGo. Computing in Science and Engineering, 18(4):4–7, 2016.

[63] Chen, Jingnian, Huang, Houkuan, Tian, Shengfeng and Qu, Youli. Feature selection for text classification with naïve Bayes. Expert Systems with Applications, 36(3):5432–5435, 2009.

[64] Chiang, David, Joshi, Aravind K. and Searls, David B. Grammatical representations of macromolecular structure. Journal of Computational Biology, 13(5):1077–1100, 2006.

[65] Christopher, Brian. Time Series Analysis (TSA) in Python: Linear models to GARCH. www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016, 2016. Accessed: 2019-01-15.

[66] Chuklin, Aleksandr, Markov, Ilya and de Rijke, Maarten. An introduction to click models for web search: SIGIR 2015 tutorial. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1113–1115. ACM, 2015.

[67] Codd, Edgar F. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.

[68] Cohen, William. Learning trees and rules with set-valued features. In Proceedings of the 14th Joint American Association for Artificial Intelligence and IAAI Conference (AAAI/IAAI-96), pages 709–716. American Association for Artificial Intelligence, 1996.

[69] Collobert, Ronan, Weston, Jason, Bottou, Léon, Karlen, Michael, Kavukcuoglu, Koray and Kuksa, Pavel. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.

[70] Unicode Consortium, et al. The Unicode Standard, Version 2.0. Boston, MA: Addison-Wesley Longman Publishing Co., Inc., 1997.

[71] DBpedia Contributors. DBpedia. http://dbpedia.org, 2018. Accessed: 2018-11-05.

[72] Quora Contributors. What are some best practices in feature engineering? www.quora.com/What-are-some-best-practices-in-Feature-Engineering, 2016. Accessed: 2018-05-02.

[73] Weka Mailing List Contributors. Mutual information feature selection. Retrieved from http://weka.8497.n7.nabble.com/Mutual-Information-Feature-Selection-tp8975.html, 2007. Accessed: 2019-01-10.

[74] Conway, Drew and White, John Myles. Machine Learning for Hackers. Sebastopol, CA: O’Reilly Media, 2012.

[75] Cormen, Thomas H, Leiserson, Charles E, Rivest, Ronald L and Stein, Clifford. Introduction to Algorithms. Cambridge, MA: MIT Press, 2009.

[76] Correa Bahnsen, Alejandro, Aouada, Djamila, Stojanovic, Aleksandar and Ottersten, Björn. Feature engineering strategies for credit card fraud detection. Expert Systems with Applications, 51, 2016.

[77] Cotter, Andrew, Keshet, Joseph and Srebro, Nathan. Explicit approximations of the Gaussian kernel. Technical report, arXiv:1109.4603, 2011.

[78] Cowie, Jim and Lehnert, Wendy. Information extraction. Commun. ACM, 39(1):80–91, January 1996.

[79] Dalal, Navneet and Triggs, Bill. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. San Diego, CA: IEEE, 2005.

[80] Dempster, A. P., Laird, N. M. and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1977.

[81] Desmond, Joanne and Copeland, Lanny R. Communicating with Today’s Patient: Essentials to Save Time, Decrease Risk, and Increase Patient Compliance. San Francisco, CA: Jossey-Bass, September 2000.

[82] Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

[83] Dimitrova, Elena S., Licona, M. Paola Vera, McGee, John and Laubenbacher, Reinhard. Discretization of time series data. Journal of Computational Biology, 17(6):853–868, 2010.

[84] Domingos, Pedro. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.

[85] Dong, Jingming, Karianakis, Nikolaos, Davis, Damek, Hernandez, Joshua, Balzer, Jonathan and Soatto, Stefano. Multi-view feature engineering and learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3251–3260, Burlington, MA: Morgan Kaufmann, 2015.

[86] Dougherty, James, Kohavi, Ron and Sahami, Mehran. Supervised and unsupervised discretization of continuous features. In Proceedings of the Twelfth International Conference on Machine Learning, pages 194–202, Tahoe City, California, Burlington, MA: Morgan Kaufmann, 1995.

[87] Douillard, Arthur. Object detection with deep learning on aerial imagery. https://medium.com/data-from-the-trenches/object-detection-with-deep-learning-on-aerial-imagery-2465078db8a9, 2018. Accessed: 2019-01-20.

[88] Drineas, Petros and Mahoney, Michael W. On the Nyström method for approximating a gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6(Dec):2153–2175, 2005.

[89] Duboue, Pablo A. Indirect Supervised Learning of Strategic Generation Logic. PhD thesis, Computer Science Department, Columbia University, New York, NY, June 2005.

[90] Duboue, Pablo A. gitrecommender. https://github.com/DrDub/gitrecommender, 2014. Accessed: 2018-12-12.

[91] Duboue, Pablo A. Automatic reports from spreadsheets: Data analysis for the rest of us. In Proceedings of the 9th International Natural Language Generation conference, pages 244–245. Association for Computational Linguistics, 2016.

[92] Duboue, Pablo A. Deobfuscating name scrambling as a natural language generation task. In Argentinian Symposium on Artificial Intelligence (ASAI), Buenos Aires, Argentina, 2018.

[93] Duboue, Pablo A. and Domínguez, Martin A. Using Robustness to Learn to Order Semantic Properties in Referring Expression Generation, pages 163–174. Cham: Springer, 2016.

[94] Duboue, Pablo A., Domínguez, Martin A. and Estrella, Paula. On the robustness of standalone referring expression generation algorithms using RDF data. In WebNLG 2016, page 17, Edinburgh, UK, 2016.

[95] Dunn, Olive Jean. Multiple comparisons among means. Journal of the American Statistical Association, 56(293):52–64, 1961.

[96] Eastlake, D. and Jones, P. US Secure Hash Algorithm 1 (SHA1). RFC 3174, IETF, September 2001.

[97] Ellson, John, Gansner, Emden, Koutsofios, Lefteris, North, Stephen and Woodhull, Gordon. Graphviz – Open source graph drawing tools. In Lecture Notes in Computer Science, pages 483–484. New York, NY: Springer-Verlag, 2001.

[98] Elsken, Thomas, Metzen, Jan Hendrik and Hutter, Frank. Neural architecture search. In Hutter, Frank, Kotthoff, Lars and Vanschoren, Joaquin, (eds.), Automated Machine Learning: Methods, Systems, Challenges, pages 63–77. Cham: Springer International Publishing, 2019. Available at http://automl.org/book.

[99] Epstein, David. Feature engineering. www.slideshare.net/odsc/feature-engineering, 2015. Accessed: 2018-06-10.

[100] Esteban, C., Tresp, V., Yang, Y., Baier, S. and Krompass, D. Predicting the co-evolution of event and knowledge graphs. In 2016 19th International Conference on Information Fusion (FUSION), pages 98–105, July 2016.

[101] Ester, Martin, Kriegel, Hans-Peter, Sander, Jörg, Xu, Xiaowei, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.

[102] Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York, NY: St. Martin’s Press, 2018.

[103] Fayyad, Usama and Irani, Keki. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1022–1029, Tahoe City, CA: Morgan Kaufmann, 1993.

[104] Fehlhaber, Kate. Hubel and Wiesel and the neural basis of visual perception. https://knowingneurons.com/2014/10/29/hubel-and-wiesel-the-neural-basis-of-visual-perception/, 2014.

[105] Fellbaum, C. WordNet – An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.

[106] Ferrucci, David and Lally, Adam. UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4):327–348, 2004.

[107] Firth, John Rupert. A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis, pages 1–32. Oxford: Blackwell, 1957.

[108] Fleiss, J. L. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971.

[109] Fowler, Glenn, Noll, Landon Curt, Vo, Kiem-Phong, Eastlake, Donald and Hansen, Tony. The FNV non-cryptographic hash algorithm. IETF draft, 2011.

[110] Frank, Eibe. Mutual information. Retrieved from http://weka.8497.n7.nabble.com/Mutual-information-tt41569.html#a41580, 2017. Accessed: 2019-01-10.

[111] Friedl, Jeffrey E. F. Mastering Regular Expressions. Sebastopol, CA: O’Reilly & Associates, Inc., 2nd edition, 2002.

[112] Friedman, Jerome H. Multivariate adaptive regression splines. The Annals of Statistics, pages 1–67, 1991.

[113] Fukunaga, Keinosuke. Introduction to Statistical Pattern Recognition. Computer Science and Scientific Computing. Cambridge, MA: Academic Press, 2nd edition, 1990.

[114] Fulcher, Ben D., Little, Max A. and Jones, Nick S. Highly comparative time-series analysis: The empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83):20130048, 2013.

[115] Gabor, Dennis. Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers-Part III: Radio and Communication Engineering, 93(26):429–441, 1946.

[116] Gabrys, Piotr. Non-negative matrix factorization for recommendation systems. https://medium.com/logicai/non-negative-matrix-factorization-for-recommendation-systems-985ca8d5c16c, 2018. Accessed: 2019-01-22.

[117] Gale, William A. and Sampson, Geoffrey. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3):217–237, 1995.

[118] Garcia, Salvador, Luengo, Julian, Sáez, José Antonio, Lopez, Victoria and Herrera, Francisco. A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25(4):734–750, 2013.

[119] Gelman, Andrew. Analysis of variance: Why it is more important than ever. The Annals of Statistics, 33(1):1–53, 2005.

[120] Géron, Aurélien. Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Sebastopol, CA: O’Reilly Media, 2017.

[121] Ghosh, Swarnendu, Das, Nibaran, Gonçalves, Teresa, Quaresma, Paulo and Kundu, Mahantapas. The journey of graph kernels through two decades. Computer Science Review, 27:88–111, 2018.

[122] Glorot, Xavier and Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks. In Yee Whye Teh and D. Mike Titterington, (eds.), AISTATS, volume 9 of JMLR Proceedings, pages 249–256, 2010.

[123] Goldberg, David. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys (CSUR), 23(1):5–48, 1991.

[124] Gondek, D. C., Lally, A., Kalyanpur, A., Murdock, J. W., Duboue, P. A., Zhang, L., Pan, Y., Qiu, Z. M. and Welty, C. A framework for merging and ranking of answers in DeepQA. IBM Journal of Research and Development, 56(3.4):14:1–14:12, 2012. Digital Object Identifier: 10.1147/JRD.2012.2188760.

[125] Goodfellow, Ian, Bengio, Yoshua and Courville, Aaron. Deep Learning. Cambridge, MA: MIT Press, 2016. www.deeplearningbook.org.

[126] Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron and Bengio, Yoshua. Generative adversarial nets. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D. and Weinberger, K. Q., (eds.), Advances in Neural Information Processing Systems, pages 2672–2680, Cambridge, MA: MIT Press, 2014.

[127] Gordon, Josh. What makes a good feature. www.youtube.com/watch?v=N9fDIAflCMY, 2016. Accessed: 2018-06-06.

[128] Granitto, Pablo M., Furlanello, Cesare, Biasioli, Franco and Gasperi, Flavia. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2):83–90, 2006.

[129] Gribov, Alexander and Krivoruchko, Konstantin. New Flexible Non-parametric Data Transformation for Trans-Gaussian Kriging, pages 51–65. Dordrecht, The Netherlands: Springer Netherlands, 2012.

[130] Grover, Aditya and Leskovec, Jure. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge Discovery and Data Mining, pages 855–864. San Francisco, CA: ACM, 2016.

[131] Gu, Quanquan, Li, Zhenhui and Han, Jiawei. Generalized Fisher score for feature selection. In Cozman, Fábio Gagliardi and Pfeffer, Avi, (eds.), UAI, pages 266–273. Barcelona, Spain: AUAI Press, 2011.

[132] Dong, Guozhu and Liu, Huan, (eds.). Feature Engineering for Machine Learning and Data Analytics. Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. Boca Raton, FL: CRC Press, 1st edition, April 2018.

[133] Gupta, Manish, Gao, Jing, Aggarwal, Charu C. and Han, Jiawei. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9):2250–2267, 2014.

[134] Guyon, Isabelle and Elisseeff, André. An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar):1157–1182, 2003.

[135] Guyon, Isabelle, Gunn, Steve, Nikravesh, Masoud and Zadeh, Lotfi, (eds.). Feature Extraction, Foundations and Applications. Series Studies in Fuzziness and Soft Computing, Heidelberg, Germany: Physica-Verlag, Springer, 2006.

[136] Guyon, Isabelle, Weston, Jason, Barnhill, Stephen and Vapnik, Vladimir. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.

[137] Halkidi, Maria, Batistakis, Yannis and Vazirgiannis, Michalis. On clustering validation techniques. Journal of Intelligent Information Systems, 17:107–145, 2001.

[138] Halko, Nathan, Martinsson, Per-Gunnar and Tropp, Joel A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.

[139] Haralick, Robert M. et al. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786–804, 1979.

[140] Harper, F. Maxwell and Konstan, Joseph A. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19, 2016.

[141] Harris, C. and Stephens, M. A combined corner and edge detector. In Proceedings of The Fourth Alvey Vision Conf., pages 147–151, 1988.

[142] Hartigan, John A. and Wong, Manchek A. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108, 1979.

[143] Hatzivassiloglou, Vasileios, Duboue, Pablo A. and Rzhetsky, Andrey. Disambiguating proteins, genes and RNA in text: A machine learning approach. Bioinformatics, 17(Suppl 1):97–106, 2001. PubMedID: 11472998.

[144] Hearst, Marti A. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1):33–64, 1997.

[145] Heaton, Jeff. Encog: Library of interchangeable machine learning models for Java and C sharp. Journal of Machine Learning Research, 16:1243–1247, 2015.

[146] Heaton, Jeff. An empirical analysis of feature engineering for predictive modeling. In SoutheastCon, 2016, pages 1–6. Norfolk, VA: IEEE, 2016.

[147] Heaton, Jeff. Automated Feature Engineering for Deep Neural Networks with Genetic Programming. PhD thesis, Nova Southeastern University, 2017.

[148] Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[149] Hinton, Geoffrey E. and Zemel, Richard S. Autoencoders, minimum description length and Helmholtz free energy. In Advances in Neural Information Processing Systems, pages 3–10, 1994.

[150] Hirt, C., Filmer, M. S. and Featherstone, W. E. Comparison and validation of the recent freely available ASTER-GDEM ver1, SRTM ver4.1 and GEODATA DEM-9S ver3 digital elevation models over Australia. Australian Journal of Earth Sciences, 57(3):337–347, 2010.

[151] Hodge, Victoria and Austin, Jim. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.

[152] Hoerl, A. E. and Kennard, R. W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:55–67, 1970.

[153] Hoffman, Martin. Kernels and the kernel trick. www.cogsys.wiai.uni-bamberg.de/teaching/ss06/hs_svm/slides/SVM_and_Kernels.pdf, 2006. Accessed: 2018-08-19.

[154] Hofmann, Thomas. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1-2):177–196, 2001.

[155] Holden, Daniel. My neural network isn’t working! What should I do? http://theorangeduck.com/page/neural-network-not-working, August 2017. Accessed: 2019-01-10.

[156] Honnibal, Matthew and Montani, Ines. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear, 2017.

[157] Howard, Jeremy and Ruder, Sebastian. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 328–339, 2018.

[158] Hu, Yingjie, Gao, Song, Newsam, Shawn and Lunga, Dalton, (eds.). GeoAI’18: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, New York: ACM, 2018.

[159] Huang, Chao. Meaning of the spectral norm of a matrix. From Mathematics Stack Exchange. https://math.stackexchange.com/questions/188202/meaning-of-the-spectral-norm-of-a-matrix, 2012. Accessed: 2018-12-17.

[160] Huang, Zhiheng, Xu, Wei and Yu, Kai. Bidirectional LSTM-CRF models for sequence tagging. CoRR, abs/1508.01991, 2015.

[161] Hubel, D. H. and Wiesel, T. N. Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3):574–591, 1959.

[162] Huff, Darrell. How to Lie with Statistics. New York: Penguin Books, 1954.

[163] Hyvärinen, Aapo, Karhunen, Juha and Oja, Erkki. Independent Component Analysis. Hoboken, NJ: Wiley, 2001.

[164] Ingersoll, Grant S., Morton, Thomas S. and Farris, Andrew L. Taming text: How to find, organise, and manipulate it. Shelter Island, NY: Pearson Education, 2013.

[165] Iwajomo, Soladoye B., Willemoes, Mikkel, Ottosson, Ulf, Strandberg, Roine and Thorup, Kasper. Data from: Intra-African movements of the African cuckoo Cuculus gularis as revealed by satellite telemetry. www.datarepository.movebank.org/handle/10255/move.714, 2017. Accessed: 2018-01-20.

[166] Iwajomo, Soladoye B., Willemoes, Mikkel, Ottosson, Ulf, Strandberg, Roine and Thorup, Kasper. Intra-african movements of the african cuckoo cuculus gularis as revealed by satellite telemetry. Journal of Avian Biology, 49(1), 2018.Google Scholar

[167] Iyengar, Giri. CS 5304: Data science in the wild. Cornell EECS, class Feature Engineering. https://courses.cit.cornell.edu/cs5304/Lectures/lec5_FeatureEngineering.pdf, March 2016. Accessed: 2016-10-01.

[168] Japkowicz, Nathalie, Myers, Catherine, Gluck, Mark, et al. A novelty detection approach to classification. In IJCAI, volume 1, pages 518–523, 1995.

[169] Japkowicz, Nathalie and Shah, Mohak. Evaluating Learning Algorithms: A Classification Perspective. New York, NY: Cambridge University Press, 2011.

[170] John, George H.. Robust decision trees: Removing outliers from databases. In KDD, pages 174–179, 1995.

[171] Jacob, Joseph. How to improve machine learning: Tricks and tips for feature engineering. http://data-informed.com/how-to-improve-machine-learning-tricks-and-tips-for-feature-engineering/, 2016. Accessed: 2016-01-30.

[172] Jurafsky, Dan and Martin, James H.. Speech and Language Processing. To appear, 3rd edition, 2020.

[173] Kaggle, Inc. Titanic: Machine learning from disaster. www.kaggle.com/c/titanic, 2012. Accessed: 2018-08-28.

[174] Kanter, James Max. As someone who works with a lot of people new to machine learning, I appreciate… Hacker News: https://news.ycombinator.com/item?id=15919806, 2017. Accessed: 2019-01-12.

[175] Kanter, James Max and Veeramachaneni, Kalyan. Deep Feature Synthesis: Towards automating data science endeavors. In 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015, pages 1–10. New York, NY: IEEE, 2015.

[176] Kaushik, Saurav. Introduction to feature selection methods with an example (or how to select the right variables?). www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/, 2016. Accessed: 2018-10-15.

[177] Kelleher, John D., Mac Namee, Brian and D’Arcy, Aoife. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. Cambridge, MA: MIT Press, 2015.

[178] Kelley, Tom. The Art of Innovation: Lessons in Creativity from IDEO, America’s Leading Design Firm, volume 10. New York, NY: Broadway Business, 2001.

[179] Kerber, Randy. ChiMerge: Discretization of numeric attributes. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 123–128. AAAI Press, 1992.

[180] Kern, Roman. Knowledge discovery and data mining I. KTI, Graz University of Technology. Class Feature Engineering. http://kti.tugraz.at/staff/denis/courses/kddm1/featureengineering.pdf, November 2015. Accessed: 2018-05-03.

[181] Khurana, Udayan, Samulowitz, Horst and Turaga, Deepak. Feature engineering for predictive modeling using reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[182] Kim, Sangkyum, Kim, Hyungsul, Weninger, Tim, Han, Jiawei and Kim, Hyun Duk. Authorship classification: A discriminative syntactic tree mining approach. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 455–464. New York, NY: ACM, 2011.

[183] King, Stephen. On Writing: A Memoir of the Craft. New York: Scribner, 2010.

[184] Kingma, Diederik P. and Welling, Max. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2013.

[185] Kira, Kenji and Rendell, Larry A.. The feature selection problem: Traditional methods and a new algorithm. In AAAI, volume 2, pages 129–134, 1992.

[186] Ravi Kiran, B.. How can deep learning be used with video data? From Quora. www.quora.com/How-can-deep-learning-be-used-with-video-data, 2017. Accessed: 2019-01-11.

[187] Kluegl, Peter, Toepfer, Martin, Beck, Philip-Daniel, Fette, Georg and Puppe, Frank. UIMA Ruta: Rapid development of rule-based information extraction applications. Natural Language Engineering, 22(1):1–40, 2016.

[188] Kluyver, Thomas, Ragan-Kelley, Benjamin, Pérez, Fernando, Granger, Brian E., Bussonnier, Matthias, Frederic, Jonathan, Kelley, Kyle, Hamrick, Jessica B., Grout, Jason, Corlay, Sylvain, Ivanov, Paul, Avila, Damian, Abdalla, Safia, Willing, Carol and Jupyter development team. Jupyter notebooks: A publishing format for reproducible computational workflows. In Loizides, Fernando and Schmidt, Birgit (eds.), Proceedings of the 20th International Conference on Electronic Publishing, pages 87–90. Amsterdam, The Netherlands: IOS Press, 2016.

[189] Kohavi, Ron and John, George H.. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.

[190] Kondor, Risi Imre and Lafferty, John. Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th international conference on machine learning, volume 2002, pages 315–322, 2002.

[191] Kotsiantis, Sotiris and Kanellopoulos, Dimitris. Discretization techniques: A recent survey. GESTS International Transactions on Computer Science and Engineering, 32(1):47–58, 2006.

[192] Koza, J.. Genetic Programming II. Cambridge, MA: MIT Press, 1994.

[193] Kridler, Nicholas. Data agnosticism - Feature Engineering without domain expertise. www.youtube.com/watch?v=bL4b1sGnILU, June 2013. Accessed: 2018-06-04.

[194] Krizhevsky, Alex, Sutskever, Ilya and Hinton, Geoffrey E.. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017.

[195] Kuhn, Max. Feature engineering versus feature extraction: Game on! www.r-bloggers.com/feature-engineering-versus-feature-extraction-game-on/, 2015. Accessed: 2018-05-03.

[196] Kumar, Ashish. Feature engineering: Data scientist’s secret sauce! www.datasciencecentral.com/profiles/blogs/feature-engineering-data-scientist-s-secret-sauce-1, 2016. Accessed: 2018-06-11.

[197] Kurgan, Lukasz A. and Cios, Krzysztof J.. CAIM discretization algorithm. IEEE Transactions on Knowledge and Data Engineering, 16(2):145–153, 2004.

[198] Lafferty, John D., McCallum, Andrew and Pereira, Fernando C. N.. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML’01, pages 282–289, San Francisco, CA: Morgan Kaufmann, 2001.

[199] Laherrere, Jean and Sornette, Didier. Stretched exponential distributions in nature and economy: “fat tails” with characteristic scales. The European Physical Journal B-Condensed Matter and Complex Systems, 2(4):525–539, 1998.

[200] Lapata, Mirella. Invited keynote: Translating from multiple modalities to text and back. www.slideshare.net/aclanthology/mirella-lapata-2017-translating-from-multiple-modalities-to-text-and-back, 2017. Accessed: 2019-01-12.

[201] Lassila, Ora and Swick, Ralph R.. Resource description framework (RDF) model and syntax specification. www.w3.org/TR/REC-rdf-syntax, February 1999. Accessed: 2018-01-20.

[202] Lavrenko, Victor. Machine learning = Feature engineering. www.youtube.com/watch?v=CAnEJ42eEYA, 2016. Accessed: 2018-06-06.

[203] Lee, Daniel D. and Seung, H. Sebastian. Algorithms for non-negative matrix factorization. In Dietterich, T. G., Becker, S., and Ghahramani, Z. (eds.) Advances in neural information processing systems, pages 556–562, Cambridge, MA: MIT Press, 2001.

[204] Leek, Jeff. The Elements of Data Analytic Style. Victoria, British Columbia: Leanpub, 2015.

[205] Lehmann, Jens, Isele, Robert, Jakob, Max, Jentzsch, Anja, Kontokostas, Dimitris, Mendes, Pablo N., Hellmann, Sebastian, Morsey, Mohamed, van Kleef, Patrick, Auer, Sören and Bizer, Christian. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal, 6(2):167–195, 2015.

[206] Li, Chun-Liang. Feature engineering in machine learning. www.slideshare.net/tw_dsconf/feature-engineering-in-machine-learning, 2015. Accessed: 2018-06-11.

[207] Li, Chun-Liang, Lin, Hsuan-Tien and Lu, Chi-Jen. Rivalry of two families of algorithms for memory-restricted streaming PCA. In Artificial Intelligence and Statistics, pages 473–481, 2016.

[208] Li, Ping, Hastie, Trevor J and Church, Kenneth W. Very sparse random projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 287–296. ACM, 2006.

[209] Lin, Chih-Jen. Support vector machines and kernel methods: Status and challenges. www.csie.ntu.edu.tw/~cjlin/talks/kuleuven_svm.pdf, 2013. Accessed: 2018-08-27.

[210] Lin, Tsung-Yi, Maire, Michael, Belongie, Serge, Hays, James, Perona, Pietro, Ramanan, Deva, Dollár, Piotr and Zitnick, Larry. Microsoft COCO: Common objects in context. In ECCV. European Conference on Computer Vision, September 2014.

[211] Lin, Yijun, Chiang, Yao-Yi, Pan, Fan, Stripelis, Dimitrios, Ambite, José Luis, Eckel, Sandrah P and Habre, Rima. Mining public datasets for modeling intra-city PM2.5 concentrations at a fine spatial resolution. In Proceedings of the 25th ACM SIGSPATIAL international conference on advances in geographic information systems, page 25. New York, NY: ACM, 2017.

[212] Littlestone, Nick. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1988.

[213] Liu, Huan, Hussain, Farhad, Tan, Chew Lim and Dash, Manoranjan. Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6(4):393–423, 2002.

[214] Liu, Huan and Motoda, Hiroshi. Computational Methods of Feature Selection. Boca Raton, FL: CRC Press, 2007.

[215] Lowe, David G.. Object recognition from local scale-invariant features. In Proc. of the International Conference on Computer Vision, Corfu, 1999.

[216] Lowe, David G.. Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image. US Patent US6711293B1, 2004.

[217] Mahalanobis, Prasanta Chandra. On the generalized distance in statistics. Proceedings of the National Institute of Sciences (Calcutta), 2:49–55, 1936.

[218] Mané, Dandelion. Hands-on TensorBoard (TensorFlow Dev Summit 2017). www.youtube.com/watch?v=eBbEDRsCmv4, 2017. Accessed: 2018-11-30.

[219] Manning, Christopher D., Raghavan, Prabhakar and Schütze, Hinrich. Introduction to Information Retrieval. New York: Cambridge University Press, 2008.

[220] Manning, Christopher D. and Schütze, Hinrich. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

[221] Markovsky, Ivan. Low Rank Approximation - Algorithms, Implementation, Applications. Communications and Control Engineering. New York, NY: Springer, 2012.

[222] Marr, Bernard. Twenty-Seven incredible examples of AI and Machine Learning in practice. Forbes sites: www.forbes.com/sites/bernardmarr/2018/04/30/27-incredible-examples-of-ai-and-machine-learning-in-practice/, 2018. Accessed: 2019-01-12.

[223] Marszalek, Marcin, Laptev, Ivan and Schmid, Cordelia. Actions in context. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2929–2936. San Diego, CA: IEEE, 2009.

[224] Macfarlane, Andrew, Porter, Martin and Boulton, Richard. Snowball stop words list. http://snowball.tartarus.org/algorithms/english/stop.txt, 2001. Accessed: 2018-12-12.

[225] Mayer-Schönberger, Viktor and Cukier, Kenneth. Big Data: A Revolution That Will Transform How We Live, Work and Think. Boston, MA: Houghton Mifflin Harcourt, 2013.

[226] McCallum, Andrew, Nigam, Kamal and Ungar, Lyle H.. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’00, pages 169–178. New York, NY: ACM, 2000.

[227] McCallum, Andrew, Schultz, Karl and Singh, Sameer. Factorie: Probabilistic programming via imperatively defined factor graphs. In Advances in Neural Information Processing Systems, pages 1249–1257, 2009.

[228] McCormick, C.. Word2vec tutorial part 2 - Negative sampling. Retrieved from www.mccormickml.com, 2017. Accessed: 2018-12-22.

[229] Melville, Prem Noel. Creating diverse ensemble classifiers to reduce supervision. PhD thesis, Department of Computer Sciences at the University of Texas at Austin, 2005.

[230] Mendes, Pablo N., Jakob, Max, García-Silva, Andrés and Bizer, Christian. DBpedia Spotlight: Shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, pages 1–8. San Diego, CA: ACM, 2011.

[231] Mercer, James. XVI. Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 209(441-458):415–446, 1909.

[232] Mihalcea, Rada F. and Radev, Dragomir R.. Graph-Based Natural Language Processing and Information Retrieval. New York: Cambridge University Press, 1st edition, 2011.

[233] Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg S and Dean, Jeff. Distributed representations of words and phrases and their compositionality. In Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. Q., (eds.), Advances in Neural Information Processing Systems 26, pages 3111–3119. Vancouver, British Columbia: Curran Associates, Inc., 2013.

[234] Mitchell, Tom M.. Machine Learning. New York: McGraw-Hill, 1997.

[235] Moindrot, Olivier and Genthial, Guillaume. CS230 – Theory: How to choose the train, train-dev, dev and test sets. https://cs230-stanford.github.io/train-dev-test-split.html, 2018. Accessed: 2019-07-07.

[236] Montemurro, Marcelo A.. Beyond the Zipf–Mandelbrot law in quantitative linguistics. Physica A: Statistical Mechanics and its Applications, 300(3-4):567–578, 2001.

[237] Mood, Alexander MacFarlane and Graybill, Franklin Arno. Introduction to the Theory of Statistics. International student edition. New York: McGraw-Hill, 2nd edition, 1963.

[238] Mundhenk, T. Nathan, Konjevod, Goran, Sakla, Wesam A. and Boakye, Kofi. A large contextual dataset for classification, detection and counting of cars with deep learning. In Leibe, Bastian, Matas, Jiri, Sebe, Nicu, and Welling, Max, (eds.), ECCV (3), volume 9907 of Lecture Notes in Computer Science, pages 785–800. New York, NY: Springer, 2016.

[239] Myatt, Glenn J. and Johnson, Wayne P.. Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, 2nd edition. Hoboken, NJ: John Wiley & Sons, 2014.

[240] Nadeau, David and Sekine, Satoshi. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26, 2007.

[241] Natekin, Alexey and Knoll, Alois. Gradient boosting machines, a tutorial. Front. Neurorobot., 7(21), 2013.

[242] Ng, Andrew. Advice for applying machine learning. https://see.stanford.edu/materials/aimlcs229/ML-advice.pdf, 2015. Accessed: 2018-01-20.

[243] Nickel, Maximilian, Murphy, Kevin, Tresp, Volker and Gabrilovich, Evgeniy. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016.

[244] Nickel, Maximilian, Tresp, Volker and Kriegel, Hans-Peter. Factorizing YAGO: Scalable machine learning for linked data. In Proceedings of the 21st International Conference on World Wide Web, pages 271–280. New York, NY: ACM, 2012.

[245] Nixon, Mark and Aguado, Alberto S.. Feature Extraction and Image Processing for Computer Vision. Cambridge, MA: Academic Press, 2012.

[246] Nogueira, Sarah, Sechidis, Konstantinos and Brown, Gavin. On the stability of feature selection algorithms. The Journal of Machine Learning Research, 18(1):6345–6398, 2017.

[247] Ojala, Timo, Pietikäinen, Matti and Harwood, David. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, 1996.

[248] Overton, Michael L.. Numerical Computing with IEEE Floating Point Arithmetic. Philadelphia, PA: SIAM, 2001.

[249] Owen, Sean, Anil, Robin, Dunning, Ted and Friedman, Ellen. Mahout in Action. Shelter Island, NY: Manning Publications, 1st edition, 2011.

[250] Padilla, Afelio. Practical machine learning - 2.3 - Feature engineering. www.youtube.com/watch?v=78RUW9kuDe4, 2016. Accessed: 2018-06-10.

[251] Pal, Ashwini Kumar. Diving deeper into dimension reduction with independent components analysis (ICA). https://blog.paperspace.com/dimension-reduction-with-independent-components-analysis/, 2017. Accessed: 2018-12-15.

[252] Pankratz, Alan. Forecasting with Dynamic Regression Models. Hoboken, NJ: Wiley, 1991.

[253] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. and Duchesnay, E.. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[254] Peng, Hanchuan, Long, Fuhui and Ding, Chris. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis & Machine Intelligence, (8):1226–1238, 2005.

[255] Pennington, Jeffrey, Socher, Richard and Manning, Christopher D.. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

[256] Perozzi, Bryan, Al-Rfou, Rami and Skiena, Steven. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. New York, NY: ACM, 2014.

[257] Peters, Matthew E., Neumann, Mark, Iyyer, Mohit, Gardner, Matt, Clark, Christopher, Lee, Kenton and Zettlemoyer, Luke. Deep contextualized word representations. In NAACL-HLT, pages 2227–2237. Stroudsburg, PA: Association for Computational Linguistics, 2018.

[258] Porter, M. F.. An algorithm for suffix stripping. In Sparck Jones, Karen and Willett, Peter, (eds.), Readings in Information Retrieval, pages 313–316. San Francisco, CA: Morgan Kaufmann Publishers, 1997.

[259] Pozdnoukhov, Alexei and Kanevski, Mikhail. Monitoring network optimisation for spatial data classification using support vector machines. International Journal of Environment and Pollution, 28(3-4):465–484, 2006.

[260] Prechelt, L.. Proben1: A set of neural network benchmark problems and benchmarking rules. Technical Report 21/94, Univ., Fak. für Informatik, September 1994.

[261] Puk, Frank. How is ANOVA used for feature selection? www.quora.com/How-is-ANOVA-used-for-feature-selection, 2016. Accessed: 2018-08-27.

[262] Pustejovsky, James and Stubbs, Amber. Natural Language Annotation for Machine Learning - A Guide to Corpus-Building for Applications. Sebastopol, CA: O’Reilly, 2012.

[263] Pyle, Dorian. Data Preparation for Data Mining. San Francisco, CA: Morgan Kaufmann Publishers, 1999.

[264] Rahimi, Ali and Recht, Benjamin. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, pages 1177–1184, 2008.

[265] Rahimi, Ali and Recht, Benjamin. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. In Advances in Neural Information Processing Systems, pages 1313–1320, 2009.

[266] Rasmussen, Carl Edward and Williams, Christopher K. I.. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). Cambridge, MA: MIT Press, 2005.

[267] Ré, Christopher, Sadeghian, Amir Abbas, Shan, Zifei, Shin, Jaeho, Wang, Feiran, Wu, Sen and Zhang, Ce. Feature engineering for knowledge base construction. arXiv preprint arXiv:1407.6439, 2014.

[268] Reimers, Nils. Deep learning for NLP - Lecture 5 - Convolutional neural networks. www.youtube.com/watch?v=nzSPZyjGlWI, 2015. Accessed: 2018-01-20.

[269] Reiter, Ehud, Robertson, R. and Osman, Liesl. Knowledge acquisition for natural language generation. In Proceedings of the First International Conference on Natural Language Generation (INLG-2000), pages 217–224, 2000.

[270] Rigoutsos, Isidore and Floratos, Aris. Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics (Oxford, England), 14(1):55–67, 1998.

[271] Ristoski, Petar and Paulheim, Heiko. RDF2Vec: RDF graph embeddings for data mining. In Groth, Paul, Simperl, Elena, Gray, Alasdair, Sabou, Marta, Krötzsch, Markus, Lecue, Freddy, Flöck, Fabian, and Gil, Yolanda, (eds.), The Semantic Web – ISWC 2016, pages 498–514, Cham: Springer, 2016.

[272] Roberts, Stephen J.. Novelty detection using extreme value statistics. IEE Proceedings-Vision, Image and Signal Processing, 146(3):124–129, 1999.

[273] Robertson, Mark R.. 300+ hours of video uploaded to YouTube every minute. https://tubularinsights.com/youtube-300-hours/, 2015. Accessed: 2019-01-10.

[274] Saeys, Yvan, Abeel, Thomas and Van de Peer, Yves. Robust feature selection using ensemble feature selection techniques. In Daelemans, Walter, Goethals, Bart, and Morik, Katharina, (eds.), ECML/PKDD (2), volume 5212 of Lecture Notes in Computer Science, pages 313–325. Cham: Springer, 2008.

[275] Sahlgren, Magnus. A brief history of word embeddings (and some clarifications). www.linkedin.com/pulse/brief-history-word-embeddings-some-clarifications-magnus-sahlgren/, 2015. Accessed: 2018-12-20.

[276] Schmatz, Steven. When should I use PCA versus non-negative matrix factorization? www.quora.com/When-should-I-use-PCA-versus-non-negative-matrix-factorization, 2018. Accessed: 2018-12-18.

[277] Scholkopf, Bernhard and Smola, Alexander J.. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2001.

[278] Schutt, Rachel and O’Neil, Cathy. Doing Data Science: Straight Talk from the Frontline. Sebastopol, CA: O’Reilly Media, Inc., 2013.

[279] Sechidis, Konstantinos. Hypothesis Testing and Feature Selection in Semi-Supervised Data. PhD thesis, School of Computer Science, University of Manchester, UK, November 2015.

[280] Shafaei, Alireza, Little, James J. and Schmidt, Mark. Play and learn: Using video games to train computer vision models. In Proceedings of the British Machine Vision Conference BMVC, York, UK, September 2016.

[281] Shardlow, Matthew. An analysis of feature selection techniques. Technical report, The University of Manchester, 2016.

[282] Sharkey, Noel E. (ed.). Connectionist Natural Language Processing. Oxford, England: Intellect, 1992.

[283] Shaw, Blake and Jebara, Tony. Structure preserving embedding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 937–944. New York, NY: ACM, 2009.

[284] Silberzahn, Raphael, Uhlmann, Eric L., Martin, Daniel P., Anselmi, Pasquale, Aust, Frederik, Awtrey, Eli, Bahník, Štěpán, Bai, Feng, Bannard, Colin, Bonnier, Evelina, Carlsson, R., Cheung, F., Christensen, G., Clay, R., Craig, M. A., Dalla Rosa, A., Dam, L., Evans, M. H., Flores Cervantes, I., Fong, N., Gamez-Djokic, M., Glenz, A., Gordon-McKeon, S., Heaton, T. J., Hederos, K., Heene, M., Hofelich Mohr, A. J., Högden, F., Hui, K., Johannesson, M., Kalodimos, J., Kaszubowski, E., Kennedy, D. M., Lei, R., Lindsay, T. A., Liverani, S., Madan, C. R., Molden, D., Molleman, E., Morey, R. D., Mulder, L. B., Nijstad, B. R., Pope, N. G., Pope, B., Prenoveau, J. M., Rink, F., Robusto, E., Roderique, H., Sandberg, A., Schlüter, E., Schönbrodt, F. D., Sherman, M. F., Sommer, S. A., Sotak, K., Spain, S., Spörlein, C., Stafford, T., Stefanutti, L., Tauber, S., Ullrich, J., Vianello, M., Wagenmakers, E.-J., Witkowiak, M., Yoon, S. and Nosek, B. A.. Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3):337–356, 2018.

[285] Singhal, Amit, Buckley, Chris and Mitra, Mandar. Pivoted document length normalization. In ACM SIGIR Forum, volume 51, pages 176–184. New York, NY: ACM, 2017.

[286] Smit, Peter, Virpioja, Sami, Grönroos, Stig-Arne and Kurimo, Mikko. Morfessor 2.0: Toolkit for statistical morphological segmentation. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 21–24. Gothenburg, Sweden: Aalto University, 2014.

[287] Snoek, Jasper, Larochelle, Hugo and Adams, Ryan P.. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.

[288] Sobel, Irwin and Feldman, Gary. A 3x3 isotropic gradient operator for image processing. In Duda, R. and Hart, P., editors, Pattern Classification and Scene Analysis, pages 271–272. New York: Wiley, 1973.

[289] Thomas, Samuel and Ganapathy, Sriram. Tutorial: The art and science of speech feature engineering. www.clsp.jhu.edu/~samuel/pdfs/tutorial.pdf, 2014. Accessed: 2018-06-11.

[290] Srivastava, Tavish. A complete tutorial on time series modeling in R. www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/, 2015. Accessed: 2019-01-15.

[291] Contributors StackExchange. What is the difference between ZCA whitening and PCA whitening? https://stats.stackexchange.com/questions/117427/what-is-the-difference-between-zca-whitening-and-pca-whitening, 2014. Accessed: 2018-09-15.

[292] Contributors StackExchange. Feature map for the Gaussian kernel? https://stats.stackexchange.com/questions/69759/feature-map-for-the-gaussian-kernel, 2015. Accessed: 2018-08-27.

[293] Contributors StackExchange. Relationship between SVD and PCA. How to use SVD to perform PCA? https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca, 2015. Accessed: 2018-09-10.

[294] Contributors StackExchange. Deriving feature map formula for inhomogeneous polynomial kernel. https://datascience.stackexchange.com/questions/10534/deriving-feature-map-formula-for-inhomogeneous-polynomial-kernel, 2016. Accessed: 2019-01-15.

[295] Contributors StackExchange. How to perform feature engineering on unknown features? http://datascience.stackexchange.com/questions/10640/how-to-perform-feature-engineering-on-unknown-features, 2016. Accessed: 2018-06-11.

[296] Contributors StackExchange. How does LDA (latent Dirichlet allocation) assign a topic-distribution to a new document? https://stats.stackexchange.com/questions/325614/how-does-lda-latent-dirichlet-allocation-assign-a-topic-distribution-to-a-new, 2018. Accessed: 2018-09-09.

[297] Steger, Carsten. Machine vision algorithms. In Hornberg, Alexander, (ed.), Handbook of Machine and Computer Vision: The Guide for Developers and Users, Second, Revised and Updated Edition, chapter 9. Hoboken, NJ: Wiley, 2017.

[298] Sterne, Jonathan A. C., White, Ian R., Carlin, John B., Spratt, Michael, Royston, Patrick, Kenward, Michael G., Wood, Angela M. and Carpenter, James R.. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ, 338:b2393, 2009.

[299] Stoppiglia, Hervé, Dreyfus, Gérard, Dubois, Rémi and Oussar, Yacine. Ranking a random feature for variable and feature selection. Journal of Machine Learning Research, 3(Mar):1399–1414, 2003.

[300] Suchanek, Fabian M., Kasneci, Gjergji and Weikum, Gerhard. YAGO: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW’07, pages 697–706. New York, NY: ACM, 2007.

[301] Swain, Michael J. and Ballard, Dana H.. Indexing via color histograms. In Sood, Arun K. and Wechsler, Harry (eds.), Active Perception and Robot Vision, pages 261–273. New York, NY: Springer, 1992.

[302] Takurita, Kei. Paper dissected: “GloVe: Global vectors for word representation” explained. http://mlexplained.com/2018/04/29/paper-dissected-glove-global-vectors-for-word-representation-explained/, 2018. Accessed: 2018-12-22.

[303] Talbot, David and Brants, Thorsten. Randomized language models via perfect hash functions. Proceedings of ACL-08: HLT, pages 505–513, 2008.

[304] Tang, Lappoon R. and Mooney, Raymond J.. Using multiple clause constructors in inductive logic programming for semantic parsing. In European Conference on Machine Learning, pages 466–477. New York, NY: Springer, 2001.

[305] Tibshirani, Robert. Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3):273–282, 2011.

[306] Töscher, Andreas, Jahrer, Michael and Bell, Robert M. The BigChaos solution to the Netflix Grand Prize. Report from the Netflix Prize Winners, 2009.

[307] Tovbin, Matthew. Meet TransmogrifAI, open source AutoML that powers Einstein predictions. SF Big Analytics Meetup. www.youtube.com/watch?v=93vsqjfGPCw&feature=youtu.be&t=2800, 2019. Accessed: 2019-07-13.

[308] Tschirsich, Martin and Hintz, Gerold. Leveraging crowdsourcing for paraphrase recognition. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 205–213, Sofia, Bulgaria, August 2013. Stroudsburg, PA: Association for Computational Linguistics.

[309] Tusher, Virginia Goss, Tibshirani, Robert and Chu, Gilbert. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98(9):5116–5121, 2001.

[310] Franklin, Ursula. Real World of Technology. Toronto, Ontario: House of Anansi Press, 1989.

[311] van der Maaten, Laurens and Hinton, Geoffrey. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

[312] Hinsbergh, James Van, Griffiths, Nathan, Taylor, Phillip, Thomason, Alasdair, Xu, Zhou and Mouzakitis, Alex. Vehicle point of interest detection using in-car data. In Proceedings of the 2Nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI’18, pages 1–4, Seattle, WA, USA, 2018. New York, NY: ACM.Google Scholar

[313] Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N, Kaiser, Łukasz and Polosukhin, Illia. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.

[314] Venkatesan, Ragav, Gatupalli, Vijetha and Li, Baoxin. On the generality of neural image features. In 2016 IEEE International Conference on Image Processing (ICIP), pages 41–45. San Diego, CA: IEEE, 2016.

[315] Vens, Celine and Costa, Fabrizio. Random forest based feature induction. In 2011 IEEE 11th International Conference on Data Mining, pages 744–753. San Diego, CA: IEEE, 2011.

[316] Verborgh, Ruben and De Wilde, Max. Using OpenRefine. Birmingham, UK: Packt Publishing, 1st edition, 2013.

[317] Vincent, Pascal, Larochelle, Hugo, Bengio, Yoshua and Manzagol, Pierre-Antoine. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. New York, NY: ACM, 2008.

[318] Vishwanathan, S. V. N., Schraudolph, Nicol N., Kondor, Risi and Borgwardt, Karsten M.. Graph kernels. Journal of Machine Learning Research, 11(Apr):1201–1242, 2010.

[319] VoPham, Trang, Hart, Jaime E., Laden, Francine and Chiang, Yao-Yi. Emerging trends in geospatial artificial intelligence (GeoAI): Potential applications for environmental epidemiology. Environmental Health, 17(1):40, Apr 2018.

[320] Weisstein, Eric W.. Frobenius norm. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/FrobeniusNorm.html, 2018. Accessed: 2018-12-17.

[321] Wen, Tsung-Hsien, Gasic, Milica, Mrksic, Nikola, Su, Pei-Hao, Vandyke, David and Young, Steve J.. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In EMNLP, pages 1711–1721. Stroudsburg, PA: The Association for Computational Linguistics, 2015.

[322] Wendlandt, Laura, Kummerfeld, Jonathan K and Mihalcea, Rada. Factors influencing the surprising instability of word embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2092–2102, 2018.

[323] Wick, Marc. GeoNames ontology. www.geonames.org/about.html, 2015. Accessed: 2015-04-22.

[324] Wickham, Hadley. Tidy data. Journal of Statistical Software, 59(1):1–23, 2014.

[325] Wikipedia. Akaike information criterion. https://en.wikipedia.org/wiki/Akaike_information_criterion, 2018. Accessed: 2018-06-04.

[326] Wikipedia. List of common coordinate transformations. https://en.wikipedia.org/wiki/List_of_common_coordinate_transformations, 2018. Accessed: 2018-08-27.

[327] Wikipedia. Multinomial theorem. https://en.wikipedia.org/wiki/Multinomial_theorem, 2018. Accessed: 2018-08-27.

[328] Wikipedia. Multivariate adaptive regression splines. https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines, 2019. Accessed: 2019-01-10.

[329] Wikipedia. tf-idf. https://en.wikipedia.org/wiki/Tf-idf, 2019. Accessed: 2018-01-10.

[330] Wilbur, W. John and Sirotkin, Karl. The automatic identification of stop words. Journal of Information Science, 18(1):45–55, 1992.

[331] Witten, Ian H. and Frank, Eibe. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco, CA: Morgan Kaufmann Publishers, 2000.

[332] Wolpert, David H. and Macready, William G.. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

[333] Wong, Yuk Wah and Mooney, Raymond J. Learning for semantic parsing with statistical machine translation. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 439–446. Stroudsburg, PA: Association for Computational Linguistics, 2006.

[334] Wu, Xindong, Yu, Kui, Wang, Hao and Ding, Wei. Online streaming feature selection. In Proceedings of the 27th International Conference on Machine Learning, pages 1159–1166, 2010.

[335] Wurtz, Robert H.. Recounting the impact of Hubel and Wiesel. J Physiol, 587(12):2817–2823, 2009.

[336] Xiang, Jingen. Scalable scientific computing algorithms using MapReduce. Master’s thesis, University of Waterloo, 2013.

[337] Yan, Weizhong. Feature engineering for PHM applications. www.phmsociety.org/sites/phmsociety.org/files/FeatureEngineeringTutorial_2015PHM_V2.pdf, Oct 2015. Accessed: 2018-05-01, linked from www.phmsociety.org/events/conference/phm/15/tutorials.

[338] Yarlagadda, R. K. Rao. Analog and Digital Signals and Systems, volume 1, chapter 2. New York, NY: Springer, 2010.

[339] Ying, Annie T. T., Murphy, Gail C., Ng, Raymond and Chu-Carroll, Mark C.. Predicting source code changes by mining change history. IEEE Trans. Software Engineering, 30(9):574–586, September 2004.

[340] Yu, Chun-Nam John and Joachims, Thorsten. Learning structural SVMs with latent variables. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1169–1176. New York, NY: ACM, 2009.

[341] Yu, Hsiang-Fu, Lo, Hung-Yi, Hsieh, Hsun-Ping, Lou, Jing-Kai, McKenzie, Todd G, Chou, Jung-Wei, Chung, Po-Han, Ho, Chia-Hua, Chang, Chun-Fu, Wei, Yin-Hsuan, Weng, Jui-Yu, Yan, En-Syu, Chang, Che-Wei, Kuo, Tsung-Ting, Lo, Yi-Chen, Chang, Po Tzu, Po, Chieh, Wang, Chien-Yuan, Huang, Yi-Hung, Hung, Chen-Wei, Ruan, Yu-Xun, Lin, Yu-Shi, Lin, Shou-de, Lin, Hsuan-Tien, and Lin, Chih-Jen. Feature engineering and classifier ensemble for KDD Cup 2010. In KDD Cup, 2010.

[342] Zabokrtsky, Zdenek. Feature engineering in machine learning. https://ufal.mff.cuni.cz/~zabokrtsky/courses/npfl104/html/feature_engineering.pdf, 2014. Accessed: 2018-06-10.

[343] Zaharia, Matei, Chowdhury, Mosharaf, Franklin, Michael J., Shenker, Scott and Stoica, Ion. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 10–10, Berkeley, CA, USA, 2010. Berkeley, CA: USENIX Association.

[344] Zhang, Ce, Kumar, Arun and Ré, Christopher. Materialization optimizations for feature selection workloads. ACM Transactions on Database Systems (TODS), 41(1):2, 2016.

[345] Zhang, Sheng, Wang, Weihong, Ford, James and Makedon, Fillia. Learning from incomplete ratings using non-negative matrix factorization. In Proceedings of the 2006 SIAM International Conference on Data Mining, pages 549–553. Philadelphia, PA: SIAM, 2006.

[346] Zhang, Tian, Ramakrishnan, Raghu and Livny, Miron. BIRCH: An efficient data clustering method for very large databases. In ACM SIGMOD Record, volume 25, pages 103–114. New York, NY: ACM, 1996.

[347] Zhao, Kang, Lu, Hongtao and Mei, Jincheng. Locality preserving hashing. In Brodley, Carla E. and Stone, Peter, (eds.), AAAI, pages 2874–2881. Cambridge, MA: AAAI Press, 2014.

[348] Zhao, Shenglin, Lyu, Michael R. and King, Irwin. Point-of-Interest Recommendation in Location-Based Social Networks. SpringerBriefs in Computer Science. Singapore: Springer Singapore, 2018.

[349] Zhao, Shichen, Yu, Tao, Meng, Qingyan, Zhou, Qiming, Wang, Feifei, Wang, Li and Hu, Yueming. GDAL-based extend ArcGIS Engine’s support for HDF file format. In Geoinformatics, pages 1–3. San Diego, CA: IEEE, 2010.

[350] Zheng, Alice. Evaluating Machine Learning Models: A Beginner’s Guide to Key Concepts and Pitfalls. Sebastopol, CA: O’Reilly, 2015.

[351] Zheng, Alice. Mastering Feature Engineering. Sebastopol, CA: O’Reilly Early Access, 2016.

[352] Zheng, Alice and Casari, Amanda. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Sebastopol, CA: O’Reilly Media, Inc., 2018.

[353] Zou, Hui and Hastie, Trevor. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.
