Discretization as the enabling technique for the Naïve Bayes and semi-Naïve Bayes-based classification

Published online by Cambridge University Press:  01 December 2010

Marcin J. Mizianty*
Affiliation:
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada; e-mail: mizianty@ualberta.ca, lkurgan@ece.ualberta.ca
Lukasz A. Kurgan*
Affiliation:
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada; e-mail: mizianty@ualberta.ca, lkurgan@ece.ualberta.ca
Marek R. Ogiela*
Affiliation:
Bio-cybernetics Laboratory, Institute of Automatics, AGH University of Science and Technology, Krakow, Poland; e-mail: mogiela@agh.edu.pl

Abstract

Current classification problems that concern data sets of large and increasing size require scalable classification algorithms. In this study, we concentrate on several scalable, linear-complexity classifiers that include one of the top 10 voted data mining methods, Naïve Bayes (NB), and several recently proposed semi-NB classifiers. These algorithms perform front-end discretization of the continuous features, since by design they work only with nominal or discrete features. We address the lack of studies that investigate the benefits and drawbacks of discretization in the context of the subsequent classification. Our comprehensive empirical study considers 12 discretizers (two unsupervised and 10 supervised), seven classifiers (two classical NB and five semi-NB), and 16 data sets. We investigate the scalability of the discretizers and show that the fastest supervised discretizers, fast class-attribute interdependency maximization (FCAIM), class-attribute interdependency maximization (CAIM), and information entropy maximization (IEM), provide discretization schemes with the highest overall quality. We show that discretization improves classification accuracy when compared against the two classical methods, NB and flexible naïve Bayes (FNB), executed on the raw data. The choice of the discretization algorithm affects the significance of these improvements: the MODL, FCAIM, and CAIM methods provide statistically significant improvements, while the IEM, class-attribute contingency coefficient (CACC), and Khiops discretizers provide moderate improvements. The most accurate classification models are generated by the AODEsr classifier (averaged one-dependence estimators with subsumption resolution), followed by AODE and hidden naïve Bayes (HNB). AODEsr run on data discretized with MODL, FCAIM, and CAIM provides statistically significantly better accuracies than both classical NB methods. The worst results are obtained with the NB, FNB, and lazy Bayesian rules (LBR) classifiers. We show that although the time to build the discretization scheme can be longer than the time to train the classifier, completing the entire process (discretizing the data, building the classifier, and predicting the test instances) is often faster than NB-based classification of the raw continuous instances, because the time to classify test instances is an important factor that discretization reduces. The largest positive influence, on both accuracy and classification time, is associated with the MODL, FCAIM, and CAIM algorithms.
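
To make the evaluated pipeline concrete, the sketch below contrasts the two routes compared in the study: a classical NB classifier applied directly to continuous features versus discretize-then-classify, timing the full process (building the discretization scheme, training, and predicting the test instances). It is an illustrative sketch only, not the article's implementation: it assumes scikit-learn is available, substitutes the unsupervised equal-width KBinsDiscretizer for the supervised discretizers (CAIM, FCAIM, IEM, MODL, etc.) examined in the article, and uses a single built-in data set in place of the 16 benchmark sets.

```python
# Illustrative sketch (not the article's implementation): front-end
# discretization followed by an NB classifier, versus Gaussian NB run
# directly on the raw continuous features.
import time

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# (1) Classical NB on raw continuous data: each feature is modelled with a
#     class-conditional Gaussian density.
start = time.perf_counter()
gnb = GaussianNB().fit(X_train, y_train)
acc_raw = gnb.score(X_test, y_test)
time_raw = time.perf_counter() - start

# (2) Discretize first (unsupervised, equal-width bins), then train an NB
#     classifier on the resulting ordinal-encoded features; the measured time
#     covers discretization, training, and classifying the test instances.
start = time.perf_counter()
disc = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
X_train_d = disc.fit_transform(X_train)   # build the discretization scheme
X_test_d = disc.transform(X_test)         # apply it to the test data
# min_categories keeps bin indices valid even if a bin is empty in training.
cnb = CategoricalNB(min_categories=10).fit(X_train_d, y_train)
acc_disc = cnb.score(X_test_d, y_test)
time_disc = time.perf_counter() - start

print(f"Gaussian NB on raw data:     accuracy={acc_raw:.3f}  time={time_raw:.4f}s")
print(f"Discretize + categorical NB: accuracy={acc_disc:.3f}  time={time_disc:.4f}s")
```

The point of the sketch is only to make the measured quantities (discretization time, training time, and test-classification time) tangible; the supervised discretizers studied in the article choose cut points using the class labels and therefore typically yield better schemes than the equal-width binning used here.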

Type: Articles
Copyright: © Cambridge University Press 2010

References

Abraham, R., Simha, J. B., Iyengar, S. S. 2006. A comparative analysis of discretization methods for medical datamining with naïve Bayesian classifier. In ICIT ’06: Proceedings of the 9th International Conference on Information Technology. IEEE Computer Society, Washington, DC, USA, 235–236.
Alpaydin, E. 1999. Combined 5×2 cv F test for comparing supervised classification learning algorithms. Neural Computation 11(8), 1885–1892.
Asuncion, A., Newman, D. 2007. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, http://www.ics.uci.edu/~mlearn/MLRepository.html.
Boullé, M. 2004. Khiops: a statistical discretization method of continuous attributes. Machine Learning 55(1), 53–69.
Boullé, M. 2006. MODL: a Bayes optimal discretization method for continuous attributes. Machine Learning 65(1), 131–165.
Catlett, J. 1991. On changing continuous attributes into ordered discrete attributes. In EWSL ’91: Proceedings of the European Working Session on Machine Learning. Springer-Verlag, London, UK, 164–178.
Ching, J., Wong, A., Chan, K. 1995. Class-dependent discretization for inductive learning from continuous and mixed mode data. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(7), 641–651.
Cios, K. J., Kurgan, L. A. 2002. Hybrid Inductive Machine Learning: An Overview of CLIP Algorithms. Physica-Verlag GmbH.
Cios, K. J., Kurgan, L. A. 2004. CLIP4: hybrid inductive machine learning algorithm that generates inequality rules. Information Sciences 163(1–3), 37–83.
Clark, P., Niblett, T. 1989. The CN2 induction algorithm. Machine Learning 3(4), 261–283.
Demšar, J. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30.
Dougherty, J., Kohavi, R., Sahami, M. 1995. Supervised and unsupervised discretization of continuous features. In Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, 194–202.
Fayyad, U. M., Irani, K. B. 1992. On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8(1), 87–102.
Fayyad, U. M., Irani, K. B. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, USA, 1022–1027.
Flores, J. L., Inza, I., Larrañaga, P. 2007. Wrapper discretization by means of estimation of distribution algorithms. Intelligent Data Analysis 11(5), 525–545.
Friedman, N., Geiger, D., Goldszmidt, M., Provan, G., Langley, P., Smyth, P. 1997. Bayesian network classifiers. Machine Learning 29, 131–163.
Huang, W. 1996. Discretization of Continuous Attributes for Inductive Machine Learning. MSc thesis, Department of Computer Science, University of Toledo, Ohio, USA.
Iman, R. L., Davenport, J. M. 1980. Approximations of the critical region of the Friedman statistic. Communications in Statistics A9, 571–595.
Jiang, L., Zhang, H. 2006. Weightily averaged one-dependence estimators. In Proceedings of the 9th Biennial Pacific Rim International Conference on Artificial Intelligence. Morgan Kaufmann, 970–974.
John, G., Langley, P. 1995. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 338–345.
Kaufman, K. A., Michalski, R. S. 1999. Learning from inconsistent and noisy data: the AQ18 approach. In Proceedings of the 11th International Symposium on Methodologies for Intelligent Systems, Saratoga Springs, NY, May 2005.
Keogh, E. J., Pazzani, M. J. 1999. Learning augmented Bayesian classifiers: a comparison of distribution-based and classification-based approaches. In Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, Florida, USA, 3–6 January, 225–230.
Kerber, R. 1992. ChiMerge: discretization of numeric attributes. In Proceedings of the 9th International Conference on Artificial Intelligence, Cambridge, UK, 20–22 February, 123–128.
Kohavi, R., Sahami, M. 1996. Error-based and entropy-based discretization of continuous features. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, USA, 2–4 August, 114–119.
Kujala, J., Elomaa, T. 2007. Improved algorithms for univariate discretization of continuous features. In PKDD 2007: Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, 17–21 September, 188–199.
Kurgan, L. A., Cios, K. J. 2001. Discretization algorithm that uses class-attribute interdependence maximization. In Proceedings of the 2001 International Conference on Artificial Intelligence, Seattle, Washington, USA, 4–10 August, 980–987.
Kurgan, L. A., Cios, K. J. 2003. Fast class-attribute interdependence maximization (CAIM) discretization algorithm. In Proceedings of the International Conference on Machine Learning and Applications, Los Angeles, California, USA, 23–24 June, 30–36.
Kurgan, L. A., Cios, K. J. 2004. CAIM discretization algorithm. IEEE Transactions on Knowledge and Data Engineering 16, 145–153.
Kurgan, L. A., Cios, K. J., Dick, S. 2006. Highly scalable and robust rule learner: performance evaluation and comparison. IEEE Transactions on Systems, Man, and Cybernetics, Part B 36(1), 32–53.
Langley, P., Iba, W., Thompson, K. 1992. An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence. MIT Press, 223–228.
Langley, P., Sage, S. 1994. Induction of selective Bayesian classifiers. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 399–406.
Lee, C.-H. 2007. A Hellinger-based discretization method for numeric attributes in classification learning. Knowledge-Based Systems 20(4), 419–425.
Liu, H., Setiono, R. 1997. Feature selection via discretization. IEEE Transactions on Knowledge and Data Engineering 9, 642–645.
Liu, H., Hussain, F., Tan, C., Dash, M. 2002. Discretization: an enabling technique. Data Mining and Knowledge Discovery 6(4), 393–423.
Liu, X., Wang, H. 2005. A discretization algorithm based on a heterogeneity criterion. IEEE Transactions on Knowledge and Data Engineering 17(9), 1166–1173.
Mehta, S., Parthasarathy, S., Yang, H. 2005. Toward unsupervised correlation preserving discretization. IEEE Transactions on Knowledge and Data Engineering 17(9), 1174–1185.
Mizianty, M. J., Kurgan, L. A., Ogiela, M. R. 2008. Comparative analysis of the impact of discretization on the classification with naïve Bayes and semi-naïve Bayes classifiers. In ICMLA ’08: Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications, San Diego, California, USA, 11–13 December, 823–828.
Nemenyi, P. 1963. Distribution-free Multiple Comparisons. PhD thesis, Princeton University.
Paterson, A., Niblett, T. 1987. ACLS Manual. Intelligent Terminals, Ltd.
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.
Rissanen, J. 1978. Modeling by shortest data description. Automatica 14, 465–471.
Tay, F. E. H., Shen, L. 2002. A modified Chi2 algorithm for discretization. IEEE Transactions on Knowledge and Data Engineering 14(3), 666–670.
Tsai, C.-J., Lee, C.-I., Yang, W.-P. 2008. A discretization algorithm based on class-attribute contingency coefficient. Information Sciences 178(3), 714–731.
Wang, K., Liu, B. 1998. Concurrent discretization of multiple attributes. In Proceedings of the 5th Pacific Rim International Conference on Artificial Intelligence, Singapore, 22–27 November, 250–259.
Wang, Z., Webb, G. I. 2002. Comparison of lazy Bayesian rule and tree-augmented Bayesian learning. In Proceedings of the 2002 IEEE International Conference on Data Mining. IEEE Computer Society, Washington, DC, USA, 490.
Webb, G. I., Boughton, J. R., Wang, Z. 2005. Not so naïve Bayes: aggregating one-dependence estimators. Machine Learning 58, 5–24.
Winter, R., Auerbach, K. 2004. Contents under Pressure. Intelligent Enterprise. http://www.intelligententerprise.com/showArticle.jhtml;?articleID=18902161.
Witten, I. H., Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Publishers Inc.
Wong, A. K. C., Chiu, D. K. Y. 1987. Synthesizing statistical knowledge from incomplete mixed-mode data. IEEE Transactions on Pattern Analysis and Machine Intelligence 9(6), 796–805.
Wong, A. K. C., Liu, T. S. 1975. Typicality, diversity, and feature pattern of an ensemble. IEEE Transactions on Computers 24(2), 158–181.
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G. J., Ng, A., Liu, B., Yu, P. S., Zhou, Z.-H., Steinbach, M., Hand, D. J., Steinberg, D. 2007. Top 10 algorithms in data mining. Knowledge and Information Systems 14(1), 1–37.
Yang, Y., Webb, G. I. 2002. A comparative study of discretization methods for naive-Bayes classifiers. In Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop, Tokyo, Japan, 18–19 August, 159–173.
Zhang, H., Jiang, L., Su, J. 2005. Hidden naive Bayes. In Proceedings of the 20th National Conference on Artificial Intelligence. AAAI Press, 919–924.
Zheng, Z., Webb, G. I. 2000. Lazy learning of Bayesian rules. Machine Learning 41(1), 53–84.
Zheng, F., Webb, G. I. 2006. Efficient lazy elimination for averaged one-dependence estimators. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 1113–1120.