
6 - Implementing Classification in Weka and R

Published online by Cambridge University Press:  26 April 2019

Parteek Bhatia
Affiliation: Thapar University, India

Summary

Chapter Objectives

✓ To demonstrate the use of the decision tree

✓ To apply the decision tree on a sample dataset

✓ To implement a decision tree process using Weka and R

Building a Decision Tree Classifier in Weka

In this chapter, we will learn how Weka's decision tree feature classifies unknown samples of a dataset based on their attribute values. When Weka's decision tree is applied to an unknown sample, it assigns the sample to one of several classes, such as Class A, Class B or Class C, as shown in Figure 6.1.

For example, suppose we want to predict the class of an unknown flower from the length and width of its sepals and petals. The first step is to measure the sepal length and width and the petal length and width of the unknown flower and compare these dimensions with the values of the samples in our dataset of known species. Weka's decision tree algorithm then creates decision rules that predict the class of the unknown flower automatically, as shown in Figure 6.2.

As shown in Figure 6.2, the dimensions of an unknown flower are matched against the rules generated by the decision tree. First, the rules determine whether the sample belongs to the Setosa class; if it does, the unknown sample is classified as Setosa. If not, the sample is checked against the conditions of the Virginica class. If it matches them, it is labeled Virginica; otherwise, it is labeled Versicolor. It is important to note that these rules cannot easily be created from the values of a single attribute, as Table 6.1 shows. For the same sepal width, a flower may be Setosa, Versicolor or Virginica, so sepal width alone does not reveal which species an unknown flower belongs to. Thus, the decision tree must make its prediction based on all four flower dimensions.
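The rule cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_flower`, the two thresholds and the three sample measurements are chosen for this sketch and are not the rules of Figure 6.2 or the values of Table 6.1. The three samples share the same sepal width, which shows why that attribute alone cannot separate the classes.

```python
def classify_flower(sepal_length, sepal_width, petal_length, petal_width):
    """Apply a Setosa -> Virginica -> Versicolor rule cascade (in cm).

    The thresholds below are illustrative assumptions, not rules learned
    by Weka. This toy cascade happens to test only the petal measurements;
    a tree learned from data may test any of the four attributes.
    """
    # Rule 1: does the sample belong to the Setosa class?
    if petal_length < 2.45:
        return "Setosa"
    # Rule 2: otherwise, does it match the Virginica conditions?
    if petal_width >= 1.75:
        return "Virginica"
    # Otherwise the sample is labeled Versicolor.
    return "Versicolor"


# Three illustrative flowers with the same sepal width (3.0 cm).
samples = [
    (4.9, 3.0, 1.4, 0.2),
    (5.9, 3.0, 4.2, 1.5),
    (5.9, 3.0, 5.1, 1.8),
]

for sample in samples:
    print(sample, "->", classify_flower(*sample))
```

Although the sepal width is identical in all three rows, the cascade assigns each row to a different class, because the deciding tests use other attributes.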

Because of such overlaps, the decision tree cannot predict the class of a flower with 100% accuracy; it can only determine the likelihood that an unknown sample belongs to a particular class. In real situations, the decision tree algorithm works on the basis of probability.
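One common way to read a likelihood off a decision tree is to look at the mix of training samples that ended up in the same leaf as the unknown sample. The sketch below assumes that mechanism; the helper `leaf_probabilities` and the leaf contents are hypothetical and do not come from the chapter's dataset.

```python
from collections import Counter


def leaf_probabilities(leaf_labels):
    """Return the share of each class among the training samples in a leaf."""
    counts = Counter(leaf_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical leaf: 9 Versicolor and 1 Virginica training samples landed here.
leaf = ["Versicolor"] * 9 + ["Virginica"] * 1

probs = leaf_probabilities(leaf)
predicted = max(probs, key=probs.get)  # the majority class is the prediction

print(probs)
print("Predicted class:", predicted)
```

Under this reading, an unknown flower that reaches this leaf is labeled Versicolor, with the remaining share reflecting the overlap between the two classes.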

Type: Chapter
Information: Data Mining and Data Warehousing: Principles and Practical Techniques, pp. 128-154
Publisher: Cambridge University Press
Print publication year: 2019
