
8 - Implementing Clustering with Weka and R

Published online by Cambridge University Press: 26 April 2019

Parteek Bhatia
Affiliation: Thapar University, India

Summary

Chapter Objectives

✓ To apply the k-means algorithm in Weka and R

✓ To interpret the results of clustering

✓ To identify the optimum number of clusters

✓ To apply classification to unlabeled data by using clustering as an intermediate step

Introduction

As discussed earlier, if data is not labeled, we can still analyze it by performing a clustering analysis, where clustering refers to the task of grouping a set of objects into classes of similar objects.

In this chapter, we will apply clustering to Fisher's Iris dataset. We will use clustering algorithms to group flower samples into clusters with similar flower dimensions. These clusters then become possible ways to group flower samples into species. We will implement a simple k-means algorithm to cluster numerical attributes with the help of Weka and R.
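To make the idea of k-means on numerical attributes concrete, here is a minimal sketch. The chapter itself works in Weka and R; plain Python is used here only for illustration. The function name `kmeans`, its parameters, and the choice of starting centroids are assumptions of this sketch, and the six rows are a small hand-picked subset of the Iris measurements rather than the full dataset.

```python
import math

# Six Iris samples (sepal length, sepal width, petal length, petal width).
# A tiny subset chosen for illustration; the full dataset has 150 rows.
samples = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.4, 3.2, 4.5, 1.5),
    (6.3, 3.3, 6.0, 2.5),
    (5.8, 2.7, 5.1, 1.9),
]


def kmeans(points, k, initial_indices, max_iter=100):
    """Basic k-means: alternate assignment and centroid-update steps.

    initial_indices picks the starting centroids explicitly so the run is
    reproducible (an assumption of this sketch; tools such as Weka and R
    normally choose starting points randomly).
    """
    centroids = [list(points[i]) for i in initial_indices]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


assignments, centroids = kmeans(samples, 3, [0, 2, 4])
print(assignments)
```

The returned list gives one cluster index per sample. The indices are arbitrary labels: the algorithm groups similar flowers but does not know any species names.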

In the case of classification, we know the attributes and classes of instances. For example, the flower dimensions and classes were already known to us for the Iris dataset. Our goal was to predict the class of an unknown sample as shown in Figure 8.1.

Earlier, we used the Weka J48 classification algorithm to build a decision tree on Fisher's Iris dataset using samples with known classes, which helped in predicting the class of unknown samples. We used the flower's Sepal length and width, and the Petal length and width, as the attributes for this. Based on flower dimensions and using this tree, we can identify an unknown Iris as one of three species: Setosa, Versicolor, or Virginica.

In clustering, we know the attributes for the instances, but we don't know the classes. For example, we know the flower dimensions for samples of the Iris dataset, but we don't know what classes exist, as shown in Figure 8.2. Therefore, our goal is to group instances into clusters with similar attributes or dimensions and then identify the class.

In this chapter, we will learn what to do when we don't know which classes the samples belong to, how many classes there are, or even what defines a class. Since Fisher's Iris dataset is already labeled, we will first make it unlabeled by removing the class attribute, i.e., the species column. Then, we will apply clustering algorithms to cluster this data on the basis of its input attributes, i.e., Sepal length, Sepal width, Petal length, and Petal width.
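The step of making the dataset unlabeled can be sketched as follows. This is an illustrative Python fragment, not the chapter's Weka or R procedure; the helper `drop_class`, the column names, and the three sample rows are assumptions made for the example.

```python
# Three labeled Iris rows (a small illustrative subset of the dataset).
labeled = [
    {"sepal_length": 5.1, "sepal_width": 3.5,
     "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2,
     "petal_length": 4.7, "petal_width": 1.4, "species": "versicolor"},
    {"sepal_length": 6.3, "sepal_width": 3.3,
     "petal_length": 6.0, "petal_width": 2.5, "species": "virginica"},
]


def drop_class(rows, class_attr="species"):
    """Return copies of the rows without the class attribute."""
    return [
        {name: value for name, value in row.items() if name != class_attr}
        for row in rows
    ]


unlabeled = drop_class(labeled)
print(unlabeled[0])
```

Only the four numeric input attributes remain in `unlabeled`, which is the form a clustering algorithm expects. The original `labeled` rows are left untouched, so the known species can still be compared with the clusters afterwards.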

Type: Chapter
Information: Data Mining and Data Warehousing: Principles and Practical Techniques, pp. 206-228
Publisher: Cambridge University Press
Print publication year: 2019
