Distance-Based Modularity Analysis

Aidong Zhang

doi:10.1017/CBO9780511626593.008

INTRODUCTION

The classic approaches to clustering follow a protocol termed “pattern proximity after feature selection” [158]. Pattern proximity is usually measured by a distance function defined for pairs of patterns. A simple distance measurement can capture the dissimilarity between two patterns, while similarity measures can be used to characterize the conceptual similarity between patterns. In protein-protein interaction (PPI) networks, proteins are represented as nodes and interactions are represented as edges. The relationship between two proteins is therefore a simple binary value: 1 if they interact, 0 if they do not. This lack of nuance makes it difficult to define a meaningful distance between two proteins. The reliable clustering of PPI networks is further complicated by a high rate of false positives and the sheer volume of data, as discussed in Chapter 2.
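To make the binary-relationship problem concrete, the following minimal Python sketch builds a 0/1 adjacency matrix from a small edge list. The protein names, the `interactions` list, and the helper `build_adjacency` are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
from itertools import combinations

# Hypothetical toy interactions; the protein names are illustrative only.
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return (sorted protein list, symmetric 0/1 adjacency matrix)."""
    proteins = sorted({p for edge in edges for p in edge})
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        # An undirected interaction sets both symmetric entries to 1.
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return proteins, matrix


proteins, adjacency = build_adjacency(interactions)

# Collect the binary relationship for every unordered pair of proteins.
pair_values = {
    (proteins[i], proteins[j]): adjacency[i][j]
    for i, j in combinations(range(len(proteins)), 2)
}

for pair, value in pair_values.items():
    print(pair, value)
```

In this toy network the pairs (A, D) and (A, E) both receive the value 0, even though A and D are two steps apart and A and E are three steps apart. The raw binary relationship cannot tell these cases apart, which is why a separate distance definition is needed.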

Distance-based clustering employs these classic techniques and focuses on the definition of the topological or biological distance between proteins. These clustering approaches begin by defining the distance or similarity between two proteins in the network. This distance/similarity matrix can then be incorporated into traditional clustering algorithms. In this chapter, we will discuss a variety of approaches to distance-based clustering, all of which are grounded upon the use of these classic techniques.
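The two-stage workflow described above (define a pairwise distance, then hand the distance matrix to a traditional clustering algorithm) can be sketched as follows. This is a sketch under stated assumptions, not the chapter's method: the distance used here is a Jaccard distance on closed neighborhoods (each protein together with its interaction partners), the clustering step is a naive single-linkage agglomeration, and the edge list and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: two triangles joined by the bridge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]


def closed_neighborhoods(edge_list):
    """Map each protein to the set containing itself and its partners."""
    nbrs = {}
    for u, v in edge_list:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def jaccard_distance(a, b):
    """Assumed distance: 1 - |a & b| / |a | b| on neighborhood sets."""
    return 1.0 - len(a & b) / len(a | b)


def single_linkage(items, dist, k):
    """Naive agglomerative clustering: merge the closest pair until k remain."""
    clusters = [{x} for x in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the minimum member distance.
                d = min(dist[frozenset((p, q))]
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Stage 1: define the pairwise distance between proteins.
nbrs = closed_neighborhoods(edges)
dist = {frozenset((p, q)): jaccard_distance(nbrs[p], nbrs[q])
        for p, q in combinations(sorted(nbrs), 2)}

# Stage 2: pass the distances to a traditional clustering algorithm.
clusters = single_linkage(sorted(nbrs), dist, k=2)
print(sorted(sorted(c) for c in clusters))
```

The design point is the separation of concerns: any other distance or similarity definition can replace `jaccard_distance` without touching the clustering step, and any standard algorithm that accepts a distance matrix can replace `single_linkage`.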

TOPOLOGICAL DISTANCE MEASUREMENT BASED ON COEFFICIENTS

The simplest of these approaches use classic distance measurement methods and their various coefficient formulas to compute the distance between proteins in PPI networks. As discussed in [123], the distance between two nodes (proteins) in a PPI network can be defined as follows.
