• Print publication year: 2016
• Online publication date: July 2016

# 2 - Similarity/Proximity Measures between Nodes

## Summary

### Introduction

This chapter is concerned with similarity, and its dual, dissimilarity, between the nodes of a graph. The need to quantify the similarity between objects arises in many situations, not only in network analysis. Indeed, similarity has been an important and widely used concept in many fields of research for years.

The concept of similarity has its origins in, among other fields, psychology, in the work of Gustav Fechner in the 1860s. It has evolved over the years as many similarity measures have been proposed in various fields, including feature contrast models [778], mutual information [384], cosine coefficients [289], and information content [666] (see [212] for a survey). The core idea behind a similarity measure is to exploit relevant information to determine the extent to which two objects are similar in some sense [212, 688, 761]. The simple intuitions behind the concept of similarity are summarized by Lin [535]:

▸ The similarity between two objects is related to their commonality. The more commonality they share, the more similar they are.

▸ Symmetrically, the similarity between two objects is related to the differences between them. The more differences they have, the less similar they are.

▸ The maximum similarity between two objects is reached when the two objects are identical, no matter how much commonality they share.

Notice, however, that some popular similarity measures do not satisfy all three intuitions. For instance, inner product similarity does not meet the third one unless it is normalized (in which case it is equivalent to cosine similarity).
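The remark about the inner product can be checked numerically. The following is a minimal sketch in plain Python; the vectors `x` and `y` and the helper names `inner_product` and `cosine` are illustrative assumptions, not taken from the chapter.

```python
import math


def inner_product(x, y):
    """Unnormalized inner-product similarity between two vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Inner product divided by the Euclidean norms of both vectors."""
    norm_x = math.sqrt(inner_product(x, x))
    norm_y = math.sqrt(inner_product(y, y))
    return inner_product(x, y) / (norm_x * norm_y)


# Hypothetical example vectors: y is longer than x and points in a
# slightly different direction.
x = [3.0, 4.0]
y = [6.0, 9.0]

# Unnormalized: x scores 25.0 against itself but 54.0 against y, so the
# maximum is not reached for identical objects.
print(inner_product(x, x), inner_product(x, y))

# Normalized: x scores 1 against itself, and y scores strictly less.
print(cosine(x, x), cosine(x, y))
```

With these values, `x` looks more similar to `y` than to itself under the raw inner product, simply because `y` has a larger norm. After normalization the score of `x` against itself is the largest possible value.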

To measure the similarity between nodes of a graph, two complementary sources of information can be used:

▸ the features (or attributes) of the nodes, or

▸ the structure of the graph.

With the former, two nodes are considered similar if they share many features; with the latter, they are considered similar if they are “structurally close” in some sense in the network. Both kinds of information can, of course, be combined.
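To make the distinction concrete, here is a small sketch that scores the same pairs of nodes in both ways. Everything in it is an assumption made for illustration: the toy graph, the feature vectors, the use of cosine similarity for features, the use of the Jaccard coefficient of neighbor sets as one possible notion of structural closeness, and the convex combination of the two scores.

```python
import math

# Hypothetical undirected toy graph, stored as neighbor sets.
neighbors = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}

# Hypothetical two-dimensional feature vector for each node.
features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 0.0],
    "d": [0.0, 1.0],
}


def feature_similarity(i, j):
    """Cosine similarity between the feature vectors of nodes i and j."""
    x, y = features[i], features[j]
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y)


def structural_similarity(i, j):
    """Jaccard coefficient of the neighbor sets of nodes i and j."""
    union = neighbors[i] | neighbors[j]
    if not union:
        return 0.0  # two isolated nodes: no structural evidence either way
    return len(neighbors[i] & neighbors[j]) / len(union)


def combined_similarity(i, j, weight=0.5):
    """Convex combination of both scores (the weighting is an assumption)."""
    return (weight * feature_similarity(i, j)
            + (1.0 - weight) * structural_similarity(i, j))


for pair in [("a", "b"), ("a", "c")]:
    print(pair, feature_similarity(*pair), structural_similarity(*pair),
          combined_similarity(*pair))
```

In this toy setting, nodes `a` and `b` have nothing in common feature-wise but share exactly the same neighbors, while `a` and `c` have identical features but no common neighbor. The two sources of information can therefore disagree, which is one reason for combining them.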