Book contents
- Frontmatter
- Contents
- Preface
- I Introduction to Text Mining
- II Core Text Mining Operations
- III Text Mining Preprocessing Techniques
- IV Categorization
- V Clustering
- VI Information Extraction
- VII Probabilistic Models for Information Extraction
- VIII Preprocessing Applications Using Probabilistic and Hybrid Approaches
- IX Presentation-Layer Considerations for Browsing and Query Refinement
- X Visualization Approaches
- XI Link Analysis
- XII Text Mining Applications
- Appendix A DIAL: A Dedicated Information Extraction Language for Text Mining
- Bibliography
- Index
XI - Link Analysis
Published online by Cambridge University Press: 08 August 2009
Summary
Based on the outcome of the preprocessing stage, we can establish links between entities either by using co-occurrence information (within some lexical unit such as a document, paragraph, or sentence) or by using the semantic relationships between the entities as extracted by the information extraction module (such as family relations, employment relationships, or mutual service in the army). This chapter describes the link analysis techniques that can be applied to the results of the preprocessing stage (information extraction, term extraction, and text categorization).
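As a rough illustration of the co-occurrence approach, the sketch below counts how often two entities appear in the same lexical unit and keeps the pairs that meet a threshold. The function name `cooccurrence_links`, the `min_count` parameter, and the sample entities are hypothetical and are not taken from the chapter; the entity lists stand in for whatever the preprocessing stage would produce.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(units, min_count=1):
    """Count how often each unordered pair of entities shares a lexical unit.

    `units` is an iterable of entity lists, one list per document,
    paragraph, or sentence. Pairs seen fewer than `min_count` times
    are dropped. (Illustrative helper, not from the chapter.)
    """
    counts = Counter()
    for entities in units:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical extracted entities, one list per sentence.
sentences = [
    ["Alice", "Acme Corp"],
    ["Alice", "Bob", "Acme Corp"],
    ["Bob", "Carol"],
]

links = cooccurrence_links(sentences, min_count=2)
print(links)
```

Raising `min_count` trades recall for precision: pairs that co-occur only once are often incidental, while a smaller lexical unit (a sentence rather than a document) tends to yield fewer but more meaningful links.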
A social network is a set of entities (e.g., people, companies, organizations, universities, countries) and a set of relationships between them (e.g., family relationships, various types of communication, business transactions, social interactions, hierarchy relationships, and shared memberships of people in organizations). Visualizing a social network as a graph enables the viewer to see patterns that were not evident before.
We begin with preliminaries from graph theory used throughout the chapter. We next describe the running example of the 9/11 hijackers' network, followed by a brief description of graph layout algorithms. After the concepts of paths and cycles in graphs are presented, the chapter proceeds with a discussion of the notion of centrality and the various ways of computing it. Various algorithms for partitioning and clustering nodes inside the network are then presented, followed by a brief description of finding specific patterns in networks. The chapter concludes with a presentation of three low-cost software packages for performing link analysis.
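To make the notion of centrality concrete, here is a minimal sketch of degree centrality, one common measure among several. The summary does not say which measures the chapter covers, so this choice is an assumption, as are the function name `degree_centrality` and the toy edge list.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each node's degree divided by the maximum possible degree.

    `edges` is an iterable of (node, node) pairs for an undirected graph.
    Self-loops are ignored and parallel edges count once.
    (Illustrative helper, not from the chapter.)
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # A node can be linked to at most n - 1 other nodes.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical undirected links between four entities.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_centrality(edges)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

Here node `A` is linked to every other node, so it receives the maximum score of 1.0, while `D`, with a single link, scores lowest.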
- Type: Chapter
- Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 242-272. Publisher: Cambridge University Press. Print publication year: 2006