Centrality measures, such as degree centrality, betweenness centrality, and eigenvector centrality, are among the most popular ones. Degree centrality is the simplest of them. Given a graph G, denote the set of vertices of G by V(G); the degree centrality of any v ∈ V(G) is then defined as

$$C_D(v) = \frac{d(v)}{|V(G)| - 1}, \tag{1}$$

where d(v) is the degree of v and |V(G)| is the number of vertices in G. Degree centrality considers only the local topology of the network, so it may be interpreted as a measure of immediate influence rather than of global impact in the network.
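Eq. (1) translates directly into code. The following is a minimal sketch, assuming a simple undirected graph stored as a Python dict that maps each vertex to the set of its neighbors; the function name `degree_centrality` is illustrative and not taken from the original work.

```python
def degree_centrality(adj: dict[int, set[int]]) -> dict[int, float]:
    """C_D(v) = d(v) / (|V(G)| - 1) for a simple undirected graph."""
    n = len(adj)  # |V(G)|
    if n < 2:
        raise ValueError("Eq. (1) needs at least two vertices")
    # d(v) is the size of v's neighbor set.
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


# Star graph: vertex 0 is adjacent to vertices 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degree_centrality(star))  # vertex 0 scores 1.0, each leaf scores 1/3
```

Dividing by |V(G)| − 1, the largest possible degree in a simple graph, keeps every score in [0, 1].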

The betweenness centrality of any v ∈ V(G) is defined as

$$C_B(v) = \frac{2}{(|V(G)|-1)(|V(G)|-2)} \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}, \tag{2}$$

where s, v, t ∈ V(G), $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through the vertex v. Betweenness centrality is one of the most popular centrality measures that consider the global structure of the network. It characterizes how influential a vertex is in the communication between pairs of other vertices.

The eigenvector centrality score of the ith vertex in the network is defined as the ith element of the eigenvector corresponding to the largest eigenvalue of the eigenvalue equation

$$A\mathbf{x} = \lambda \mathbf{x}, \tag{3}$$

where A is the adjacency matrix of the network, λ is the largest eigenvalue of A, and x is the corresponding eigenvector.
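Eqs. (2) and (3) can be computed with two standard techniques, sketched below under stated assumptions: the graph is simple, undirected, unweighted, and connected, and it is stored as a dict mapping each vertex to its set of neighbors. Betweenness uses Brandes' algorithm (a well-known method, not necessarily the one used in this work), and the sum in Eq. (2) is assumed to run over unordered pairs {s, t}, which is what the factor 2/((|V(G)|−1)(|V(G)|−2)) normalizes to [0, 1]. Eigenvector centrality uses power iteration on A + I, which has the same eigenvectors as A. Function names are illustrative.

```python
import math
from collections import deque


def betweenness_centrality(adj):
    """Normalized C_B(v) from Eq. (2) via Brandes' algorithm (unweighted, undirected)."""
    n = len(adj)
    if n < 3:
        raise ValueError("Eq. (2) needs at least three vertices")
    raw = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: sigma[t] counts shortest s-t paths.
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}
        preds = {v: [] for v in adj}
        dist[s], sigma[s] = 0, 1
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating the dependency
        # delta[v] = sum over targets t of sigma_st(v) / sigma_st.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                raw[w] += delta[w]
    # raw counts every unordered pair {s, t} twice (once from each end), so
    # 2 / ((n-1)(n-2)) times the unordered sum equals raw / ((n-1)(n-2)).
    denom = (n - 1) * (n - 2)
    return {v: value / denom for v, value in raw.items()}


def eigenvector_centrality(adj, tol=1e-10, max_iter=1000):
    """Unit-norm eigenvector for the largest eigenvalue of A, by power iteration."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # Iterate with A + I: same eigenvectors as A, but the shift stops the
        # iteration from oscillating on bipartite graphs such as stars.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(value * value for value in y.values()))
        y = {v: value / norm for v, value in y.items()}
        if max(abs(y[v] - x[v]) for v in adj) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(betweenness_centrality(star))  # center 0 scores 1.0, leaves 0.0
print(eigenvector_centrality(star))  # center ≈ 0.707, leaves ≈ 0.408
```

Each power-iteration step replaces a vertex's score with the sum of its neighbors' scores (plus its own, because of the shift), which mirrors the "proportional to the sum of the neighbors' centralities" reading of Eq. (3). Running these functions alongside degree centrality on the same graph is a quick way to see how the rankings can differ.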

Eigenvector centrality simulates a mechanism in which each vertex affects all of its neighbors simultaneously. It extends degree centrality: the score of a vertex is proportional to the sum of the scores of its neighbors. A vertex therefore has a large eigenvector centrality score either if it is connected to many other vertices or if it is connected to vertices that themselves have high eigenvector centrality. Since different centrality measures are based on different aspects of a network, the resulting centrality scores and rankings of the nodes in the network may differ. These differences are discussed in Section 4.3.

Centrality Guided Clustering

In this section, some notation and terminology are introduced, and the centrality guided clustering (CGC) algorithm is presented.

Given an input dataset, the dataset is modeled as a weighted graph G = (V, E, w). V is the vertex set, and each vertex in V represents an element of the dataset; |V(G)| is the number of vertices in G (equivalently, the number of elements in the dataset). E is the edge set, and each edge represents a relationship between a pair of elements.
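The graph model G = (V, E, w) can be held in a small container. The sketch below is hypothetical: the class name `WeightedGraph` and its methods are illustrative, the graph is assumed undirected without self-loops, and edge weights are supplied by the caller, because how w is derived from the data is not specified at this point in the text.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field


@dataclass
class WeightedGraph:
    """Hypothetical container for G = (V, E, w); undirected, no self-loops."""

    # V: one vertex per element of the dataset.
    vertices: set = field(default_factory=set)
    # E together with w: each unordered pair {u, v} maps to its edge weight.
    weights: dict = field(default_factory=dict)

    def add_vertex(self, v: Hashable) -> None:
        self.vertices.add(v)

    def add_edge(self, u: Hashable, v: Hashable, weight: float) -> None:
        if u == v:
            raise ValueError("self-loops are not allowed")
        self.vertices.update((u, v))
        self.weights[frozenset((u, v))] = weight

    def weight(self, u: Hashable, v: Hashable) -> float:
        # Look-up is symmetric because the key is an unordered pair.
        return self.weights[frozenset((u, v))]

    @property
    def num_vertices(self) -> int:
        """|V(G)|, i.e. the number of elements in the dataset."""
        return len(self.vertices)

    def neighbors(self, v: Hashable) -> set:
        return {x for pair in self.weights if v in pair for x in pair if x != v}


g = WeightedGraph()
for element in ["a", "b", "c"]:
    g.add_vertex(element)
g.add_edge("a", "b", 0.8)  # weight values are made up for illustration
g.add_edge("b", "c", 0.3)
print(g.num_vertices, g.weight("b", "a"))  # 3 0.8
```

Keying edges by `frozenset` makes {u, v} and {v, u} the same edge, matching the undirected reading of E assumed here.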