## Abstract

Spectral clustering is a key research topic in machine learning and data mining. Most existing spectral clustering algorithms are built on gaussian Laplacian matrices, which are sensitive to parameters. We propose a novel parameter-free distance-consistent locally linear embedding (LLE). The proposed distance-consistent LLE guarantees that edges between closer data points carry greater weight. We also propose a novel improved spectral clustering algorithm via embedded label propagation. Our algorithm builds on two advancements of the state of the art. The first is label propagation, which propagates a node's labels to neighboring nodes according to their proximity. We perform standard spectral clustering on the original data, assign to each cluster the data points nearest its center, and then propagate labels through dense unlabeled data regions. The second is manifold learning, which has been widely used for its capacity to leverage the manifold structure of data points. Extensive experiments on various data sets validate the superiority of the proposed algorithm over state-of-the-art spectral clustering algorithms.

## 1 Introduction

Data clustering is a fundamental research topic and is widely used for many applications in the fields of artificial intelligence, statistics, and social sciences (Jain, Murty, & Flynn, 1999; Jain & Dubes, 1988; Girolami, 2002; Ye, Zhao, & Liu, 2007). The objective of clustering is to partition the original data points into various groups so that data points within the same cluster are dense while those in different clusters are far from each other (Jain & Dubes, 1988; Filippone, Camastra, Masulli, & Rovetta, 2008).

Among the various implementations of clustering, k-means is one of the most popular choices because of its simplicity and effectiveness (Wu, Hoi, Jin, Zhu, & Yu, 2012). The general procedure of traditional k-means (TKM) is to randomly initialize clustering centers, assign each data point to its nearest cluster, and recompute the clustering centers. Researchers have claimed that the curse of dimensionality may deteriorate the performance of TKM (Ding & Li, 2007). A straightforward solution to this problem is to project the original data onto a low-dimensional subspace by dimensionality reduction (e.g., PCA) before performing TKM. Discriminative analysis has been shown to be effective in enhancing clustering performance (Ding & Li, 2007; La Torre, Fernando, & Kanade, 2006; Ye, Zhao, & Liu, 2007). Motivated by this fact, discriminative k-means (DKM) (Ye, Zhao, & Wu, 2007) incorporates discriminative analysis and clustering into a single framework, formalizing clustering as a trace maximization problem. However, both TKM and DKM fail to take the low-dimensional manifold structure of the data into consideration.
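For reference, the TKM loop just described can be sketched in a few lines of NumPy (an illustrative implementation, not the exact code used in the experiments):

```python
import numpy as np

def tkm(X, k, n_iter=100, seed=0):
    """Traditional k-means (TKM): random centers, nearest-center
    assignment, center recomputation until convergence."""
    rng = np.random.default_rng(seed)
    # Randomly initialize cluster centers from the data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each data point to its nearest cluster center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        new = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Because the initialization is random, the result varies from run to run; in practice the algorithm is restarted several times and the solution with the best objective value is kept.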

Spectral clustering (SC) (Yu & Shi, 2003; Filippone et al., 2008; Shi & Malik, 2000) has gradually attracted more and more research attention for its capacity to mine intrinsic data geometric structures, which facilitates partitioning data with more complicated structures (Belkin & Niyogi, 2003; Yang, Shen, Nie, Ji, & Zhou, 2011; Nie, Xu, Tsang, & Zhang, 2009; Wu & Schölkopf, 2006; Yang, Xu, Nie, Yan, & Zhuang, 2010). The basic idea of SC is to find a cluster assignment of the data points by using the spectrum of a similarity matrix, which leverages the nonlinear and low-dimensional manifold structure of the original data. Inspired by these benefits, researchers have proposed different variants of the SC method. For example, local learning-based clustering (LLC) (Wu & Schölkopf, 2006) uses a kernel regression model for label prediction based on the assumption that the class label of a data point can be determined by its neighbors. Self-tuning SC (Zelnik-Manor & Perona, 2004) is able to tune its parameters automatically in an unsupervised scenario. Normalized cuts are capable of balancing the volume of clusters by exploiting data density information (Shi & Malik, 2000).
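The basic SC pipeline can be sketched as follows: build a similarity graph, form the normalized graph Laplacian, embed the points via its bottom eigenvectors, and run k-means on the embedding. This is a minimal NumPy illustration assuming a gaussian kNN graph and a deterministic farthest-point k-means initialization; the parameter choices are for illustration only:

```python
import numpy as np

def spectral_clustering(X, c, k_neighbors=5, sigma=1.0):
    """Plain spectral clustering: gaussian kNN similarity graph,
    normalized graph Laplacian, spectral embedding, then k-means."""
    n = len(X)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))           # gaussian similarities
    np.fill_diagonal(W, 0)
    nbrs = np.argsort(d2, axis=1)[:, 1:k_neighbors + 1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nbrs] = True
    W = np.where(mask | mask.T, W, 0.0)          # symmetric kNN graph
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(1), 1e-12))
    L = np.eye(n) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                  # ascending eigenvalues
    U = vecs[:, :c]                              # spectral embedding
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Deterministic k-means on the embedding (farthest-point init).
    centers = U[[0]]
    for _ in range(c - 1):
        dist = ((U[:, None] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, U[dist.argmax()]])
    for _ in range(100):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.vstack([U[labels == j].mean(0) if np.any(labels == j)
                             else centers[j] for j in range(c)])
    return labels
```

Note that the gaussian bandwidth `sigma` is exactly the kind of sensitive parameter that motivates the parameter-free construction proposed later.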

Label propagation has shown its capability for propagating labels through the data set along high-density areas defined by unlabeled data (Zhu & Ghahramani, 2002; Wang & Zhang, 2008). The key to label propagation is the cluster assumption (Chapelle, Weston, & Schölkopf, 2002): nearby data points are likely to belong to the same cluster, and data points on the same structure are likely to share the same label. Motivated by the benefits gained by label propagation, we intend to introduce label propagation into the field of spectral clustering (Kang, Jin, & Sukthankar, 2006; Cao, Luo, & Huang, 2008; Cheng, Liu, & Yang, 2009).
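A minimal sketch of this idea uses the common diffusion-style update, in which label scores repeatedly spread over the similarity graph while the labeled points keep pulling the solution toward their given labels (an illustrative variant, not necessarily the exact scheme adopted later):

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, n_iter=300):
    """Diffusion-style label propagation on a similarity graph W.
    Y is an n x c matrix with one-hot rows for labeled points and
    zero rows for unlabeled points; alpha balances graph smoothing
    against fidelity to the initial labels."""
    deg = W.sum(1)
    S = W / np.sqrt(np.outer(deg, deg))   # symmetric normalization
    F = Y.astype(float)
    for _ in range(n_iter):
        # Each node averages its neighbors' scores, then is nudged
        # back toward its initial label (if it has one).
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(1)
```

On a chain graph with one labeled node at each end, the labels diffuse inward and meet in the middle, which is exactly the "propagate through high-density regions" behavior described above.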

Our proposed spectral clustering algorithm combines the strengths of spectral clustering and label propagation. The main process of our algorithm is shown in Figure 1. We first perform standard spectral clustering on the original data set to obtain initial clusters. Then we pick out the data points closest to each cluster center and use them to form a label matrix. By means of manifold learning, we propagate the labels through dense unlabeled data regions. We call the proposed method improved spectral clustering via embedded label propagation (SCLP).

The main contributions of this letter can be summarized as follows:

To the best of our knowledge, this is the first time that spectral clustering and embedded label propagation have been incorporated into a single framework. We propagate the labels obtained by spectral clustering to other unlabeled data points.

We integrate the advantage of manifold learning, which is capable of leveraging manifold structure among data points, into the proposed framework.

We propose a novel distance-consistent locally linear embedding. Unlike the traditional gaussian graph approach, the proposed graph construction is parameter free.

Extensive experiments on seven real-world data sets demonstrate that the proposed SCLP outperforms state-of-the-art clustering algorithms.

## 2 Related Work

### 2.1 Locally Linear Embedding

Locally linear embedding (LLE) (Roweis & Saul, 2000) aims to identify low-dimensional global coordinates that lie on or very near a manifold embedded in a high-dimensional space. The idea is to reconstruct each data point from its neighbors with minimal discrepancy, effectively fitting a separate local linear model at each point.

LLE has three steps: build a neighborhood for each data point, find the weights in order to linearly approximate the data in that neighborhood, and find the low-dimensional coordinates best reconstructed by those weights.

By way of example, given a data set matrix $X = [x_1, \ldots, x_n]$, the main steps of LLE are as follows:

1. For each data point $x_i$, find its $k$ nearest neighbors.

2. Compute the weights $w_{ij}$ that best linearly reconstruct $x_i$ from those neighbors.

3. Find the low-dimensional coordinates $y_i$ that are best reconstructed by the weights $w_{ij}$.
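These three steps can be sketched directly in NumPy (a minimal illustration; the regularization constant `reg` is an assumption added for numerical stability of the local least-squares solves):

```python
import numpy as np

def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Minimal locally linear embedding following the three steps:
    (1) find each point's nearest neighbors, (2) solve for the local
    reconstruction weights, (3) embed via the bottom eigenvectors."""
    n = len(X)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                    # neighbors centered on x_i
        G = Z @ Z.T                              # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(G))  # regularize for stability
        w = np.linalg.solve(G, np.ones(len(G)))
        W[i, nbrs[i]] = w / w.sum()              # weights sum to one
    # Embedding cost ||Y - WY||^2 -> bottom eigenvectors of M.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]           # skip the constant eigenvector
```

The eigenvector associated with the zero eigenvalue of $M$ is constant and carries no coordinate information, which is why it is discarded.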

### 2.2 Spectral Clustering

## 3 The Proposed Framework

In this section, we illustrate the detailed framework of our algorithm. We aim to cluster the data set into $c$ clusters. Suppose $X \in \mathbb{R}^{d \times n}$ denotes the data set, where $d$ is the dimension of the data points and $n$ is the total number of data points.

### 3.1 Distance-Consistent Similarity Learning

Following the work in Karasuyama and Mamitsuka (2013), we propose leveraging manifold regularization built on the Laplacian graph for label propagation. To begin, we first present a novel distance-consistent LLE.

From this construction, we can observe that the proposed distance-consistent LLE ensures that edges between closer nodes carry greater weight.
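To make the distance-consistency property concrete, the following sketch shows one possible parameter-free neighbor weighting with the stated property: within each neighborhood, the weight decreases linearly with distance, so closer neighbors always receive heavier edges. This is purely an illustrative assumption and not necessarily the paper's exact formula:

```python
import numpy as np

def distance_consistent_weights(X, n_neighbors=5):
    """Illustrative distance-consistent neighbor weighting.
    The farthest neighbor anchors the scale in each neighborhood,
    so no gaussian bandwidth parameter is needed; closer neighbors
    always get strictly heavier edges."""
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    nbrs = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    for i in range(n):
        di = d[i, nbrs[i]]
        # Weight decreases linearly with distance within the neighborhood.
        w = di.max() - di + 1e-12
        W[i, nbrs[i]] = w / w.sum()
    return np.maximum(W, W.T)  # symmetrize the graph
```

The point of such a construction is that the ordering of edge weights follows the ordering of distances by design, with no bandwidth parameter to tune.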

### 3.2 Refined Spectral Clustering

### 3.3 Optimization

## 4 Experiments

In this section, we conduct extensive experiments to validate the performance of the proposed SCLP and compare it with related state-of-the-art spectral clustering algorithms, followed by a study of parameter sensitivity.

### 4.1 Data Set Description

We use seven benchmark data sets to validate the performance of the proposed algorithm (see Table 1). The USPS data set has 9298 gray-scale handwritten digit images, each represented by 256 pixel values, scanned from envelopes by the U.S. Postal Service. The Yale-B data set (Georghiades, Belhumeur, & Kriegman, 2001) consists of 2414 near-frontal images from 38 persons under different illuminations. The AR data set (Martinez & Benavente, 1998) has 840 images with a dimension of 768. The FRGC data set (Phillips et al., 2005), collected at the University of Notre Dame, contains 50,000 images taken across 13 different poses, under 43 different illumination conditions, and with 4 different expressions per person. The MSRA50 data set (He, Yan, Hu, Niyogi, & Zhang, 2004) consists of 1799 images and 12 classes. The PALM data set consists of right-hand images taken with a digital camera, 7 samples per person across 100 users; the images are resized to the same dimension. The human lung carcinomas (LUNG) data set (Singh et al., 2002) contains 203 samples and 3312 genes. Following previous work, we use pixel values as the feature representations of the image data sets.

| Data Set | Matrix Size | Data Set Size | Class Number |
|---|---|---|---|
| LUNG | 3312 | 203 | 4 |
| PALM | 256 | 2000 | 100 |
| MSRA50 | 1024 | 1799 | 12 |
| FRGC | 1296 | 5658 | 275 |
| AR | 768 | 840 | 120 |
| Yale-B | 1024 | 2414 | 38 |
| USPS | 256 | 9298 | 10 |


### 4.2 Experiment Setup

We compare the proposed SCLP with traditional k-means (TKM) (Wu et al., 2012), discriminative k-means (DKM) (Ye, Zhao, & Wu, 2007), local learning clustering (LLC) (Wu & Schölkopf, 2006), nonnegative normalized cut (NNC) (Shi & Malik, 2000), spectral clustering (SC), CLGR (Wang, Zhang, & Li, 2009), and spectral embedding clustering (SEC) (Nie et al., 2009).

The size of the neighborhood is set to 5 for all spectral clustering algorithms. For the graph parameter in NNC, we apply the self-tuning algorithm (Zelnik-Manor & Perona, 2004) to determine the best value. The parameters of DKM, LLC, CLGR, and SEC are tuned over a range of candidate values, and we report the best results. Note that the results of all clustering algorithms vary with initialization. To reduce the influence of statistical variation, we repeat each clustering 50 times with random initialization and report the results corresponding to the best objective function values. For SCLP, we select the two data points per cluster nearest to the clustering center.

### 4.3 Evaluation Metrics

Following related clustering studies, we use clustering accuracy (ACC) and normalized mutual information (NMI) as evaluation metrics for our experiments.
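For concreteness, NMI can be computed directly from the contingency table of two label assignments; ACC additionally requires matching predicted clusters to ground-truth classes (e.g., via the Hungarian algorithm), which is omitted here. A minimal sketch of NMI, normalized by the geometric mean of the two entropies:

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two label assignments,
    computed from their joint distribution and normalized by
    sqrt(H(a) * H(b))."""
    a, b = np.asarray(a), np.asarray(b)
    ca, cb = np.unique(a), np.unique(b)
    # Joint distribution of the two clusterings (contingency / n).
    P = np.array([[np.mean((a == i) & (b == j)) for j in cb] for i in ca])
    pa, pb = P.sum(1), P.sum(0)          # marginal distributions
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / (pa[:, None] * pb[None, :])[nz])).sum()
    ha = -(pa * np.log(pa)).sum()        # entropy of assignment a
    hb = -(pb * np.log(pb)).sum()        # entropy of assignment b
    return mi / np.sqrt(ha * hb)
```

NMI is invariant to permutations of the cluster indices, which makes it well suited for comparing a clustering against ground-truth classes.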

### 4.4 Experimental Results

We show the clustering results of different algorithms in terms of ACC and NMI over seven benchmark data sets in Tables 2 and 3. Based on the results of our experiment, we can make the following observations:

When comparing the k-means-based algorithms (i.e., TKM and DKM), DKM generally outperforms TKM because discriminative dimensionality reduction and clustering are integrated into a single framework. Thus, each cluster is more identifiable, which facilitates clustering performance. We can therefore safely conclude that discriminative information is beneficial for clustering.

SC outperforms LLC on the Yale-B and USPS data sets, while LLC outperforms SC on the remaining ones. Meanwhile, CLGR achieves better performance than both algorithms on all data sets.

SEC obtains the second-best performance over the seven data sets, which indicates that linearity regularization can also facilitate clustering performance. Similar to our algorithm, SEC is capable of dealing with out-of-sample data.

The proposed algorithm SCLP generally outperforms the compared clustering algorithms on the seven benchmark data sets, which demonstrates that manifold regularization-based label propagation is beneficial for spectral clustering.

| Method | LUNG | PALM | MSRA50 | FRGC | AR | YaleB | USPS |
|---|---|---|---|---|---|---|---|
| KM | | | | | | | |
| DKM | | | | | | | |
| NNC | | | | | | | |
| SC | | | | | | | |
| LLC | | | | | | | |
| CLGR | | | | | | | |
| SEC | | | | | | | |
| SCLP | | | | | | | |


Note: The proposed algorithm, SCLP, generally outperforms the compared algorithms, which indicates that manifold regularization-based label propagation is beneficial for spectral clustering.

| Method | LUNG | PALM | MSRA50 | FRGC | AR | YaleB | USPS |
|---|---|---|---|---|---|---|---|
| KM | | | | | | | |
| DKM | | | | | | | |
| NNC | | | | | | | |
| SC | | | | | | | |
| LLC | | | | | | | |
| CLGR | | | | | | | |
| SEC | | | | | | | |
| SCLP | | | | | | | |


Note: The proposed algorithm, SCLP, generally outperforms the compared algorithms, which indicates that manifold regularization-based label propagation is beneficial for spectral clustering.

### 4.5 Parameter Sensitivity

In this section, we study the performance variance with respect to the two regularization parameters on all the data sets used. The performance is reported in Figure 2, which shows how clustering performance varies across different combinations of the two parameters. We can see that better performance is achieved when the two parameters are comparable.

## 5 Conclusion

In this letter, we have proposed a novel improved spectral clustering algorithm (SCLP). Most existing spectral clustering algorithms are based on gaussian matrices or LLE, both of which are extremely sensitive to parameters, and these parameters are difficult to tune. We have presented a novel distance-consistent LLE that is parameter free and guarantees that edges between closer data points carry greater weight. Utilizing this distance-consistent LLE, we have proposed an improved spectral clustering method using label propagation. The proposed algorithm takes advantage of both label propagation and manifold learning. With label propagation, we propagate the labels obtained through spectral clustering to the remaining unlabeled data points. By adopting manifold learning, we leverage the manifold structure among data points. Note that our framework can also be readily applied to out-of-sample data. Finally, we have evaluated the clustering performance of the proposed algorithm over seven data sets. The experimental results demonstrate that it consistently outperforms the compared algorithms.

## Acknowledgments

The research is supported by the Science Foundation of the China (Xi'an) Institute for Silk Road Research (2016SY10).