Nearly everyone in the fields of data mining and business intelligence knows the K-means algorithm. But ever-emerging data with extremely complicated characteristics bring new challenges to this old algorithm. This book addresses these challenges and makes novel contributions: establishing theoretical frameworks for K-means distances and K-means based consensus clustering, identifying the dangerous uniform effect and zero-value dilemma of K-means, adapting the right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. The book not only enriches clustering and optimization theory, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based won the 2010 National Excellent Doctoral Dissertation Award, the highest honor, given to no more than 100 PhD theses per year in China.
The K-means algorithm is commonly used in data mining and business intelligence. This award-winning research pioneers its application to the intricacies of big data, detailing a theoretical framework for aggregating and validating clusters with K-means.
Cluster Analysis and K-means Clustering: An Introduction.- The Uniform Effect of K-means Clustering.- Generalizing Distance Functions for Fuzzy c-Means Clustering.- Information-Theoretic K-means for Text Clustering.- Selecting External Validation Measures for K-means Clustering.- K-means Based Local Decomposition for Rare Class Analysis.- K-means Based Consensus Clustering.