Data Evolvement Analysis Based on Topology Self-Adaptive Clustering algorithm

Authors

  • Ming Liu Harbin Institute of Technology
  • Quan Bing Liu Harbin Institute of Technology
  • Chao Yuan Liu Harbin Institute of Technology
  • Jie Cheng Sun Harbin Institute of Technology

DOI:

https://doi.org/10.5755/j01.itc.41.2.974

Keywords:

Topology adaptation, competitive learning, data evolvement analysis

Abstract

With the rapid advance of Internet technology, Internet users have to deal with enormous amounts of data every day. One of the most useful kinds of knowledge that can be offered to users is how information changes between two data sets collected at different times. The task is to discover what information has newly appeared, what has become obsolete, and what has remained unchanged. We formally call this task data evolvement analysis. Clustering is a natural solution: by comparing the clustering results formed at different times, the change in information is easy to obtain. Unfortunately, this plan is impractical, since the clustering algorithm has to be rerun every time the input data are updated, which is time-consuming. A dynamic clustering algorithm is therefore needed, one that automatically adjusts its structure to express the change. For this reason, this paper proposes a novel Topology Self-Adaptive Clustering algorithm (TSAC). The algorithm is derived from the Self-Organizing Map (SOM), but it does not need any prior assumption about the neuron topology. Moreover, when the input data are updated, its topology is remodeled accordingly. To further improve its performance, it uses a minimum spanning tree to preserve topological order, which no traditional SOM-based topology-adaptive algorithm does. To measure the extent of the change clearly, it partitions the data space into grids and computes the density of each grid to quantify the change. Experimental results demonstrate that TSAC automatically tunes its topology as the input data change. With this algorithm and the grid structure, the change in information can be clearly visualized.
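
The grid-density idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the TSAC algorithm and not the authors' implementation; the abstract gives no formula, so the sketch assumes a uniform grid with a fixed cell size and defines density as the fraction of points in a cell. The names grid_density, density_change, cell_size and eps are hypothetical.

```python
from collections import Counter
from math import floor


def grid_density(points, cell_size):
    """Map each occupied grid cell (tuple of integer indices) to the
    fraction of points that fall inside it.
    Assumption: a uniform grid with the same cell_size on every axis."""
    counts = Counter(tuple(floor(x / cell_size) for x in p) for p in points)
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}


def density_change(old_points, new_points, cell_size, eps=1e-9):
    """Compare per-cell densities of two snapshots and sort the cells into
    'appeared', 'vanished', 'changed' and 'unchanged' (hypothetical labels)."""
    old = grid_density(old_points, cell_size)
    new = grid_density(new_points, cell_size)
    report = {"appeared": {}, "vanished": {}, "changed": {}, "unchanged": {}}
    for cell in old.keys() | new.keys():
        before, after = old.get(cell, 0.0), new.get(cell, 0.0)
        delta = after - before
        if before == 0.0:
            report["appeared"][cell] = delta    # cell occupied only in the new data
        elif after == 0.0:
            report["vanished"][cell] = delta    # cell occupied only in the old data
        elif abs(delta) > eps:
            report["changed"][cell] = delta     # occupied in both, density shifted
        else:
            report["unchanged"][cell] = delta   # occupied in both, same density
    return report


# Two small 2-D snapshots taken at different times (made-up illustration data).
old_snapshot = [(0.1, 0.2), (0.3, 0.4), (1.5, 1.5), (1.6, 1.7)]
new_snapshot = [(0.2, 0.1), (0.4, 0.3), (2.5, 2.5), (2.6, 2.7)]
report = density_change(old_snapshot, new_snapshot, cell_size=1.0)
```

In this toy input, the cell around the origin keeps its density, one cell empties, and one cell becomes occupied, which corresponds to the three cases named in the abstract: information that stays unchanged, becomes obsolete, or newly appears.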

Author Biography

Ming Liu, Harbin Institute of Technology

Department of Computer Science. Research interests: data mining, clustering.

Published

2012-04-26

Section

Articles