Semantics Based Clustering through Cover-Kmeans with OntoVsm for Information Retrieval
Document clustering plays a significant role in information retrieval: it seeks to divide documents into groups automatically according to the similarity of their content. Each cluster consists of documents that are related to one another (high intra-cluster similarity) and dissimilar to the documents of other clusters (low inter-cluster similarity). Document clustering is an unsupervised process that groups documents by identifying their underlying structure, so no correct output needs to be specified for any input. Earlier clustering methods do not take the semantic associations between words into account, so the context of documents cannot be interpreted correctly. To address this problem, semantic ontology resources such as WordNet have been widely used to improve the consistency of text clustering. This paper first proposes an OntoVSM model to reduce the dimensionality of the document representation efficiently. A Cover K-means clustering algorithm is then proposed for semantic document clustering. The proposed algorithm is a hybrid of K-means and the cover coefficient-based clustering methodology (C3M), enhanced semantically using the WordNet ontology. Because the dimensionality reduction is based on the semantic knowledge of each term, it preserves the information without loss. The performance of the proposed work is analysed experimentally, and the results show an improvement over other standard methods.
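The general idea of combining C3M with K-means can be illustrated with a minimal sketch. It is not the paper's algorithm: the WordNet-based OntoVSM step is omitted, and the function names, the Euclidean distance, and the rule for skipping duplicate seeds are assumptions made for illustration. The sketch uses the standard C3M quantities (decoupling coefficient, estimated cluster count, seed power) to choose the number of clusters and the initial centroids, and then runs ordinary K-means iterations. The document-term matrix is assumed to have no all-zero rows or columns.

```python
def decoupling_coefficients(D):
    """C3M decoupling coefficient delta_i = c_ii for each document (row of D)."""
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                            # inverse row sums
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]  # inverse column sums
    return [alpha[i] * sum(D[i][k] ** 2 * beta[k] for k in range(n))
            for i in range(m)]


def select_seeds(D):
    """Estimate the cluster count and pick the documents with highest seed power."""
    delta = decoupling_coefficients(D)
    n_clusters = max(1, round(sum(delta)))            # C3M estimate: n_c = sum of delta_i
    power = [d * (1.0 - d) * sum(row) for d, row in zip(delta, D)]
    order = sorted(range(len(D)), key=lambda i: -power[i])
    seeds = []
    for i in order:
        # Assumption: skip a candidate whose vector duplicates a chosen seed.
        if all(D[i] != D[s] for s in seeds):
            seeds.append(i)
        if len(seeds) == n_clusters:
            break
    return seeds


def dist2(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cover_kmeans_sketch(D, max_iters=20):
    """K-means whose k and initial centroids come from C3M seed selection."""
    centroids = [list(D[s]) for s in select_seeds(D)]
    labels = []
    for _ in range(max_iters):
        labels = [min(range(len(centroids)), key=lambda c: dist2(doc, centroids[c]))
                  for doc in D]
        updated = []
        for c in range(len(centroids)):
            members = [D[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])          # keep the centroid of an empty cluster
        if updated == centroids:                      # converged
            break
        centroids = updated
    return labels


# Toy binary document-term matrix: rows are documents, columns are terms.
docs = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
print(cover_kmeans_sketch(docs))
```

In this sketch the cover coefficient analysis only supplies the starting point for K-means. A semantic variant would apply the same steps to a term matrix that has first been reduced with ontology knowledge.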