Text Document Clustering Approach by Improved Sine Cosine Algorithm

Authors

  • Branislav Radomirović, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
  • Vuk Jovanović, School of Electrical Engineering, University of Belgrade, Bul. kralja Aleksandra 73, 11000 Belgrade, Serbia
  • Bosko Nikolić, School of Electrical Engineering, University of Belgrade, Bul. kralja Aleksandra 73, 11000 Belgrade, Serbia
  • Sasa Stojanović, School of Electrical Engineering, University of Belgrade, Bul. kralja Aleksandra 73, 11000 Belgrade, Serbia
  • K. Venkatachalam, University of Hradec Kralove, 50003 Hradec Kralove, Czech Republic
  • Miodrag Zivkovic, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
  • Angelina Njeguš, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
  • Nebojsa Bacanin, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
  • Ivana Strumberger, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia

DOI:

https://doi.org/10.5755/j01.itc.52.2.33536

Keywords:

text document clustering, optimization problems, metaheuristics, sine cosine algorithm, hybridization, K-means

Abstract

Text clustering has developed rapidly because of the vast amounts of textual data now available in forms such as online content, social media comments, corporate data, public e-services, and media data. Text clustering groups documents with similar content and thereby identifies significant patterns in unstructured textual data. Algorithms are being developed worldwide to extract useful and relevant information from large volumes of text. One of the main obstacles in text clustering is measuring the significance of content in documents in order to partition the collection. This study proposes an improved metaheuristic algorithm to fine-tune the K-means approach for the text clustering task. The proposed technique is evaluated on the first 30 unconstrained test functions from the CEC2017 test suite and on six standard benchmark text datasets. The simulation results and a comparison with existing techniques demonstrate the robustness and superiority of the proposed method.
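
The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the general idea, assuming the basic (unimproved) sine cosine algorithm and a plain sum-of-squared-errors objective. The function names (sse, unflatten, sca_kmeans), the parameter defaults, and the toy 2-D points are all hypothetical and chosen for illustration; real text data would first be turned into document vectors, and the paper's own modifications are not reproduced here.

```python
import math
import random


def sse(centroids, points):
    """K-means objective: sum of squared distances to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def unflatten(vector, dim):
    """Split a flat candidate vector into centroids of length dim."""
    return [vector[i:i + dim] for i in range(0, len(vector), dim)]


def sca_kmeans(points, k, agents=20, iterations=100, a=2.0, seed=0):
    """Basic sine cosine search over k centroid positions (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Per-coordinate search bounds taken from the data, repeated for each centroid.
    lo = [min(p[d] for p in points) for d in range(dim)] * k
    hi = [max(p[d] for p in points) for d in range(dim)] * k

    def cost(vector):
        return sse(unflatten(vector, dim), points)

    population = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(agents)]
    best = min(population, key=cost)[:]
    best_cost = cost(best)

    for t in range(iterations):
        r1 = a - t * a / iterations  # step scale, shrinks linearly from a towards 0
        for agent in population:
            for j in range(len(agent)):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                # Choose the sine or cosine branch with equal probability.
                trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                agent[j] += r1 * trig * abs(r3 * best[j] - agent[j])
                agent[j] = min(max(agent[j], lo[j]), hi[j])  # clamp to bounds
        # Update the destination point after all agents have moved.
        for agent in population:
            c = cost(agent)
            if c < best_cost:
                best, best_cost = agent[:], c
    return unflatten(best, dim), best_cost


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.4],
           [10.0, 10.0], [10.3, 9.8], [9.7, 10.1]]
    centroids, error = sca_kmeans(toy, k=2)
    print(centroids, error)
```

In a hybrid of this kind, the centroids returned by the search could be used to seed a standard K-means run, which is one plausible reading of "fine-tune" in the abstract rather than a confirmed description of the authors' method.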

Published

2023-07-15

Issue

Section

Articles