Big Data Full-Text Search Index Minimization Using Text Summarization

Waheed Iqbal; Waqas Ilyas Malik; Faisal Bukhari; Khaled Mohamad Almustafa; Zubiar Nawaz

doi:10.5755/j01.itc.50.2.25470

Authors

Waheed Iqbal, Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan
Waqas Ilyas Malik, Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan
Faisal Bukhari, Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan
Khaled Mohamad Almustafa, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
Zubiar Nawaz, Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan

DOI:

https://doi.org/10.5755/j01.itc.50.2.25470

Keywords:

Big Data, Indexing, Searching, Index Minimization, Text Summarization

Abstract

Efficient full-text search is achieved by indexing the raw data, at an additional storage cost of 20 to 30
percent. In the context of Big Data, this additional storage is huge and makes it challenging to serve
full-text search queries with good performance. The large index also incurs overhead to store, manage, and
update. In this paper, we propose and evaluate a method that minimizes the index size for full-text search
over Big Data using automatic extractive text summarization. To evaluate the effectiveness of the proposed
approach, we used two real-world datasets. We indexed the original and summarized datasets using Apache
Lucene and compared the search results obtained for different search queries using three measures: average
simple overlap, Spearman’s rho correlation, and average ranking score. Our experimental evaluation shows
that automatic text summarization is an effective method to reduce the index size significantly. Using the
proposed solution, we obtained up to an 82% reduction in index size with 42% higher relevance of the search
results.
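The abstract names automatic extractive summarization as the index-reduction step but does not specify the summarizer. As a minimal sketch, assuming a simple word-frequency sentence scorer (a common textbook extractive method, not necessarily the one the authors used), the idea is to keep only the highest-scoring sentences of each document and hand that shorter text to the indexer instead of the full document. The function names, the `ratio` parameter, and the small stop-word list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption; real systems use a fuller list).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "it",
              "for", "on", "with", "as", "by", "this", "that"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lower-cased alphanumeric tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, ratio: float = 0.3) -> str:
    """Hypothetical extractive summarizer: keep the top-scoring sentences in original order.

    A sentence's score is the mean document-level frequency of its tokens;
    `ratio` is the fraction of sentences to keep (at least one).
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(len(sentences) * ratio))
    # Rank sentence positions by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:keep])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = ("Lucene builds an inverted index. The index maps terms to documents. "
           "Cats sleep a lot. An inverted index makes search fast.")
    print(summarize(doc, ratio=0.5))
    # → Lucene builds an inverted index. An inverted index makes search fast.
```

Because each document shrinks to roughly `ratio` of its sentences, the vocabulary and posting lists the indexer must store shrink as well; the trade-off is that terms appearing only in dropped sentences are no longer searchable.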
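The abstract compares search results from the original and summarized indexes but does not define the measures. The sketch below assumes common definitions: simple overlap as the fraction of the top-k original results that also appear in the top-k summarized results, and Spearman’s rho computed over the documents both result lists share. Each list is assumed to hold distinct document IDs, for example collected from Lucene search hits (not shown). The function names and query data are hypothetical, and the average ranking score is omitted because its definition is not given here.

```python
from statistics import mean


def simple_overlap(original: list[str], summarized: list[str], k: int) -> float:
    """Fraction of the top-k original results also present in the top-k summarized results."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(original[:k]) & set(summarized[:k])) / k


def spearman_rho(original: list[str], summarized: list[str]) -> float:
    """Spearman's rho over documents returned by both indexes.

    Shared documents are re-ranked 1..n within each list (IDs are assumed
    distinct, so there are no ties), then rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    in_summarized = set(summarized)
    common = [doc for doc in original if doc in in_summarized]
    n = len(common)
    if n < 2:
        raise ValueError("need at least two shared documents")
    rank_orig = {doc: r for r, doc in enumerate(common, start=1)}
    shared_in_summ = [doc for doc in summarized if doc in rank_orig]
    rank_summ = {doc: r for r, doc in enumerate(shared_in_summ, start=1)}
    d_squared = sum((rank_orig[doc] - rank_summ[doc]) ** 2 for doc in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical per-query result lists: (original index, summarized index).
queries = {
    "q1": (["d1", "d2", "d3", "d4", "d5"], ["d2", "d1", "d3", "d6", "d7"]),
    "q2": (["d8", "d9", "d10", "d11", "d12"], ["d8", "d9", "d10", "d11", "d12"]),
}

# Average each measure over the query set.
avg_overlap = mean(simple_overlap(o, s, k=5) for o, s in queries.values())
avg_rho = mean(spearman_rho(o, s) for o, s in queries.values())
print(f"average overlap: {avg_overlap:.2f}, average rho: {avg_rho:.2f}")
# → average overlap: 0.80, average rho: 0.75
```

Re-ranking only the shared documents keeps rho well defined when the two indexes return partly different documents; the overlap measure captures that difference separately.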
