HARPP: HARnessing the Power of Power sets for Mining Frequent Itemsets
Modern algorithms for mining frequent itemsets face noteworthy deterioration of performance when minimum support tends to decrease, especially for sparse datasets. Long-tailed itemsets, frequent itemsets found at lower minimum support, are significant for present-day applications such as recommender systems. In this study, we have developed a novel power set based method named as HARnessing the Power of Power sets (HARPP) for mining frequent itemsets. HARPP iteratively generates power sets to make combinations of overlapping varying-sized subsets of I, where I is a set of items in a large database. Intrinsic feature of creating power sets along with the use of set data structure ensures the agility of HARPP because most of its operations take constant running time. Without storing it entirely in memory, HARPP scans the dataset only once and mines frequent itemsets on the fly. In contrast to state-of-the-art, efficiency of HARPP increases with decrease in minimum support that makes it a viable technique for mining long-tailed itemsets. Performance study shows that HARPP is efficient and scalable, and is faster up to two orders of magnitude than FP-Growth algorithm at lower minimum support particularly when datasets are sparse.