An Efficient Outlier Detection Approach Over Uncertain Data Stream Based on Frequent Itemset Mining

Shangbo Hao, Saihua Cai, Ruizhi Sun, Sicong Li

Abstract


Outlier detection is essential in data-based science; it aims to detect the itemsets that differ significantly from the rest of the data. Because of limits on equipment precision and network transmission, uncertain data are increasingly common in daily life. However, traditional outlier detection methods are not applicable to uncertain data streams, and the large volume of data makes outlier detection costly in memory and time; moreover, the multiple scans of the data stream required by Apriori-like methods are unrealistic. In this paper, a matrix structure is constructed to store the information of the uncertain data stream, and the subsequent mining process is conducted on this matrix, so the whole data stream needs to be scanned only once. Then, the “upper cap” concept is used in the FIM-UDS method to mine frequent itemsets more efficiently in support of outlier detection. Moreover, two outlier factors and an outlier detection method called FIM-UDSOD are designed to detect potential outliers. Finally, two public datasets are used to verify the efficiency of the FIM-UDS method and one synthetic dataset is used to evaluate the FIM-UDSOD method; the experimental results show that the proposed FIM-UDSOD method is more effective in outlier detection.
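
The single-scan matrix idea described above can be illustrated with a minimal sketch. It is not the FIM-UDS or FIM-UDSOD algorithm: it omits the “upper cap” pruning and the outlier factors, whose definitions are not given in this abstract. It assumes each transaction maps items to existence probabilities and uses the standard expected-support measure for uncertain data (the sum, over transactions, of the product of the item probabilities). The function names, the sample stream, and the `min_esup` threshold are all hypothetical.

```python
from itertools import combinations
from math import prod


def build_matrix(stream, items):
    """Scan the stream once: one row per transaction, one column per item.

    Each entry is the item's existence probability (0.0 if the item is absent).
    """
    return [[transaction.get(item, 0.0) for item in items] for transaction in stream]


def expected_support(matrix, cols):
    """Expected support of the itemset given by column indices `cols`."""
    return sum(prod(row[c] for c in cols) for row in matrix)


def frequent_itemsets(matrix, items, min_esup, max_len=2):
    """Enumerate itemsets up to `max_len` whose expected support >= min_esup.

    Brute-force enumeration for illustration only; all work is done on the
    matrix, so the original stream is never rescanned.
    """
    result = {}
    for k in range(1, max_len + 1):
        for cols in combinations(range(len(items)), k):
            esup = expected_support(matrix, cols)
            if esup >= min_esup:
                result[tuple(items[c] for c in cols)] = esup
    return result


# Hypothetical uncertain stream: item -> existence probability.
stream = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.4},
    {"a": 0.8, "b": 0.5, "c": 0.5},
]
items = ["a", "b", "c"]
matrix = build_matrix(stream, items)
frequent = frequent_itemsets(matrix, items, min_esup=1.0)
```

With this sample data, the itemsets `("a",)`, `("b",)`, and `("a", "b")` meet the threshold. Transactions that contain few or none of the frequent itemsets would be natural outlier candidates, which is the intuition behind frequent-itemset-based outlier detection.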

Keywords


outlier detection; frequent itemset mining; uncertain data stream; outlier factors

Print ISSN: 1392-124X 
Online ISSN: 2335-884X