Detecting Outliers in Data Streams Based on Minimum Rare Pattern Mining and Pattern Matching
Keywords: outlier detection, minimal rare pattern mining, pattern matching, data streams
Outliers are a major factor affecting the accuracy of data-driven processing, so they must be identified in collected datasets to guarantee data security. With the wide use of sensors and other monitoring equipment, data streams are becoming the main form of data. However, the huge scale of data streams makes the number of mined rare patterns very large, which makes it hard to detect outliers effectively with pattern-based outlier detection methods. Since minimal rare patterns (MRPs) can represent the rare patterns and are far fewer in number, using MRPs for outlier detection can shorten the time consumption. Based on this idea, this paper proposes an outlier detection approach based on minimal rare patterns, called ODMRP, which consists of a pattern mining phase and a pattern matching phase. Specifically, in the pattern mining phase, an improved minimal rare pattern mining algorithm, namely MRPM, is proposed to mine the MRPs from data streams. It first constructs two matrix structures to store information about transactions and frequent 2-patterns, and then applies “pattern extension” operations to extend frequent 2-patterns to longer patterns; at the same time, rare patterns are removed so that they do not participate in further “pattern extension” operations, which avoids meaningless time cost. In the pattern matching phase, we use the IM-Sunday algorithm to match the mined MRPs against the patterns stored in the outlier pattern library to find potential outliers. Extensive experimental studies show that the proposed ODMRP method can accurately detect outliers in data streams with less overhead.
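To make the two phases concrete, the following is a minimal Python sketch and not the paper's MRPM or IM-Sunday algorithms, whose details the abstract does not give. It assumes the standard definition of a minimal rare pattern (an itemset whose support is below the threshold while all of its proper subsets are frequent), replaces the matrix structures with a plain level-wise search over a static list of transactions, and uses the classic Sunday search as a stand-in for IM-Sunday. The function names, the absolute `min_support` count, and the encoding of patterns as sorted tuples are all illustrative assumptions.

```python
from itertools import combinations


def minimal_rare_patterns(transactions, min_support):
    """Level-wise sketch: return {pattern: support count} for itemsets that are
    rare (count < min_support) while all of their proper subsets are frequent.
    This is NOT the matrix-based MRPM algorithm; it only illustrates the idea."""
    transactions = [frozenset(t) for t in transactions]

    def support(pattern):
        return sum(1 for t in transactions if pattern <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidate 1-patterns
    minimal_rare = {}
    k = 1
    while level:
        frequent = set()
        for candidate in level:
            count = support(candidate)
            if count >= min_support:
                frequent.add(candidate)
            else:
                # Rare with all subsets frequent: record it and do not extend it.
                minimal_rare[candidate] = count
        # "Pattern extension": join frequent k-patterns into (k+1)-candidates
        # whose k-subsets are all frequent.
        next_level = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                next_level.add(union)
        level = sorted(next_level, key=sorted)
        k += 1
    return minimal_rare


def sunday_search(text, pattern):
    """Classic Sunday search (not IM-Sunday): return the first index at which
    `pattern` occurs in `text`, or -1. Both must be the same sequence type."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Shift for each symbol, based on its last occurrence in the pattern.
    shift = {symbol: m - i for i, symbol in enumerate(pattern)}
    i = 0
    while i + m <= n:
        if text[i:i + m] == pattern:
            return i
        if i + m >= n:
            break
        # Look at the symbol just past the current window to decide the jump.
        i += shift.get(text[i + m], m + 1)
    return -1


if __name__ == "__main__":
    # Hypothetical toy data: each transaction is a set of item labels.
    stream = [{"a", "b"}, {"a", "c"}, {"b", "c"},
              {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
    mrps = minimal_rare_patterns(stream, min_support=2)
    # Hypothetical library entry: an outlier pattern stored as a sorted tuple.
    library_entry = ("a", "b", "c", "d")
    for pattern, count in mrps.items():
        encoded = tuple(sorted(pattern))
        print(encoded, count, sunday_search(library_entry, encoded))
```

Because a rare candidate is never extended, every pattern recorded here is rare with only frequent subsets, which mirrors the abstract's point that removing rare patterns from “pattern extension” avoids wasted work.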
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.