A Comparison of Mining Incomplete and Inconsistent Data
DOI:
https://doi.org/10.5755/j01.itc.46.2.17330
Keywords:
Incomplete data, lost values, "do not care" conditions, inconsistent data, rough set theory, probabilistic approximations, MLEM2 rule induction algorithm.
Abstract
We present experimental results on a comparison of incompleteness and inconsistency. We used two interpretations of missing attribute values: lost values and "do not care" conditions. Our experiments were conducted on 204 data sets, including 71 data sets with lost values, 71 data sets with "do not care" conditions and 62 inconsistent data sets, created from eight original numerical data sets. We used the Modified Learning from Examples Module version 2 (MLEM2) rule induction algorithm for data mining, combined with three types of probabilistic approximations: lower, middle and upper. We used an error rate, computed by ten-fold cross validation, as the criterion of quality. There is experimental evidence that incompleteness is worse than inconsistency for data mining (two-tailed test, 5% level of significance). Additionally, lost values are better than "do not care" conditions, again with regard to the error rate, and there is only a small difference in the error rate among the three types of probabilistic approximations.
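The abstract names lower, middle and upper probabilistic approximations without defining them. As a minimal sketch, assuming the usual rough-set definition for a completely specified but inconsistent data set, a probabilistic approximation of a concept X with threshold alpha is the union of the elementary sets E for which Pr(X | E) >= alpha; lower corresponds to alpha = 1, middle to alpha = 0.5, and upper to any alpha just above 0. The toy data, the attribute names and both function names below are hypothetical, and the handling of missing attribute values used in the paper is not reproduced here.

```python
from collections import defaultdict
from fractions import Fraction


def elementary_sets(cases, attributes):
    """Group case indices that agree on every listed attribute."""
    blocks = defaultdict(set)
    for i, case in enumerate(cases):
        blocks[tuple(case[a] for a in attributes)].add(i)
    return list(blocks.values())


def probabilistic_approximation(cases, attributes, concept, alpha):
    """Union of elementary sets E with Pr(concept | E) >= alpha."""
    result = set()
    for block in elementary_sets(cases, attributes):
        pr = Fraction(len(block & concept), len(block))
        if pr >= alpha:
            result |= block
    return result


# Hypothetical inconsistent data set: cases 0-2 agree on all attributes
# but disagree on the decision "flu", and so do cases 4-6.
cases = [
    {"temp": "high", "cough": "yes", "flu": "yes"},
    {"temp": "high", "cough": "yes", "flu": "yes"},
    {"temp": "high", "cough": "yes", "flu": "no"},
    {"temp": "low", "cough": "no", "flu": "no"},
    {"temp": "low", "cough": "yes", "flu": "yes"},
    {"temp": "low", "cough": "yes", "flu": "no"},
    {"temp": "low", "cough": "yes", "flu": "no"},
]
attributes = ["temp", "cough"]
concept = {i for i, c in enumerate(cases) if c["flu"] == "yes"}

# Any nonzero conditional probability is at least 1/len(cases),
# so this threshold behaves as "just above 0".
smallest_positive = Fraction(1, len(cases))

lower = probabilistic_approximation(cases, attributes, concept, Fraction(1))
middle = probabilistic_approximation(cases, attributes, concept, Fraction(1, 2))
upper = probabilistic_approximation(cases, attributes, concept, smallest_positive)

print(sorted(lower), sorted(middle), sorted(upper))
```

In this toy example the lower approximation is empty because no elementary set lies entirely inside the concept, the middle approximation keeps only the block where the concept holds for at least half of the cases, and the upper approximation keeps every block that intersects the concept.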
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.