
Data Preprocessing
This stage is the most time-consuming part of the data mining process. Data is rarely clean
or in a form suitable for data mining. A few data corruption problems are typical in business databases: duplicate records, missing data fields, and outliers. The preprocessing step involves integrating data from different sources and making
choices about representing or coding certain data fields that serve as inputs
to the data discovery stage. Such representation choices are needed because
certain fields may contain data at a level of detail not considered suitable...
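
The three corruption problems named above (duplicate records, missing fields, and outliers) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from the text: the function name `preprocess`, the `id` and `amount` fields, the choice of median imputation for missing values, and the 1.5 x IQR rule for flagging outliers. Real projects may well prefer other strategies.

```python
from statistics import median, quantiles


def preprocess(records, field):
    """Deduplicate records, fill missing `field` values, and flag outliers.

    Hypothetical helper: median imputation and the 1.5 * IQR rule are
    illustrative choices, not methods prescribed by the text.
    """
    # 1. Drop exact duplicate records, keeping the first occurrence.
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not mutated

    # 2. Fill missing values with the median of the observed values.
    observed = [r[field] for r in unique if r.get(field) is not None]
    fill_value = median(observed)
    for r in unique:
        if r.get(field) is None:
            r[field] = fill_value

    # 3. Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
    #    Quartiles are computed from the observed (non-imputed) values.
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    for r in unique:
        r["is_outlier"] = not (low <= r[field] <= high)
    return unique


# Made-up sample data: one duplicate, one missing value, one extreme value.
records = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": 110.0},
    {"id": 2, "amount": 110.0},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 95.0},
    {"id": 5, "amount": 105.0},
    {"id": 6, "amount": 5000.0},
]
cleaned = preprocess(records, "amount")
outlier_ids = [r["id"] for r in cleaned if r["is_outlier"]]
```

Flagging outliers rather than deleting them is a deliberate choice in this sketch: whether an extreme value is an error or a genuine observation usually depends on the business context, so the decision is left to a later step.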