Data Preprocessing
This is the most time-consuming stage of the data mining process. Raw data is rarely clean
or in a form suitable for data mining. Business databases typically suffer from a few common data corruption problems, such as duplicate records, missing data fields, and outliers. The preprocessing step involves integrating data from different sources and making
choices about how to represent or code the data fields that serve as inputs
to the data discovery stage. Such representation choices are needed because
certain fields may contain data at a level of detail not suitable for
the pattern discovery stage. For
example, it may be counterproductive to pass each customer's actual date of birth
to the data discovery stage. Instead, it may be better to group
customers into age groups,
chosen so that they are
meaningful for the research goal. The preprocessing stage is
crucial: the representation choices
made here have a great bearing on the kinds of patterns that
the subsequent data discovery stage will find.
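The preprocessing choices described above can be illustrated with a minimal sketch. The records, field names (`id`, `birth_date`, `income`), age bands, and the median fill rule are all assumptions made for illustration and are not part of the proposed framework; outlier handling is left out for brevity.

```python
from datetime import date
from statistics import median

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "birth_date": date(1990, 5, 17), "income": 42000},
    {"id": 1, "birth_date": date(1990, 5, 17), "income": 42000},  # duplicate record
    {"id": 2, "birth_date": date(1975, 11, 2), "income": None},   # missing field
    {"id": 3, "birth_date": date(2001, 1, 30), "income": 18000},
    {"id": 4, "birth_date": date(1958, 7, 9), "income": 61000},
]

# Assumed age bands (upper bound exclusive); real bands should reflect the research goal.
AGE_BANDS = [(0, 25, "under 25"), (25, 40, "25-39"), (40, 60, "40-59"), (60, 200, "60+")]


def age_group(birth_date, today):
    """Map a birth date to a coarse age-group label."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    for low, high, label in AGE_BANDS:
        if low <= age < high:
            return label
    raise ValueError(f"age {age} is outside the configured bands")


def preprocess(rows, today):
    """Deduplicate, fill missing income, and replace birth dates with age groups."""
    # 1. Drop duplicate records, keeping the first occurrence of each id.
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))
    # 2. Fill missing income with the median of the known values (one possible rule).
    fill = median(r["income"] for r in unique if r["income"] is not None)
    for r in unique:
        if r["income"] is None:
            r["income"] = fill
        # 3. Replace the exact birth date with a coarser age group.
        r["age_group"] = age_group(r.pop("birth_date"), today)
    return unique


cleaned = preprocess(records, today=date(2024, 1, 1))
```

Passing `today` explicitly keeps the age computation reproducible; in practice the reference date would be the date of the analysis.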
Because people differ in so many ways,
it should not be surprising that they also differ in their needs for
automobiles. While many factors/variables contribute to these
differences, we consider the following factors in presenting our data
mining framework for the aforementioned example: vehicle image (Table 1), customer anticipated feelings (Table
2), and
demographics (such as age, sex, income, occupation, education, etc.). The
demographic factor plays an important role in the proposed analysis.
For example, consider how customer needs and preferences for
an automobile change as one moves demographically from college student to
management trainee; changes in income, occupation, and educational status each
contribute to a changing set of customer needs for a variety of products such
as an automobile. Many other variables can be incorporated as well.
Three main techniques can be used to perform
market segmentation:
• Clustering: this approach groups or partitions the data
• Association: this approach seeks to establish associative relationships between different
variables in the database
• Visualization: this approach provides the user with an immersive virtual reality
environment through which the user can move, discovering
hidden relationships
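As an illustration of the clustering technique, the following is a minimal sketch of k-means, one common partitioning algorithm; the text does not prescribe a particular clustering method, so this choice is an assumption. The customer data, the two features, and the initialization from the first k points are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels).

    Initial centroids are simply the first k points, which keeps the
    sketch deterministic; real use would pick them more carefully.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers as (age, income in thousands).
customers = [(22, 18), (45, 60), (24, 21), (48, 65), (23, 20), (50, 62)]
centroids, labels = kmeans(customers, k=2)
```

Here the two features happen to be on comparable scales; with real data, features would normally be standardized first so that no single variable dominates the distance.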
Evaluation, Interpretation & Knowledge Discovery
To test how well the identified segments perform when
predicting preferences for new customers, two approaches can be considered:
train and test error estimation, and cross validation.
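Cross validation can be sketched as follows. The data, the segment labels, and the "majority preference per segment" predictor are hypothetical stand-ins chosen to keep the example small; they are not the model proposed in the text.

```python
from collections import Counter, defaultdict


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def fit_majority(rows):
    """Learn the most common preference within each segment (toy predictor)."""
    by_segment = defaultdict(Counter)
    for segment, preference in rows:
        by_segment[segment][preference] += 1
    return {seg: counts.most_common(1)[0][0] for seg, counts in by_segment.items()}


def cross_validated_accuracy(rows, k):
    """Fraction of held-out customers whose preference is predicted correctly."""
    correct = 0
    for train, test in k_fold_splits(len(rows), k):
        model = fit_majority([rows[i] for i in train])
        correct += sum(model.get(rows[i][0]) == rows[i][1] for i in test)
    return correct / len(rows)


# Hypothetical (segment, preferred vehicle type) pairs.
rows = [
    ("young", "compact"), ("young", "compact"), ("young", "compact"), ("young", "sports"),
    ("senior", "sedan"), ("senior", "sedan"), ("senior", "sedan"), ("senior", "sedan"),
]
accuracy = cross_validated_accuracy(rows, k=4)
```

Every customer is held out once, so the resulting accuracy estimates how the segments would perform on customers not used to build them.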
After the prediction accuracy has been verified by one of the
above methods, the business team should evaluate the segments
to determine how useful they are. This evaluation
should be made with respect to the
following characteristics:
- Substantiality (segment size): The market segments are large or profitable enough to serve.
- Measurability (segment profile): The market segments can be identified and measured in terms of data already available. Segment identification is very important: segments that are based on meaningful differences in customer needs but lack clear segment identification will fail, because the segment identity will not be known and an actionable marketing strategy cannot be developed.
If a segment fits the company’s objectives, the company must
decide whether it possesses the skills and resources needed to succeed in that
segment. If the company lacks the strengths needed to compete successfully in a
segment and cannot readily obtain them, it should not enter that segment. Even
if the company possesses the required strengths, it needs to employ skills and
resources superior to those of the competition in order to really win a market
segment. Once the company has decided which segments to enter, it must decide on
its market positioning strategy, that is, which positions to occupy in its chosen
segments [D. Raicu, DePaul University].
Conclusion
A theoretical, qualitative data mining framework for the automatic gathering of relevant and unbiased data was proposed. With it, a company can avoid making the initial investment in a new vehicle without being certain that the vehicle will satisfy people’s needs. Discovering a priori the segments of people interested in a certain product will also help managers focus their advertising, promotion, and sales efforts on those categories of people, thus significantly reducing time and costs.