Friday, December 13, 2013

Intro to Data Mining for Marketers - Part 2

Data Preprocessing

This is the most time-consuming stage of the data mining process. Data is rarely clean or in a form suitable for data mining. Business databases have a few typical data corruption problems, such as duplicated records, missing data fields, and outliers. The preprocessing step involves integrating data from different sources and making choices about how to represent or code the data fields that serve as inputs to the data discovery stage.

Such representation choices are needed because certain fields may contain data at a level of detail not suitable for the pattern discovery stage. For example, it may be counterproductive to pass each customer's actual date of birth to the data discovery stage. Instead, it may be better to group customers into age groups, and the chosen age groups should have some significance for the research goal. Preprocessing is a crucial step: the representation choices made here have a great bearing on the kinds of patterns that the next stage, data discovery, will find.
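The preprocessing choices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`id`, `birth_date`, `income`), and the age band boundaries are all hypothetical, and a real project would pick bands that matter for its research goal.

```python
from datetime import date

# Hypothetical customer records; field names and values are assumptions.
records = [
    {"id": 1, "birth_date": date(1990, 5, 17), "income": 42000},
    {"id": 1, "birth_date": date(1990, 5, 17), "income": 42000},  # duplicate
    {"id": 2, "birth_date": date(1975, 1, 3), "income": None},    # missing field
    {"id": 3, "birth_date": date(1958, 11, 30), "income": 88000},
]


def age_group(birth_date, today):
    """Map a birth date to a coarse age band (band edges are assumptions)."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


def preprocess(rows, today):
    """Drop duplicate ids and incomplete rows, then recode birth date as age group."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # duplicated record
        seen_ids.add(row["id"])
        if any(value is None for value in row.values()):
            continue  # record with a missing data field
        cleaned.append({
            "id": row["id"],
            "age_group": age_group(row["birth_date"], today),
            "income": row["income"],
        })
    return cleaned


cleaned = preprocess(records, today=date(2013, 12, 13))
```

Dropping incomplete rows is the simplest policy for missing fields; depending on the goal, imputing a value may be preferable. Outlier handling is left out of this sketch.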

Patterns & Market Segmentation

Since human beings differ in so many ways, it should not be surprising that we also differ in our needs for automobiles. While many factors contribute to these differences, we consider the following ones in presenting our data mining framework for the aforementioned example: vehicle image (Table 1), customer anticipated feelings (Table 2), and demographics (such as age, sex, income, occupation, and education). The demographic factor plays an important role in the proposed analysis.

For example, consider how customer needs and preferences for an automobile change as one moves demographically from college student to management trainee: changes in income, occupation, and educational status each contribute to a changing set of customer needs for products such as automobiles. Many other variables can be incorporated as well.

There are three main techniques for performing market segmentation:

  • Clustering: this approach groups or partitions the data.
  • Association: this approach seeks to establish associative relationships between different variables in the database.
  • Visualization: this approach provides the user with an immersive virtual reality environment to move through while discovering hidden relationships.
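To make the clustering approach concrete, here is a minimal sketch of k-means, one common partitioning method. The post does not name a specific algorithm, so k-means is my choice for illustration. The customer points (age, income in thousands) and the starting centroids are made-up values.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers as (age, income in thousands).
customers = [(22, 20), (25, 24), (23, 22), (48, 80), (52, 90), (50, 85)]

# Starting centroids are picked by hand here so the run is deterministic.
centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
```

With real data the variables would normally be scaled first, since a raw income figure would otherwise dominate the distance, and the number of segments would have to be chosen with the research goal in mind.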

Evaluation, Interpretation & Knowledge Discovery

To test how well the identified segments perform when predicting preferences for new customers, two approaches can be considered: train and test error estimation, and cross validation.
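As a rough illustration of cross validation, the sketch below splits a small data set into k folds, predicts each held-out customer from the remaining ones, and reports the overall accuracy. The nearest-neighbor predictor, the ages, and the preference labels are assumptions made for the example and are not part of the framework described here.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of the training customer closest in age."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def cross_val_accuracy(xs, ys, k):
    """Fraction of customers predicted correctly when each fold is held out once."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
                correct += 1
    return correct / len(xs)


# Hypothetical data: customer age and preferred vehicle type.
ages = [22, 48, 25, 52, 23, 50]
preferences = ["compact", "sedan", "compact", "sedan", "compact", "sedan"]

accuracy = cross_val_accuracy(ages, preferences, k=3)
```

A simple train and test estimate is the special case of holding out a single portion of the data once. In practice the records would usually be shuffled before splitting so that each fold is representative.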

After the prediction accuracy is verified by one of the above methods, the business team should evaluate the usefulness of the market segments with respect to the following characteristics:

  • Substantiality (segment size): The market segments are large or profitable enough to serve.
  • Measurability (segment profile): The market segments can be identified and measured in terms of data already available. Segment identification is very important: segments that are based on meaningful differences in customer needs but lack clear identification will fail, because the segment identity will not be known and an actionable marketing strategy cannot be developed.
  • Actionability: Effective programs can be designed for attracting and serving the segments. The market attractiveness depends on market opportunity, competitive environment, and market access.

If a segment fits the company’s objectives, the company must decide whether it possesses the skills and resources needed to succeed in that segment. If the company lacks the strengths needed to compete successfully in a segment and cannot readily obtain them, it should not enter that segment. Even if the company possesses the required strengths, it needs to employ skills and resources superior to those of the competition in order to really win a market segment. Once the company has decided which segments to enter, it must decide on its market positioning strategy, that is, on which positions to occupy in its chosen segments (D. Raicu, DePaul University).


A theoretical, qualitative data mining framework for automatically gathering relevant and unbiased data was proposed. With it, a company can avoid making the initial investment in a new vehicle without being certain that it will satisfy people’s needs. Discovering a priori the segments of people interested in a certain product will also help managers focus their advertising, promotion, and sales efforts on those categories of people, significantly reducing time and costs.