Showing posts with label data mining. Show all posts

Friday, December 13, 2013

Intro to Data Mining for Marketers - Part 2



Data Preprocessing

This is the most time-consuming stage of the data mining process. Data is rarely clean or in a form suitable for data mining. Business databases typically suffer from a few data corruption problems, such as duplicate records, missing data fields, and outliers. The preprocessing step involves integrating data from different sources and choosing how to represent or code the data fields that serve as inputs to the data discovery stage. Such representation choices are needed because certain fields may contain data at a level of detail that is not suitable for pattern discovery. For example, it may be counterproductive to pass each customer’s actual date of birth to the data discovery stage. Instead, it may be better to group customers into age groups that are meaningful for the research goal. Preprocessing is a crucial step: the representation choices made here have a great bearing on the kinds of patterns that the data discovery stage will find.
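As a rough illustration of these preprocessing choices, here is a minimal Python sketch that drops duplicate records, skips records with a missing field, and replaces the date of birth with an age group. The records, the field names (`id`, `birth`, `income`) and the age cut points are all invented for this example; real cut points should follow from the research goal.

```python
from datetime import date

# Hypothetical raw customer records; names and values are invented for illustration.
RAW = [
    {"id": 1, "birth": date(1990, 5, 17), "income": 42000},
    {"id": 1, "birth": date(1990, 5, 17), "income": 42000},  # duplicate record
    {"id": 2, "birth": date(1962, 1, 3), "income": None},    # missing data field
    {"id": 3, "birth": date(1978, 11, 30), "income": 58000},
]


def age_on(birth, today):
    """Return the age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def age_group(age):
    """Map an age to a coarse group. The cut points are an assumption."""
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


def preprocess(records, today):
    """Drop duplicates and incomplete rows, and code birth date as an age group."""
    seen, clean = set(), []
    for record in records:
        if record["id"] in seen:      # duplicate: keep only the first occurrence
            continue
        seen.add(record["id"])
        if record["income"] is None:  # missing field: skip the record
            continue
        clean.append({
            "id": record["id"],
            "age_group": age_group(age_on(record["birth"], today)),
            "income": record["income"],
        })
    return clean


result = preprocess(RAW, today=date(2013, 12, 13))
```

Dropping incomplete records is only one option; filling in a typical value is another, and which one is appropriate depends on the data. Outlier handling is left out of this sketch.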

Patterns & Market Segmentation

Since human beings differ in so many ways, it should not be surprising that we also differ in our needs for automobiles. While many factors contribute to these differences, we consider the following ones in presenting our data mining framework for the automotive example introduced in Part 1: vehicle image (Table 1), customer anticipated feelings (Table 2), and demographics (such as age, sex, income, occupation, and education). The demographic factor plays an important role in the proposed analysis.

For example, consider how customer needs and preferences for an automobile change as one moves demographically from college student to management trainee; changes in income, occupation, and educational status each contribute to a changing set of customer needs for a variety of products such as an automobile. Many other variables can be incorporated as well.

There are three main techniques for performing market segmentation:

  • Clustering: this approach implies data grouping or partitioning
  • Association: this approach seeks to establish associative relationships between different variables in the database
  • Visualization: this approach consists of providing the user with an immersive virtual reality environment so that the user can move through this environment discovering hidden relationships
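To make the clustering idea concrete, here is a minimal sketch of k-means, one common clustering algorithm (the post itself does not name a specific one). The customers, given as (age, income in thousands) pairs, and the starting centroids are invented for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(points):
    """Centroid (coordinate-wise average) of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, assign(points, centroids)


# Hypothetical customers: (age, income in thousands).
customers = [(22, 18), (24, 22), (23, 20), (48, 85), (52, 95), (50, 90)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
```

With this toy data the two groups are obvious by eye; on real data the result depends on the starting centroids, the number of clusters, and how the variables are scaled, so those choices deserve care.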

Evaluation, Interpretation & Knowledge Discovery

To test how well the identified segments perform when predicting preferences for new customers, two approaches can be considered: train and test error estimation, and cross validation.
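The cross-validation idea can be sketched in a few lines of Python. The "model" below is a deliberately simple stand-in: it predicts, for each segment, the preference seen most often in the training data. The segments, preferences and data rows are invented for illustration.

```python
from collections import Counter, defaultdict


def fit(train):
    """Learn the most common preference within each segment."""
    by_segment = defaultdict(Counter)
    for segment, preference in train:
        by_segment[segment][preference] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in by_segment.items()}


def accuracy(model, test):
    """Share of held-out customers whose preference is predicted correctly."""
    hits = sum(1 for segment, preference in test if model.get(segment) == preference)
    return hits / len(test)


def cross_validate(data, k):
    """k-fold cross-validation: each row is held out for testing exactly once."""
    scores = []
    for fold in range(k):
        test = data[fold::k]
        train = [row for i, row in enumerate(data) if i % k != fold]
        scores.append(accuracy(fit(train), test))
    return sum(scores) / k


# Hypothetical (segment, preferred vehicle) observations.
DATA = [
    ("student", "compact"), ("manager", "sedan"), ("student", "compact"),
    ("manager", "sedan"), ("student", "compact"), ("manager", "sedan"),
    ("student", "compact"), ("manager", "sedan"), ("manager", "suv"),
]

score = cross_validate(DATA, k=3)
```

A plain train and test estimate corresponds to running just one of these folds. Averaging over several folds uses every observation for testing once, which tends to give a steadier estimate on small samples.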


After the prediction accuracy is verified by one of the above methods, the business team evaluates the usefulness of the market segments with respect to the following characteristics:

  • Substantiality (segment size): The market segments are large or profitable enough to serve.
  • Measurability (segment profile): The market segments can be identified and measured in terms of data already available. Segment identification is very important: segments that are based on meaningful differences in customer needs but lack clear segment identification will fail because the segment identity will not be known and an actionable marketing strategy cannot be developed.
  • Actionability: Effective programs can be designed for attracting and serving the segments. The market attractiveness depends on market opportunity, competitive environment, and market access.

If a segment fits the company’s objectives, the company must decide whether it possesses the skills and resources needed to succeed in that segment. If the company lacks the strengths needed to compete successfully in a segment and cannot readily obtain them, it should not enter that segment. Even if the company possesses the required strengths, it needs to employ skills and resources superior to those of the competition in order to win a market segment. Once the company has decided which segments to enter, it must decide on its market positioning strategy, that is, on which positions to occupy in its chosen segments (D. Raicu, DePaul University).

Conclusion

A theoretical, qualitative data mining framework for the automatic gathering of relevant and unbiased data was proposed. With such a framework, the risk of making the initial investment in a new vehicle without knowing whether it will satisfy people’s needs can be greatly reduced. Discovering a priori the segments of people interested in a certain product also helps managers focus their advertising, promotion, and sales efforts on those segments, which should significantly reduce time and costs.


Monday, December 02, 2013

Intro to Data Mining for Marketers - Part 1

Data mining can be defined as the process of "discovering patterns, meaning and insights in large datasets by using statistical and computational methods". Data mining analyzes data stored in data warehouses. That data may come from all parts of the business, from production to management. Managers also use data mining to decide on marketing strategies for their products and to compare themselves with competitors. Data mining turns data into real-time analysis that can be used to increase sales, promote new products, or drop products that do not add value to the company.

History

Data mining was born in the fields of Statistics and Computer Science (some might say Artificial Intelligence) and may also be referred to as “Statistical Learning”. From a statistical perspective, most early and recent advances from Statistics have come from the Stanford Statistics department school of thought, from people such as Bradley Efron, Jerome H. Friedman, Trevor Hastie, and Robert Tibshirani. By the way, don’t forget that Stanford University is only 7 miles away from Google.


Data Mining Framework

Using data mining techniques, we marketers need to master an approach that will provide decision makers with a priori knowledge about customers’ preferences and needs. Since there are many different kinds of customers with different kinds of needs and preferences, a simple, solid approach is meant to be a tool for performing market segmentation: divide the total market, choose the best segments, and design strategies for profitably serving the chosen segments better than the company’s competitors do. The example developed below is described for product development in the auto industry, but it can be applied to any other application where it is necessary to find the correlations between customer feelings or perceptions and the physical characteristics of a product. Yes, correlations, even through our statistics lenses.

Yes, arithmophobia is over, my friend!


Understanding

Any data mining application should start by understanding the business goals of the application, since the blind application of data mining techniques without the requisite domain knowledge often leads to the discovery of irrelevant or meaningless patterns. In order to understand the target customers of an automotive company, it would be helpful to examine the relationships between the vehicle image/attributes and the customer’s emotional benefits, which are tied to psychological needs, personality traits, and personal values. Thus, data mining can enable us to understand more completely how product-specific characteristics relate to customer needs and the benefits a customer hopes to obtain from them. For instance, for many people, cars, homes, restaurants and vacations provide emotional benefits as well as rational benefits. However, for a wealthy person who has everything, the emotional benefits provided by the status, prestige and superiority of an expensive automobile could outweigh rational benefits such as gas economy, lower maintenance and insurance costs, and resale value.

A target audience perhaps? "Free to do anything, in control, confident, sporty but with family."
Therefore, it will be beneficial to have a tool that helps us answer questions such as: What and how many of the personality attributes used to describe the customer might be shaped by the vehicle’s image? What kind of vehicle will this customer or group of customers buy?

Data selection

This step calls for targeting a database or selecting a subset of fields to be used for the data mining. The following issues should be considered in developing a plan for collecting data efficiently:

  • Evaluation of existing data sources 
  • Specification of research approaches 
  • Data gathering (contact methods, sampling plans and instruments)

Survey research is a simple, efficient method of collecting data. One of its advantages is flexibility: it can be used to obtain many different kinds of information in many different situations. Furthermore, depending on the survey design, it may also provide information more quickly and at a lower cost than manual processing. The survey may take the form of a questionnaire, which is very flexible because there are many ways to ask questions. In preparing the questionnaire, only questions that contribute to the research objectives should be asked. The questions may be closed-ended, meaning that they include all possible answers. In designing the survey, we also make sure that the questions are simple, direct, and arranged in a logical order. The first question should create interest if possible, and difficult or personal questions should be asked last so that respondents do not become defensive.


Instead of a traditional mail questionnaire, a more modern approach is computer interviewing, in which respondents sit down at a computer, read questions from a screen, and type their own answers at their leisure. The beauty of this approach lies in its multiple benefits. First, the respondents’ answers are automatically stored in a database. Second, the survey is posted on the web, where it can be accessed by an unlimited number of people. Filling out the survey takes little time even for a busy person: the survey is on the web and accessible to anybody at any time, and submitting the completed survey requires only a click by the respondent, made possible by an interactive survey implementation.

Third, the computers can be placed at different locations such as auto shows, dealerships, or retail locations. The biggest benefit is the collection of more relevant data, since people present at those locations are interested in automobiles and are therefore more likely to answer the questions accurately. The approach can be implemented so that the data is gathered from numerous computers at different locations and stored in a single global database. Fourth, the same survey format will be accessible to different categories of people: experts (such as car designers) or people less familiar with the auto domain. The large number of respondents and their diversity make the results more reliable than those from small samples.


                                                      ...To be continued...