9. How to Analyze the Data
(Textbook page 335)
Once tabulated, the data must be analyzed, the findings interpreted, and insights drawn and communicated effectively. This unit provides a basic introduction to the statistical and marketing analytics techniques used in marketing research.
Numeracy and Developing a Story (P.336)
It is possible to understand and communicate what numbers mean – numeracy – with only a rudimentary knowledge of statistics. Marketing research data, presented in computer tables, can be overwhelming, making it difficult to see underlying patterns. Here are key starting points:
- Develop focal points based on the study’s objectives (e.g. product ratings) and scan through the key questions for patterns.
- Look at differences between sub-groups that are relevant to your market (e.g. regions, language) and category (e.g. user groups).
- Identify the important analysis variables and decide which are dependent (outcomes that respond to other variables) and which are independent (variables that cause or influence those outcomes).
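The sub-group step can be made concrete with a minimal Python sketch. It averages a product rating (the dependent variable) across the levels of two sub-group variables, region and user group, which are treated as independent variables. The records and the `mean_by_group` helper are hypothetical and exist only for illustration. Real survey data would normally come from the tabulation software or a data file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records: (region, user_group, product_rating on a 1-10 scale).
responses = [
    ("East", "heavy", 8), ("East", "light", 6), ("East", "heavy", 9),
    ("West", "light", 5), ("West", "heavy", 7), ("West", "light", 4),
]

def mean_by_group(records, key_index, value_index=2):
    """Average the rating (dependent variable) for each level of one sub-group variable."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[value_index])
    return {level: mean(values) for level, values in groups.items()}

print("by region:", mean_by_group(responses, key_index=0))
print("by user group:", mean_by_group(responses, key_index=1))
```

In this made-up data the gap between heavy and light users (8 versus 5) is larger than the gap between regions. That would suggest where to focus the story. With real data, such differences would then be checked for statistical significance.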
Three factors to consider in developing a story are:
- prior knowledge (how current numbers relate to what we’ve seen before),
- background information (what else is happening in the market) and
- statistical analysis.
Measured vs. Counted Variables (P.342)
The choice of statistical measures depends on whether we use counted variables (nominal variables, with no inherent order, or ordinal variables, with an inherent order) or measured (metric) variables (ratio variables, with a true zero point and meaningful intervals, or interval variables, with meaningful intervals but no true zero point).
To understand and interpret a large data set, we often reduce it to a few summary numbers, including measures of central tendency:
- The Arithmetic Mean (average) is the measure most frequently used for metric variables. It is the most sensitive to changes in the data and is generally considered the most useful overall
- The Median is the middle observation when the data set is arranged in order of magnitude, so that there are an equal number of observations above and below it (with an even number of observations, it is the average of the two middle values)
- The Mode is the most frequently occurring or typical value in the data set
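The three measures can be illustrated with Python's built-in `statistics` module. The data are hypothetical: one metric variable (weekly spend) and one nominal variable (preferred brand). The single large value of 60.0 pulls the mean well above the median, which shows how sensitive the mean is to changes in the data. The mode is the only one of the three that makes sense for brand labels.

```python
from statistics import mean, median, mode

# Hypothetical data, for illustration only.
weekly_spend = [12.0, 15.5, 9.0, 22.0, 15.5, 60.0, 11.0]   # metric (ratio) variable
preferred_brand = ["A", "B", "A", "C", "A", "B"]           # counted (nominal) variable

print("mean spend:", round(mean(weekly_spend), 2))      # pulled upward by the single 60.0
print("median spend:", median(weekly_spend))            # middle value once sorted
print("modal spend:", mode(weekly_spend))               # most frequent value
print("modal brand:", mode(preferred_brand))            # only valid average for nominal labels
print("median of an even-sized set:", median([3, 5, 7, 9]))  # average of the two middle values
```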
How to Understand Variability (P.349)
The most common measures of dispersion or variation are:
- Range – the difference between the largest and the smallest value. It can be very sensitive to a single extreme value. It is also of little use with rating scales, because the lowest and highest scale points are usually both chosen by someone, so the range simply equals the length of the scale.
- Mean Absolute Deviation – subtract the mean from each score to obtain deviations, add the deviations ignoring their signs (i.e. their absolute values), and divide by the sample size.
- Variance – same as above, but square each deviation before adding them together (which gives more weight to larger deviations), and divide by the sample size minus 1.
- Standard Deviation – the square root of the variance, which brings it back to the same units as the mean. For normally distributed data, the mean and standard deviation together summarize all of the important information contained in the data.
- The Coefficient of Variation expresses the standard deviation as a percentage of the mean. This gives a single, unit-free summary of dispersion that can be compared across variables measured on different scales.
- Standard or “Z” Scores express each value as the number of standard deviations it lies above or below the mean, z = (x - mean) / SD. This puts data from different contexts on a common scale (e.g. different monetary units, or cost-of-living differences).
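The dispersion measures can be sketched in plain Python. The data and the `dispersion_summary` function are illustrative assumptions. The sample variance uses the n - 1 divisor described in the list, and the coefficient of variation assumes a non-zero mean.

```python
import math

def dispersion_summary(data):
    """Return common dispersion measures for one sample (needs 2+ values and a non-zero mean)."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]                 # score minus mean
    variance = sum(d * d for d in deviations) / (n - 1)   # squared deviations, divided by n - 1
    sd = math.sqrt(variance)                              # back to the original units
    return {
        "range": max(data) - min(data),
        "mad": sum(abs(d) for d in deviations) / n,       # signs ignored, divided by n
        "variance": variance,
        "sd": sd,
        "cv_percent": 100 * sd / mean,                    # SD as a percentage of the mean
        "z_scores": [d / sd for d in deviations],         # distance from the mean in SD units
    }

ratings = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
summary = dispersion_summary(ratings)
for name, value in summary.items():
    print(name, value)

# Converting units (a hypothetical factor of 1.35) leaves the z-scores unchanged.
converted = dispersion_summary([x * 1.35 for x in ratings])
print(all(math.isclose(a, b, abs_tol=1e-9)
          for a, b in zip(summary["z_scores"], converted["z_scores"])))
```

Rescaling the data changes the range, variance and standard deviation but not the z-scores. An example is converting a price from one currency to another. This is why z-scores are useful for comparing data from different contexts.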
When plotted, the distribution of many variables approximates a Normal Curve: the mean marks the center, most observations fall close to it, and only a small proportion have extreme values.
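One way to quantify "most observations fall close to the mean" is to ask how much of a normal curve lies within one, two and three standard deviations of the mean. The sketch below uses `statistics.NormalDist` (Python 3.8 or later). The shares come out at roughly 68, 95 and 99.7 percent.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD of the mean: {share_within(k):.1%}")
```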
Calculating Sampling Errors and Statistical Significance (P.357)
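- The Margin of Error is the range around the observed result within which the ‘true result’ (the one we would get if we interviewed everyone in the population) is likely to fall. Assuming a normal probability distribution, it depends on the overall sample size (larger samples give smaller margins) and on the confidence level chosen.
- Standard Error is the standard deviation of the means of many samples of a given size drawn from the same population. It tells us how widely sample means are likely to spread around the population mean. It is estimated as the sample standard deviation divided by the square root of the sample size.
- Confidence Levels (typically 90, 95 or 99 percent) state how often an interval of a given number of standard errors around the sample result would contain the true value. For a normal distribution these correspond to about ±1.645, ±1.96 and ±2.576 standard errors.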
- The Standard Error of Difference is used to assess whether the obtained differences in means between two groups are significant.
- ‘Hypothesis testing’ uses statistical significance tests to decide whether the ‘null hypothesis’ (that there is no difference between two groups) can be rejected.
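The calculations behind margins of error, standard errors and a simple significance test can be sketched with Python's `statistics.NormalDist`. This minimal sketch assumes simple random samples, large sample sizes and the normal approximation. The function names and survey figures are hypothetical. For small samples a t-test would normally replace the z-test shown here.

```python
import math
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value: about 1.645, 1.96 and 2.576 for 90, 95 and 99 percent."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for an observed proportion p from a sample of n respondents."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return critical_z(confidence) * standard_error

def standard_error_of_mean(sd, n):
    """Estimated spread of sample means around the population mean."""
    return sd / math.sqrt(n)

def compare_means(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z-test of the null hypothesis that two group means are equal."""
    # Standard error of the difference between two independent means.
    se_diff = math.sqrt(standard_error_of_mean(sd1, n1) ** 2 + standard_error_of_mean(sd2, n2) ** 2)
    z = (mean1 - mean2) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_value

# Hypothetical survey: 40% agree, first with 1,000 and then with 250 respondents.
print(margin_of_error(0.40, 1000))
print(margin_of_error(0.40, 250))

# Hypothetical mean satisfaction scores in two regions.
z, p = compare_means(7.4, 1.8, 200, 7.0, 2.0, 220)
print(f"z = {z:.2f}, p = {p:.3f}", "reject null at 5%" if p < 0.05 else "cannot reject null")
```

With 1,000 respondents the margin is about ±3 percentage points. Cutting the sample to 250 doubles it to about ±6, because the standard error shrinks only with the square root of the sample size.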
When there are more than two groups, we use overall tests of significance (P.365):
- Chi-squared Tests (for non-metric data)
- Analysis of Variance (for metric data)
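Each of these overall tests reduces the data to a single statistic that is compared with a reference distribution. The plain-Python sketch below computes two of them: the Pearson chi-squared statistic for a table of counts, and the F statistic for a one-way analysis of variance. The tables, ratings and function names are hypothetical. Turning the statistics into p-values needs the chi-squared and F distributions. A library such as SciPy provides these (`scipy.stats.chi2_contingency`, `scipy.stats.f_oneway`), or they can be looked up in printed tables.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total   # count expected if no relationship
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-way analysis of variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within

# Hypothetical counts: preferred brand (columns) by region (rows).
brand_by_region = [[30, 20], [25, 25], [15, 35]]
print(chi_squared(brand_by_region))

# Hypothetical ratings for three advertising concepts.
concept_ratings = [[7, 8, 6, 7], [5, 6, 5, 4], [8, 9, 9, 8]]
print(one_way_anova(concept_ratings))
```

For the brand-by-region table, the statistic is 9.375 with 2 degrees of freedom. That is above the 5 percent critical value of about 5.99, so brand preference appears to differ by region. For the three concepts, F is 22.2 with 2 and 9 degrees of freedom, well above the 5 percent critical value of about 4.26.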
Measures of Association (P.373)
- Correlation coefficient, a number between -1.0 and +1.0 that summarizes the linear relationship between two variables. A correlation close to +1.0 or -1.0 indicates a strong relationship between the two variables, but it does not by itself imply causation.
- Regression analysis estimates the weights attached to the independent variables that best predict the value of the dependent variable. We use it to understand which variables drive customer satisfaction, or which key drivers influence how likely a consumer is to buy our brand, or to model the impact of adjusting a variable such as price.
- This textbook unit also discusses collinearity, influential observations, warnings about extrapolating outside the measured range, non-linear relationships, the risk of a ‘too small’ sample and the fact that the researcher specifies the model.
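Both measures can be sketched for a single independent variable in plain Python, so each step is visible. The price and sales figures and the function names are hypothetical. The slope is the "weight" that regression attaches to price. The last prediction deliberately goes outside the observed price range to show why extrapolation is risky.

```python
import math

def centred_sums(x, y):
    """Means, plus sums of cross-products and squares of deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy

def pearson_r(x, y):
    """Correlation coefficient, between -1.0 and +1.0."""
    _, _, sxy, sxx, syy = centred_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mean_x, mean_y, sxy, sxx, _ = centred_sums(x, y)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

price = [2.0, 2.5, 3.0, 3.5, 4.0]   # hypothetical prices
units = [120, 110, 95, 85, 70]      # hypothetical weekly unit sales
intercept, slope = fit_line(price, units)
print("r =", round(pearson_r(price, units), 3))
print(f"units = {intercept:.1f} + ({slope:.1f}) * price")
print("predicted at 3.25:", intercept + slope * 3.25)   # inside the measured range
print("predicted at 8.00:", intercept + slope * 8.00)   # far outside: negative sales, nonsense
```

The fitted line predicts about 90 units at a price of 3.25, which lies inside the data. At 8.00 it predicts negative sales. The correlation of about -0.998 describes how closely the points follow a straight line. On its own, it says nothing about whether price causes the change in sales.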
Modelling and Analytic Techniques (P.379)
This section provides non-technical descriptions of various approaches to predictive analytics. Prediction is the central purpose of marketing analytics, but models can and should also be used to explain the reasons behind specific marketing mix decisions.
- Correlation Analysis measures the degree to which items are related. It is useful to understand what perceptions or attitudes are likely driving behaviour, but it does not show the direction of impact, only the extent to which they move in the same or different directions.
- Factor Analysis and Principal Component Analysis are data reduction techniques that use response patterns to group together items to identify common underlying themes.
- Cluster Analysis (Segmentation) divides the market population into groups (segments) that are relatively homogeneous within themselves (sharing behaviour, attitudes, demographics) but distinctly different from each other. This helps the marketer understand if marketing to one or more segments will produce better returns than mass marketing.
- A Decision Tree reveals in what order consumers think through product characteristics when choosing between offerings. This helps retailers understand how to organize products on the shelf and tells manufacturers what to emphasize in broader advertising versus in-store.
- Discrete Choice Experiments (conjoint analysis) show respondents several variations of a product design in which the levels of each attribute are switched around. Respondents choose the one option they like best, or decide not to buy any. The output is the relative importance of each attribute in driving the purchase decision and the relative preference for each level of an attribute. This is important in designing or redesigning products, determining pricing, measuring brand equity and so on. It allows marketers to test the market before committing to the final product design.
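Cluster analysis covers many algorithms, and the text does not name one. As an assumption, the sketch below uses k-means, a widely used method that repeats two steps. It assigns each respondent to the nearest segment centre, then moves each centre to the average of its members. The two attitude scores per respondent, the starting centres and the function names are hypothetical. In practice, variables are usually standardized first and a library implementation is used (for example scikit-learn's `KMeans`).

```python
import math

def nearest_centre(point, centres):
    """Index of the centre closest to the point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))

def k_means(points, initial_centres, max_iterations=100):
    """Minimal k-means clustering; returns the final centres and a segment label per point."""
    centres = [tuple(c) for c in initial_centres]
    for _ in range(max_iterations):
        segments = [[] for _ in centres]
        for point in points:
            segments[nearest_centre(point, centres)].append(point)
        # Move each centre to the mean of its members; an empty segment keeps its old centre.
        new_centres = [
            tuple(sum(values) / len(members) for values in zip(*members)) if members else old
            for members, old in zip(segments, centres)
        ]
        if new_centres == centres:   # no centre moved, so the segments are stable
            break
        centres = new_centres
    return centres, [nearest_centre(p, centres) for p in points]

# Hypothetical respondents: (price sensitivity, brand loyalty), each on a 1-10 scale.
respondents = [(9, 2), (8, 3), (9, 3), (2, 8), (3, 9), (2, 9)]
centres, labels = k_means(respondents, initial_centres=[respondents[0], respondents[3]])
print("segment centres:", centres)
print("segment of each respondent:", labels)
```

Two segments emerge: price-sensitive switchers, and loyal, less price-sensitive buyers. Each is internally similar and clearly different from the other, which is the property described for segmentation. k-means can settle on different segments depending on the starting centres, so real analyses usually try several starts.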