Quant Essentials: Multivariate Analysis

In RW Connect’s new Quant Essentials series, we discuss critical methodological skills in simple, jargon-free language. The first article in the series, What Is Quantitative Research?, gives more background about the series. Our second article was about research design and our third, sampling. Next, we covered questionnaire design and data analysis.

This article is a snapshot of a huge topic: multivariate analysis. For those of you who’d like to study this subject in depth, I’ve listed some resources at the end of the article. Multivariate analysis (MVA) has a long tradition in many disciplines and began diffusing into marketing research in the 1970s. Limitations of computing technology hindered its progress at first, but it began to take root in the late 1980s and early 1990s. Since then, the sheer amount of data and advances in computing have opened doors to all sorts of applications no one dreamed of when I began my career in marketing research.

The distinction between multivariate statistical methods and machine learning is vague, and these and other terms are often used synonymously. Making Sense of Machine Learning attempts to unravel some of this ambiguity. Both kinds of methods can be classified in various ways; one is to characterize them as either Interdependence methods or Dependence methods. A second point of differentiation is whether a method is intended for Cross-Sectional or Time Series data.

Factor Analysis and Cluster Analysis are probably the best-known Interdependence methods, though there are many others. Put very simply, Factor Analysis groups variables, and Cluster Analysis groups observations (respondents in a consumer survey, for example).
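To make the Cluster Analysis half of this concrete, here is a minimal sketch of k-means, one of many clustering algorithms, written in plain Python. The respondents and their two ratings (price sensitivity and brand loyalty on a 1 to 10 scale) are hypothetical, and seeding the centroids with the first k respondents is a simplification chosen so the result is reproducible; practical implementations use smarter or repeated random starts. Factor Analysis, which groups variables rather than respondents, is not shown here.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster observations (rows) with plain k-means.

    Each pass assigns every observation to its nearest centroid and then
    moves each centroid to the mean of its members; it stops when no
    assignment changes.
    """
    # Simplifying assumption: the first k observations seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no respondent switched clusters
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical respondents: (price sensitivity, brand loyalty), 1-10 scale.
respondents = [(9, 2), (1, 9), (8, 3), (2, 8), (9, 1), (1, 8)]
labels, centroids = kmeans(respondents, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The two clusters separate price-driven respondents from brand-loyal ones. With real survey data you would normally also standardize the variables and compare solutions for several values of k.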

Dependence methods differ in that there are one or more Target (Dependent) variables we would like to explain or predict from one or more Predictor (Independent) variables. Many kinds of Dependence methods see extensive use in marketing research. They can be further subdivided according to whether the dependent variables are quantities, counts, ordered categories, or nominal categories that have no natural order or rank. Regression and Discriminant Analysis are especially well known in marketing research; the former is used when the dependent variable is quantitative (or we decide to treat it as such), and the latter comes into play when we wish to differentiate groups (e.g., User/Non-User).

Actually, it’s not quite this simple. Partial Least Squares Regression and some varieties of Structural Equation Modeling (SEM) are a blend of Interdependence and Dependence methods!
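The two well-known Dependence methods named earlier, Regression and Discriminant Analysis, can be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions: a single predictor for the regression and, for the discriminant analysis, two predictors, two groups and equal prior probabilities. The function names and all data are hypothetical; in practice one would use a statistics package, which also reports standard errors and diagnostics.

```python
def simple_regression(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def fisher_lda(group0, group1):
    """Two-group linear discriminant with two predictors and equal priors.

    The discriminant direction is w = S_w^{-1} (m1 - m0), where S_w is the
    pooled within-group scatter matrix; an observation is assigned to
    group 1 when its projection lies beyond the midpoint of the two means.
    """
    m0, m1 = mean(group0), mean(group1)
    # Pooled within-group scatter (2 x 2), accumulated over both groups.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((group0, m0), (group1, m1)):
        for p in group:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]

    def classify(p):
        score = w[0] * (p[0] - mid[0]) + w[1] * (p[1] - mid[1])
        return 1 if score > 0 else 0

    return classify


# Regression: hypothetical weekly ad spend (x) and sales (y).
intercept, slope = simple_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(intercept, slope)  # 1.0 2.0

# Discriminant Analysis: hypothetical (visits per month, satisfaction).
non_users = [(1, 3), (2, 4), (1, 5), (2, 2)]
users = [(6, 7), (7, 8), (5, 6), (6, 9)]
classify = fisher_lda(non_users, users)
print(classify((5, 5)), classify((2, 6)))  # 1 0
```

The regression returns a number on the scale of the dependent variable, while the discriminant function returns a group label (1 = User, 0 = Non-User), which mirrors the distinction drawn above between quantitative and group-membership targets.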

The techniques described thus far have been designed for cross-sectional data, that is, data collected at one point in time. Time Series Analysis is used when the data have been collected over many time periods; weekly sales data are an example. Exponential Smoothing, ARIMA, Dynamic Regression, State Space and GARCH models are just a few examples of Time Series Analysis methods. They are household words to econometricians but more opaque to most of us in Marketing Research. Time Series Analysis plays important roles in Marketing Mix Modeling and ROI analysis as well as in sales forecasting.
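Exponential Smoothing, the first method in that list, is simple enough to show directly. The sketch below is simple (single) exponential smoothing, the most basic member of the family, with no trend or seasonality. The smoothing weight alpha and the weekly sales figures are hypothetical, and initializing the level at the first observation is one common convention among several.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level is updated as level = alpha * y_t + (1 - alpha) * level, so
    recent weeks get more weight and older weeks decay geometrically.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialize the level at the first value
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # the forecast for every future period is the last level


weekly_sales = [100, 110, 105, 120]  # hypothetical units sold per week
print(simple_exponential_smoothing(weekly_sales, alpha=0.5))  # 112.5
```

A larger alpha makes the forecast react faster to recent weeks; the Holt and Holt-Winters variants add trend and seasonal components to the same idea.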

There are also methods appropriate for Within-Subjects (Repeated Measures) and Longitudinal data. Within-Subjects designs are suitable, for example, when consumers are asked to evaluate two or more products, real or hypothetical, as in a product use test or conjoint study. The venerable Repeated Measures MANOVA might be familiar to some of you. Longitudinal designs are useful when we observe consumers’ behavior over time. Survival Analysis is one method for such data; in marketing research it is used in customer churn modeling.
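For churn modeling, the Kaplan-Meier estimator is a basic Survival Analysis tool: it estimates the share of customers still active at each tenure while properly handling customers who have not churned yet (censored observations). The sketch below is a minimal plain-Python version; the function name, tenures and churn flags are hypothetical.

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival curve for customer tenure.

    durations: months each customer has been observed.
    churned:   True if the customer churned at that month, False if still
               active when observation ended (censored).
    Returns a list of (month, estimated probability of still being active).
    """
    survival = 1.0
    curve = []
    event_times = sorted({d for d, c in zip(durations, churned) if c})
    for t in event_times:
        # Customers still under observation just before month t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


tenure = [2, 3, 3, 5, 6, 8]  # hypothetical months observed
churn = [True, True, False, True, False, False]
for month, s in kaplan_meier(tenure, churn):
    print(month, round(s, 3))
```

The customer censored at month 3 is counted among those at risk at month 3 but not afterwards; simply dropping or ignoring censored customers would bias the estimate.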

Bayesian methods, which are not easy to describe in a nutshell, are seeing increasing use in marketing research. Put very simply, in Bayesian statistics we incorporate prior beliefs about the problem we’re studying directly into our analysis and then update those beliefs when new data become available. From the outset, we are explicit about uncertainty. Bayesian methods have some important advantages over the more familiar Frequentist methods; for instance, they are often more adept at handling sparse and messy data.
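The prior-to-posterior updating described above can be illustrated with the textbook Beta-Binomial model, here as a minimal sketch assuming we want to estimate a purchase rate. The Beta(2, 2) prior, the class name and the survey counts are hypothetical. Because the Beta prior is conjugate to binomial data, updating amounts to adding purchases to one parameter and non-purchases to the other, and updating in two waves gives the same answer as analyzing all the data at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about a proportion, such as a purchase rate."""
    a: float
    b: float

    def update(self, purchases, respondents):
        # Conjugate update: successes add to a, failures add to b.
        return BetaBelief(self.a + purchases, self.b + respondents - purchases)

    @property
    def mean(self):
        return self.a / (self.a + self.b)

    @property
    def sd(self):
        # Standard deviation of a Beta(a, b) distribution.
        n = self.a + self.b
        return (self.a * self.b / (n * n * (n + 1))) ** 0.5


prior = BetaBelief(2, 2)  # hypothetical prior: rate near 50%, weakly held
wave1 = prior.update(purchases=3, respondents=10)
wave2 = wave1.update(purchases=4, respondents=10)
print(wave2, round(wave2.mean, 3), round(wave2.sd, 3))
```

The posterior standard deviation is much smaller than the prior’s, which is the explicit statement of how much the new data have reduced our uncertainty.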

These are mostly “trad” methods. It would not be exaggerating to say there has been an explosion in the number and variety of analytic methods in recent years. Advances in computer technology have taken many methods off the drawing board and put them right into our laptops. The table of contents of Machine Learning: A Probabilistic Perspective (Murphy) offers a sample of the advanced analytics methods now available.

Whatever the analytic methods used, it is also now easier than ever to perform various kinds of “What if?” simulations to make educated guesses about what might happen under various marketing scenarios, such as the introduction of a new product or competitor activity. Done prudently, simulations can help our data and models speak to us and guide our decisions.
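One common form of “What if?” simulation in marketing research is a share-of-preference simulator, often built on utilities estimated in a conjoint study. The sketch below uses a multinomial logit share rule; the brand names and utility values are entirely hypothetical, and real simulators typically work with respondent-level utilities and more careful calibration.

```python
import math


def logit_shares(utilities):
    """Share of preference for each product under a multinomial logit rule."""
    # Subtract the max utility before exponentiating for numerical stability.
    top = max(utilities.values())
    expu = {name: math.exp(u - top) for name, u in utilities.items()}
    total = sum(expu.values())
    return {name: v / total for name, v in expu.items()}


# Hypothetical total utilities for the current market.
current = {"Brand A": 1.0, "Brand B": 0.5}
# Scenario: a new product enters with its own (hypothetical) utility.
scenario = {**current, "New Product": 0.8}

before = logit_shares(current)
after = logit_shares(scenario)
for name in scenario:
    print(f"{name:12s} {before.get(name, 0.0):6.1%} -> {after[name]:6.1%}")
```

A known property of the simple logit rule is that a new product draws share from existing products in proportion to their current shares (the IIA property), which is one reason such simulations should be interpreted prudently.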

The foregoing is only a sample of the methods used in marketing research. An Analytics Toolbox offers some more detail about the ones I personally use most frequently. Though many haven’t yet diffused very far into the marketing research mainstream, it should be evident that we have no shortage of tools for analytics!

However, it is important that we not lose sight of our raison d’être: who will be using our deliverables, and how and when they will be used, matters most. We should focus first on the decisions, not the technology. It is also important that we not confuse operating software with competent analytics. User-friendly software is designed so that anyone can use it, after all!

Some of the references listed in my company library may be of interest to you, especially those under Multivariate Analysis. One of the classic references on MVA is Applied Multivariate Statistical Analysis (Johnson and Wichern). SAGE’s Quantitative Applications in the Social Sciences series provides some gentle introductions to MVA, and SAGE Research Methods is a good resource for researchers working in the social and behavioral sciences, including marketing research. In addition, I have published short articles on LinkedIn that introduce Regression, SEM, Mapping, Segmentation, Key Driver Analysis and Conjoint Analysis.

We hope you’ve found this brief overview of multivariate analysis interesting and helpful!

Kevin Gray is President of Cannon Gray, a marketing science and analytics consultancy. He also co-hosts the audio podcast series MR Realities.
