Random Thoughts on Marketing Research, Statistics and Data Science (July 16, 2018)

My opinions. Nothing more. Statisticians make use of dozens of probability distributions, and the normal distribution is but one. Some methods are entirely data-driven and truly non-parametric.

Distributional assumptions are sometimes criticised as weaknesses of statistics or as burdensome. In reality, they allow us to use both small and big data more efficiently.
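A small simulation can make the efficiency point concrete. The sketch below is illustrative only: it assumes the data really are normal, and the sample size, replication count and seed are arbitrary choices of mine. When the normal assumption holds, the sample mean is the natural estimator of the center, and it varies less from sample to sample than the assumption-light sample median (asymptotic theory puts the ratio of the two variances near 2/π for normal data).

```python
import random
import statistics

def sampling_variance(estimator, n, reps, rng):
    """Variance of an estimator across repeated samples of size n from N(0, 1)."""
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.pvariance(estimates)

rng = random.Random(2018)   # arbitrary seed, for reproducibility only
n, reps = 25, 4000          # illustrative sample size and replication count

var_mean = sampling_variance(statistics.mean, n, reps, rng)
var_median = sampling_variance(statistics.median, n, reps, rng)

print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
print(f"relative efficiency of the median: {var_mean / var_median:.2f}")
```

Put differently, under this assumption the mean extracts more information from the same 25 observations than the median does. If the data were heavy-tailed instead, the comparison could reverse, which is why the assumption needs checking.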

Another misconception is that statistics only “works” with small data. Anyone making such a suggestion immediately discredits themselves, one simple reason being that familiar statistical methods such as linear and logistic regression, K-means and principal components (“factor”) analysis are workhorses in data science.

Say you’re a statistician, and you’ve been given a raw data file and asked to “analyze the data”. There are 20 variables in the data, labeled var1 to var20. You have no other information, so really all you can do is run some descriptive stats, look at patterns of covariance, clustering and so on.
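As a rough sketch of what that first pass might look like, the snippet below runs per-column summaries and ranks pairwise correlations. The data are a hypothetical stand-in for the raw file: random numbers I generate here, with var2 deliberately built to track var1 so that there is a pattern to find. None of it comes from a real data set.

```python
import math
import random
import statistics

rng = random.Random(0)  # arbitrary seed

# Hypothetical stand-in for the raw file: 200 rows, 20 anonymous columns.
n_rows = 200
names = [f"var{i}" for i in range(1, 21)]
data = {name: [rng.gauss(0.0, 1.0) for _ in range(n_rows)] for name in names}
# Assumption for illustration: make var2 depend on var1.
data["var2"] = [x + 0.5 * rng.gauss(0.0, 1.0) for x in data["var1"]]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Descriptive statistics for the first few columns.
for name in names[:3]:
    col = data[name]
    print(f"{name}: mean={statistics.fmean(col):.2f} "
          f"sd={statistics.stdev(col):.2f} "
          f"min={min(col):.2f} max={max(col):.2f}")

# Rank all 190 variable pairs by absolute correlation.
pairs = [
    (abs(pearson(data[a], data[b])), a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
]
pairs.sort(reverse=True)
for r, a, b in pairs[:3]:
    print(f"|corr({a}, {b})| = {r:.2f}")
```

The output says that var1 and var2 move together, and nothing about why, which is about as far as anonymous columns can take you.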

You request some background, and are told the client wants a predictive analytics model. var1 is the dependent (target) variable. That’s still not much information but narrows it down a little. You should be able to build some kind of predictive model. What kind will depend on your background, experience and personal judgment.

Instead, say you’re told the data come from a market research survey, and you’re asked to perform key driver analysis. Key driver analysis is causal analysis, and many models are possible, even if you use the same type of statistical technique. You need to know the meaning of var1 – var20 and much more background, including the purpose of the analysis and who will use the results. This time, you’re given a lot of background.

Oops! There’s been a mixup. Now, you’re told the data come from a labor economics study. The variables mean different things. The background is entirely different. The decision makers are different.

Being data driven can be dangerous…


There is also the distinction between primary and secondary research. The difference is huge, and to earn a living, statisticians usually must also be able to assist in designing research. We don’t just analyze data. This short article points out some of the key differences between primary and secondary research, from a marketing research perspective –

Bayesian Statistics offers a rationalist theory of personalistic beliefs in contexts of uncertainty, with the central aim of characterizing how an individual should act in order to avoid certain kinds of undesirable behavioral inconsistencies. The theory establishes that expected utility maximization provides the basis for rational decision making and that Bayes’ Theorem provides the key to the ways in which beliefs should fit together in the light of changing evidence. The goal, in effect, is to establish rules and procedures for individuals concerned with disciplined uncertainty accounting. The theory is not descriptive, in the sense of claiming to model actual behavior. Rather, it is prescriptive, in the sense of saying “if you wish to avoid the possibility of these undesirable consequences you must act in the following way.” – Bayesian Theory (Bernardo and Smith) (emphasis added)

I think there is some confusion between theories which attempt to describe how humans make decisions and theories about how we should make decisions (descriptive versus normative). Furthermore, how we actually decide and theoretical notions about ideal decision-making diverge, but perhaps not as much as is sometimes claimed. Not to mention that individuals differ in the way they make decisions in similar circumstances.

Martin Peterson’s An Introduction to Decision Theory is a good overview of normative decision theory.

Marketing researchers and businesspeople will draw causal implications without running randomized experiments. We have to. We all do this in our daily lives, too. This is human reality, and there’s no way around it.

However, thinking as a marketing researcher, I feel most of us could do a better job. We are supposed to be researchers, after all, and there is a vast body of literature on how to analyze causation using non-experimental data, which we should not ignore.

Two GreenBook articles, Causation in a Nutshell and Causation: The Why Beneath The What, may be helpful if you’d like to learn more about this subject.

The familiar OLS regression is just one kind of regression.

I’ve just tried counting the kinds of regression I know of and lost count. There are nearly two dozen regression models for count data alone that I know of, for example, and probably many more that I don’t.

I will not mention mixture modeling, splines or kernels. I won’t point out that maximum likelihood estimation comes in many flavors or say “Bayes.” I will not mention that neural nets are really other ways to do regression.

I hope I’ve gotten my point across. 🙂
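To give one concrete instance of a regression that is not OLS, here is a minimal sketch of Poisson regression, one of the simplest models for count data. Everything in it is an assumption for illustration: the data are simulated, there is a single predictor, the "true" coefficients (0.5 and 0.8) are values I picked, and the fit uses a hand-written Newton-Raphson loop rather than any particular statistics package.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson count (Knuth's multiplication method; fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_poisson(x, y, iters=25):
    """Maximum likelihood for log(mu) = b0 + b1 * x via Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), 2 x 2 and symmetric.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(42)        # arbitrary seed
true_b0, true_b1 = 0.5, 0.8    # hypothetical coefficients for the simulation
x = [rng.uniform(0.0, 2.0) for _ in range(500)]
y = [poisson_draw(math.exp(true_b0 + true_b1 * xi), rng) for xi in x]

b0_hat, b1_hat = fit_poisson(x, y)
print(f"intercept: {b0_hat:.3f}  slope: {b1_hat:.3f}")
print(f"a one-unit increase in x multiplies the expected count by {math.exp(b1_hat):.2f}")
```

Unlike OLS, the model works on the log scale, so it never predicts a negative count, and its coefficients are read as multiplicative effects. It is still only one member of the count-data family. Negative binomial, zero-inflated and hurdle models are among the others.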

Before PowerPoint and other presentation tools came along, data from hard copy computer tables were retyped and made into transparencies for overhead projectors.

Graphics software was limited and slow, and simple graphs were sometimes drawn by hand. Black and white were normally the only colors available. Qualitative reports usually consisted of some summary points and illustrative verbatims.

Needless to say, presentations and reports were not so pleasing to the eye. What the researcher said or wrote and the way s/he expressed it did matter a great deal, however.

We’ve come a long way since then. Unfortunately, many reports and presentations today look great but say little or conceal slipshod research. Progress has its downsides, too.

