Explanation, Prediction and Simulation

The terminology statisticians and data scientists use can confuse just about anyone. One reason is that the same jargon can mean, or imply, different things depending on who is using it and what they’re describing. Our comprehension is further clouded by articles and blogs written by people with limited expertise in the subject they are writing about or commenting on. Some gurus can do, while others can only guru.

Explanation, prediction and simulation…I will not cite official definitions of these terms – there are many – but instead will use simple examples to illustrate what I mean by them.

Demographics influence how customers behave. A simple example is income, which limits what many people can buy or how often they can buy it. Income partly explains consumer behavior and is therefore an explanatory variable. To use a more complex example, demographics, past purchase behavior and attitudes may be included in a sophisticated model that explains much of why some people do certain things and others do not. Causal analysis is a thorny subject and I elaborate on it a bit more here.

In some cases, we simply need to make predictions. For instance, people with certain characteristics may be more likely to respond to our ads than other people, but we really don’t know why this is the case. However, we observe the same patterns over and over again, and it may not be cost effective to dig more deeply to try to explain this propensity.

Another example is that customer records are often employed to predict uptake of a new product or, conversely, churn, even though the precise mechanisms causing these behaviors may never be known to us. Time may be an issue too – for example, there may be indications our website has been hacked and we must act quickly.

Simulations use a statistical or other kind of mathematical model to refine our guesses about what will happen or, retrospectively, what might have happened under various scenarios. Conjoint analysis is an example from marketing research in which “What if?” simulations are frequently employed. We’ve all heard of the Wall Street “quants,” and political scientists use simulations of various kinds. Many lay persons seem either dazzled by simulations or extremely skeptical of them. My own stance is that their value has to be judged case by case. In marketing research, for example, Decision Support Tools (DSTs) can be very useful but have often been built on flimsy data, models and theory.
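To make the “What if?” idea concrete, here is a minimal sketch of a conjoint-style simulator. It assumes part-worth utilities have already been estimated and converts them to shares with the common logit share-of-preference rule. The attributes, levels, utilities and product names are all invented for illustration and do not come from any real study.

```python
import math

# Hypothetical part-worth utilities, as might come out of a conjoint study.
# Every number here is invented for illustration.
PART_WORTHS = {
    "brand": {"A": 0.6, "B": 0.2, "C": 0.0},
    "price": {"low": 0.8, "mid": 0.3, "high": -0.5},
}


def total_utility(product):
    """Sum the part-worths of the attribute levels that make up a product."""
    return sum(PART_WORTHS[attr][level] for attr, level in product.items())


def preference_shares(products):
    """Logit share-of-preference rule: exp(utility), normalised over products."""
    exp_u = {name: math.exp(total_utility(p)) for name, p in products.items()}
    total = sum(exp_u.values())
    return {name: value / total for name, value in exp_u.items()}


# Base scenario: our product is priced high, the rival is priced mid.
base = {
    "ours": {"brand": "A", "price": "high"},
    "rival": {"brand": "B", "price": "mid"},
}

# What if we cut our price from high to mid?
what_if = dict(base, ours={"brand": "A", "price": "mid"})

base_shares = preference_shares(base)
what_if_shares = preference_shares(what_if)

for name in base:
    print(f"{name}: {base_shares[name]:.1%} -> {what_if_shares[name]:.1%}")
```

The arithmetic is trivial, which is rather the point: a simulator like this is only as trustworthy as the utilities, and the choice rule, that go into it.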

Explanation and prediction need not be in conflict, though some blogs seem to suggest an inevitable trade-off. In fact, it is possible to develop an accurate predictive model that is interpretable and provides a reasonable explanation of the Why underlying the What (which includes the Who, When, Where and How as I am using the word here).

We can also develop one model for prediction and a separate model for explanation, using a small sample of the data. To be useful, the explanatory model’s predictions should correlate reasonably well with those of the predictive model. Moreover, statistical methods can be employed to “explain” the predictions of a black box method, i.e., to roughly reverse-engineer it.
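As a rough illustration of this two-model idea, the sketch below fits a simple linear “surrogate” to the predictions of a black box on a small sample, then checks how well the two sets of predictions correlate. The `black_box` function, the variable names (`income`, `purchases`) and all coefficients are hypothetical stand-ins, not a real model or data set, and ordinary least squares is just one of many methods that could play the explanatory role.

```python
import random


def black_box(income, purchases):
    """Hypothetical stand-in for an opaque predictive model."""
    score = 0.03 * income + 0.5 * purchases
    if income > 60 and purchases > 3:
        score += 1.0  # an interaction the linear surrogate cannot represent
    return score


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = a + b1*x1 + b2*x2 (normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


# A small synthetic sample of customer records (invented for illustration).
rng = random.Random(42)
income = [rng.uniform(20, 100) for _ in range(200)]
purchases = [rng.randint(0, 8) for _ in range(200)]

# Predictions from the black box are what the surrogate tries to mimic.
bb_pred = [black_box(i, p) for i, p in zip(income, purchases)]

intercept, b_income, b_purchases = fit_two_predictors(income, purchases, bb_pred)
surrogate_pred = [
    intercept + b_income * i + b_purchases * p for i, p in zip(income, purchases)
]

agreement = pearson(bb_pred, surrogate_pred)
print(f"income coefficient:    {b_income:.3f}")
print(f"purchases coefficient: {b_purchases:.3f}")
print(f"correlation with black box predictions: {agreement:.3f}")
```

The surrogate’s coefficients are readable in a way the black box is not, and the correlation indicates how far that reading can be trusted. Here the surrogate misses the interaction by design, so the agreement is high but not perfect.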


Business objectives, time constraints and limitations of the data themselves often encourage the use of semi-automated “machine learners” that predict well enough but aren’t very informative. And, let’s face it, habit also plays a big role in what humans do!

Interpretable data will often be more actionable, though not always, as noted. Why are data sometimes difficult to interpret? Here are a few of the many reasons:

  • Too many variables for any human mind to absorb and manage.
  • Conversely, too few variables, and we are only able to see a small part of the picture. Important data may be missing. 
  • Unobservable “latent” variables may exert an influence, but we may be unaware of them, or lack observable variables with which to measure them.
  • Relationships among variables may be highly complex and obscured by non-linearities, interactions and leads/lags, leaving our human minds dazed.
  • No theory, including informal notions, that can help us make sense of the patterns we observe.
  • We may not know to which population our data can be generalized.
  • We may have been misled by measurement errors or coding errors, or have misinterpreted what certain variables mean.  

Explanation, prediction and simulation are complicated topics and it’s quite easy to be confused by the jargon and intimidated by the math. Don’t be afraid to ask questions, though, if something someone has written or said doesn’t make sense. It could be that it’s wrong. Remember, there are two kinds of gurus…

I hope you’ve found this interesting and helpful!

The background photo is of Yogi Berra, baseball legend, polyglot and the greatest philosopher my nation has ever produced.


