# Statistical thinking in a nutshell

One of the toughest things to explain about statistics is that it’s not just math and programming or plugging numbers into formulas. It’s also a way of thinking. Perhaps this is why Frank Harrell, an eminent biostatistician and advisor to the US FDA, has named his blog Statistical Thinking.

Misconceptions about statistics are entirely understandable. It’s a complicated subject most of us are only briefly exposed to in school and, perhaps, in an introductory undergraduate class. For a broader view of the field, the American Statistical Association’s website is worth a look.

Fundamentally, statistics is a systematic means of helping us better understand parts of our world we consider important. It also helps us understand why we consider some things important and others not. It is not a substitute for philosophy but fits in neatly with it.

It is not a substitute for engineering but can be an aid to it. It does not replace gut instinct but can inform it. As an applied science, statistics plays at least some role in just about any discipline, from archaeology to zoology.

How can it do this? First, it provides guidelines for gathering and organizing basic facts. “Stat” in statistics is derived from “state” and much of the early work of statisticians pertained to government statistics…as in “lies, damned lies, and statistics.”

Baseball employed statisticians long before anyone had ever heard of Billy Beane or Moneyball. Statistics is used extensively in medical research – pharmacology being one example – to help researchers understand the causes and possible cures of diseases. Predictive analytics is currently hot but is only one corner of the discipline.

Probability lies at the core of statistical thinking, conditional probability in particular. This is important because humans have a strong tendency to think categorically.

We feel compelled to put everything into buckets, usually one bucket or another – perhaps this is a manifestation of a fight-or-flight instinct which is part of our evolutionary heritage. Categorizing is often useful, but when making business decisions, black-and-white thinking can also lead us astray.
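As a minimal illustration of why conditional probability resists bucket thinking, the Python sketch below applies Bayes' rule to a hypothetical screening test. The prevalence, sensitivity, and false-positive rate are invented for illustration and not taken from any study, and the function name is likewise an assumption.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule."""
    true_pos = prevalence * sensitivity                  # P(condition and positive)
    false_pos = (1 - prevalence) * false_positive_rate   # P(no condition and positive)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, chosen only to illustrate the calculation.
p = posterior_given_positive(
    prevalence=0.01, sensitivity=0.90, false_positive_rate=0.05
)
print(f"P(condition | positive) = {p:.3f}")  # about 0.154
```

Under these assumed numbers, a positive result still leaves the condition unlikely (roughly 15%), so treating “tested positive” and “has the condition” as the same bucket would be wrong most of the time.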

Statistical thinking also requires conceptualizing systemically, that is, recognizing that effects usually have multiple causes which interrelate with one another. Effects can also become causes.

For example, we might notice an ad and try the product but are also more likely to notice an ad for a product we have tried. Looking at total figures or two-way cross-tabulations is a preliminary step in analyzing data, and stopping at that point can mislead us.
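To show how totals can mislead, here is a small Python sketch using invented counts for the ad example. The segment names and every number are hypothetical; they are constructed so that prior users are more likely to have seen the ad, which is the situation described above.

```python
# Hypothetical counts, invented purely for illustration: (buyers, people)
counts = {
    "prior users": {"saw ad": (400, 800), "no ad": (110, 200)},
    "new to product": {"saw ad": (20, 200), "no ad": (120, 800)},
}


def pooled_rate(exposure):
    """Purchase rate for one exposure group, ignoring the segments."""
    buyers = sum(cells[exposure][0] for cells in counts.values())
    people = sum(cells[exposure][1] for cells in counts.values())
    return buyers / people


# Total figures: compare ad viewers with non-viewers overall.
overall = {e: pooled_rate(e) for e in ("saw ad", "no ad")}

# The same comparison within each segment.
by_segment = {
    segment: {e: buyers / people for e, (buyers, people) in cells.items()}
    for segment, cells in counts.items()
}

print("overall:", overall)
for segment, rates in by_segment.items():
    print(segment, rates)
```

In this constructed data, ad viewers buy at a higher rate overall, yet at a lower rate within each segment. The overall gap arises only because prior users both buy more and see the ad more, which is why stopping at the total figures would point to the wrong conclusion.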

I should add that acknowledging and dealing with cognitive biases has long been a part of the scientific tradition and is also an important component of statistical thinking.

There are two sides to any coin, and “Statistical Mistakes Even Scientists Make” and “How To Lie With Numbers” reveal some of the ways statistics can be used incompetently or unethically.

Another way to define statistical thinking is as a series of questions we can ask ourselves, including “Am I asking the right questions?” and “Are there other questions I should be asking?” Some others are:

• Is my hypothesis – even an informal notion – internally consistent?
• Do I have real empirical evidence to support it?
• Have I looked at all the relevant empirical evidence?
• Are there rival explanations I haven’t considered?
• Am I confusing the possible with the plausible or the plausible with fact?
• Are patterns I’ve observed in the data likely to be real, or merely due to chance? What might have caused these patterns, if real?
• Are there unobserved variables or other confounders I haven’t accounted for that may have caused these patterns?
• Am I confusing cause with effect, or correlation with causation?
• Am I drawing conclusions about fruit based only on apples?
• Are my data of sufficient quality to justify the inferences I’ve made from them? Have I properly accounted for sample design, nonresponse, missing data, measurement error, statistical assumptions and other potential effects?
• Given A and B, if I do C and D, what are the likely outcomes? What are the likely outcomes of those outcomes?
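One of the questions above, whether a pattern is real or merely due to chance, can be probed in a simple way with a permutation test. The Python sketch below is one possible approach and not a prescribed method; the function name, the two small groups of numbers, and the choice of 10,000 shuffles are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often a mean difference at least as large as the
    observed one appears when group labels are assigned at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any real link between label and value
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical measurements for two groups, invented for illustration.
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [11, 13, 12, 10, 14, 12, 9, 13]
print(permutation_p_value(group_a, group_b))
```

A large value suggests that the observed difference is the sort of thing random labelling produces routinely. A small value says only that chance alone is an unlikely explanation; it does not settle the other questions in the list, such as confounding or data quality.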

Statistics is a very big subject, much too big for a short piece such as this. It challenges us to question our own thinking in many ways. It encourages us to question our assumptions, the data we have used, our analyses of the data, our interpretations of our analyses, and the implications we have drawn from these interpretations.

In short, statistical thinking encourages us to try to prove ourselves wrong and discourages black-and-white thinking.

I hope you’ve found this interesting and helpful!
