Big Data or Big BS?

The hype about Big Data may be a lot bigger than the reality. Or maybe not. “Big Data” brings to mind an old Soviet dissidents’ joke…

Ivan: “I have a medical problem. Do you know a good eye and ear doctor?” 
Peter: “You mean an ear, nose and throat doctor?”
Ivan: “No, an ear and eye doctor. I keep hearing one thing and seeing another.”

A good deal of what is now called Big Data, data mining or predictive analytics has been part of a corporate statistician’s job for decades. Often it has been the major part. The data warehousing boom of the ’90s mostly expanded and extended existing practice, and Big Data was the next step in this progression.

Many statisticians will recall Clementine, the data mining and predictive analytics package SPSS acquired and promoted heavily in the latter part of the ’90s. SAS and Statistica have also been in this space for years, and much of SAS’s business has been driven by corporate users since the ’80s. Artificial Neural Networks (“Neural Nets”) were included in these packages and featured heavily in their promotion. This was before practically everything was called “AI”. Let’s also not forget that SQL has been around since the 1970s and that Mark IV was in use even earlier.

When I make these points, those selling Big Data technology will often simply ignore them and trot out Twitter, geolocation and the usual suspects. After all, part of being an effective salesperson is brushing aside evidence that gets in the way of the sales pitch. I’m sorry, Mr. Salesman, but a doctor in the 1980s was still a doctor, even though medicine has advanced since then.

Data and analysis of data have, in some form, been used to aid decision making since ancient times. No organization is truly data-driven, though, and evidence-based decision making remains the exception rather than the rule. So why, after all these centuries, are data and analytics not more deeply embedded in corporate decision making?

One obvious reason is that mathematics does not come easily to most humans, and modern statistics is still a relatively young field. There just aren’t that many R.A. Fishers around, let alone Newtons or Laplaces. Kolmogorov published his treatise Foundations of the Theory of Probability in 1933 and died only in 1987. Intellects of that level are in the history books, not in the cubicle next to ours.

Another reason is simply that until about 20 years ago, the technology for data collection, processing and analysis was slow, clumsy and required highly trained specialists to use properly. Some readers may recall punch cards and magnetic tape. Also, critical data that decision makers needed, such as sales figures, were often unobtainable or not readily available. Somewhere in the world, there surely is one company still using annual factory shipments as sales data.

Change takes time, both to adapt psychologically and to learn how to use new technologies productively. While there certainly is risk in doing things the old way year in and year out, reckless change can have costly consequences and can destroy careers. Constant buzz and chatter, much of it nonsensical and contradictory, does not help matters. It confuses many people and wastes budgets on “innovations” that turn out to be…BS.

Change can be threatening, especially to senior managers who’ve built their careers on good gut instincts. New technology can also be put to the wrong use by skillful political operators within organizations, who see it as a tool to advance their own careers rather than to benefit the organization. Some resistance to change is not unreasonable!

From my vantage point, data is not as big as often claimed, at least in terms of its role in decision making in most organizations. It is playing a more prominent part, though, and many lower-level clerical and blue-collar jobs have been automated. Automation is moving up the chain of command, and chatbots and expert consultants that heavily utilize AI, or are AI, will become a larger part of our daily lives and jobs. For better or for worse, I envision machine largely replacing man as the new faceless bureaucrat. Even house pets may be at risk of losing their jobs…“Bad dog!” will have more bite to it.

What about the in-between decision space, where decisions can’t be automated, at least not fully, but where data and data analysis can enhance the quality of decisions? This is my island in the business world, and I sense slow, bumpy progress. This will be the hardest nut to crack, since data and analytics can be misused or become another arrow in the quiver of the politically skillful, as noted.

Moreover, emotions and expediency easily overwhelm science, even within science itself. Masses of data and advanced statistical analysis can seldom reduce uncertainty to the point where there is no wiggle room for decision makers, or eliminate the need for tough decisions that may place the careers of decision makers in jeopardy.

My approach, which, admittedly, is not always feasible, is to learn as much as I can about how decisions are made within the client organization. I don’t necessarily mean at a high, abstract level, though understanding corporate culture can help a lot, but with respect to the specific decisions that motivated the RFP or general query that has come my way.

How do I do this? There is no special technique I use; I simply (though tactfully) ask and explain why I’m asking. Sometimes I conclude there is no need for me, and I am upfront about this. These are weird times, but we have not descended to the point where forthrightness is never appreciated.

Clearing up misconceptions is an important part of my work besides the modeling itself. Most decision makers have had little education in analytics beyond Stats 101, and significant misunderstandings regarding data and data analysis are commonplace. Some tasks are easier to accomplish than many realize, while others are much harder or impossible even with today’s technology. Whether it’s called statistics or machine learning, it isn’t magic, despite the hype.

Put simply, I see decision making increasingly falling into three categories, or some combination of the three:

  1. Automated decisions of varying degrees of complexity;
  2. Analytics-enhanced decisions in which statisticians and other human data analysts play a key role; and
  3. Human decisions, in which gut instinct remains significant.

So, is it Big Data or Big BS? A lot of both, I sense.
