Is Data Science a House of Cards?

October 5, 2017

Many criticisms have been levelled at data science. Here are a few of them and my responses.

Data science is nothing new

In view of all the hype surrounding data science, I feel there is some truth to this. Data science in its current state of development is a mix of the new and the old.

In Data Science and Marketing Research, What the Heck is “Data Science” Anyway? and Data Science and Data Science Fiction, I respond to this claim more fully.

Since predictive analytics is an important part of data science, the claim that it does not pay off is an important criticism. Since predictive analytics is part of what I do for a living, I am a biased source, but based on what I hear from clients and other data scientists, and on the academic literature, it does pay off.

However, there often have been inflated expectations and many instances of poor implementation, for example, investing a ton of money in data infrastructure that is mainly useful for simple SQL queries, not more sophisticated analytics.

Also, many data scientists seem inexperienced and poorly trained in data analysis, and fundamental errors are not rare. Hiring decisions for data science are frequently made or influenced by people with no hands-on experience in data science, and there can be an echo chamber effect between them and recruiters, who have seldom been data scientists either.

Predictive analytics is often used for targeting, which Byron Sharp and his colleagues at the Ehrenberg-Bass Institute have harshly criticized. Their research suggests targeting is often a bad strategy and, if they are correct, significant investment has been squandered on data science.

Privacy legislation and growing concerns about ad fraud will kill off a substantial chunk of data science

This should not simply be dismissed as a possibility. However, though I am not an expert on either of these topics, my sense is that there is too much money at stake and too little concern on the part of most consumers regarding their personal privacy for either of these to pose a serious threat to data science. In years gone by, a Ralph Nader for privacy might have emerged, but my sense is those days are gone.

Most managerial decisions are heavily influenced by organizational politics and usually made by the gut. Data science will never play a significant role in decision making.

In my experience, and that shared by some colleagues and academic contacts, this criticism has teeth. It is a source of frustration for many data scientists and has been a pet peeve of statisticians for many years. Most decision makers, even at C-level, have had little training in statistics beyond Stats 101. How many CFOs have a strong background in econometrics and have actually worked as econometricians? Not many, I would guess.

I should note, however, that data and analytics are increasingly used for tactical decisions, and more and more of these decisions are being automated. What implications this will have for overall demand for data scientists in the future is unknown, but it surely will affect demand for certain kinds of data scientists.

Management fads of one sort or another are still the rage and, in my view, are exerting a harmful influence on organizational structures and culture. They often waste time and investment, and encourage bad decisions. Senior management in many organizations is under pressure to impress shareholders and dazzle the press, and tends to stampede in one direction or another, diluting the effectiveness of competition in weeding out bad ideas and rewarding good practice. Big data and AI may be the latest fad. It’s going to take time and educational reform for data and analytics to be integrated into decision making in the way, and to the extent, many feel they should be.

What about the new SMEs? Aren’t they different? Nowadays, start-ups are frequently headed by software engineers. Regardless of their educational background, successful entrepreneurs, in my experience, have good gut instincts, are naturals at selling, have boundless energy and excel at networking. I’m not so sure any of this was much different in ancient Rome – you either have it (and a bit of luck) or you don’t, and formal education has little influence on this.

All this said, however, more data and more complex models – including AI – are not guaranteed to produce better decisions and, in fact, can be inferior to the simple decision heuristics that lie hidden in what we often call gut feel. Gerd Gigerenzer and his colleagues have done extensive research on how humans make decisions and are a good source for guidelines on what works and what doesn’t work in various kinds of situations. Behavioral economists have a somewhat different take on this subject but also challenge the stated or implicit notion that the best decision can essentially be calculated provided we have enough data and the right algorithm. In Who Cares About Evidence and a few other places I offer my thoughts about the role of data science in decision making.

Data science success stories nearly always focus on tech giants such as Google and Amazon, but only a tiny fraction of organizations will ever have access to or need these volumes of data.

“Traditional” giants such as GM, Prudential and P&G have long had access to masses of data and plenty of numerate and analytically inclined people, such as economists, actuaries and engineers, throughout their organizations. IBM and HP are not new companies, either. Most multinationals have been using data science of one form or another for decades. However, this criticism has merit in my opinion – neither the new nor the traditional giants are representative of most organizations.

More Volume, Variety and Velocity – the 3Vs of big data – do not guarantee more value

Beyond doubt this is true, and something statisticians have long warned about. More may be worse, in fact. What is Big Data? and The Value of Data Dredging dig a little deeper into this subject. Two clear beneficiaries of big data (broadly defined) are companies and consultants involved in the building and maintenance of data infrastructures, and I would urge potential buyers to look carefully before they leap into a large investment. I am not saying big data is all BS but, rather, that organizations new to data science should proceed cautiously and systematically in building a data management and analytics infrastructure.

AI and automation will eliminate most data science jobs before long

Human decision making is rarely mechanical or as simple as 2 + 2 = 4, and uncertainty is nearly always present in human affairs. Dealing with uncertainty in a systematic fashion lies at the center of what statisticians do. Going back to the days of John Graunt and Edmond Halley in the 17th century and perhaps even earlier, this has been part of our core mission. But statistics is seldom cut and dried either, and a statistician’s experience and instincts play a prominent role in the models we build and the recommendations we make. A good statistician needs a good gut.

While some data science positions will be eliminated by automation, demand for competent, experienced statisticians most likely will grow. In Some Things AI Can Do and Some Things It Can’t, Why I Am Not Afraid of The Machines and a few other brief articles I set out my thoughts about automation and AI in greater detail. As an aside, it is highly educational, and often surprising, to hear what true authorities on AI such as Roger Schank and Geoff Hinton have to say about AI. We’ve been down this path before.

Most data scientists aren’t good at data analysis

Collecting and storing data is quite different from data analysis, both in terms of the skills needed and philosophical outlook. “I love data!” suggests a very different mindset from “Let’s see what these data may tell us.” Statistical thinking, which the second statement reflects, is still largely missing in data science. In my view it is the glue that binds data, analytics and decisions more closely together. It is a way of thinking and an approach to problem solving that is useful even if one knows little about the technical details of statistics.

Much more than better math and programming skills will be necessary to move data science to a higher level in my opinion, and fundamental change in our educational systems will be required. The American Statistical Association, for one, agrees and is making learning materials available to K-12 teachers. Statistical thinking does not come naturally to humans and I suspect it will be another generation before it is readily incorporated into decision making. The earlier in life we get started, the better.

Lies, damned lies, and statistics

Numbers can enlighten or deceive, and though this comes as a surprise to many, scientists often have little grasp of statistics. This also holds for most lawmakers, jurists and bureaucrats, as well as those in the entertainment business. A good understanding of statistics makes us less susceptible to trickery and better able to spot scientific incompetence and pseudoscience.

Data science is a house of cards

Most hype about it has little foundation in reality so, in that sense, it is a house of cards. The secret to making data science work for you is to understand what it really is.
