The Book of Why

UCLA computer scientist Judea Pearl has made noteworthy contributions to artificial intelligence, Bayesian networks, and causal analysis. These achievements notwithstanding, Pearl holds some views on data many statisticians may find odd or exaggerated.

Here are a few examples from his latest book, The Book of Why: The New Science of Cause and Effect, co-authored with mathematician Dana Mackenzie. 

“Causality has undergone a major transformation…from a concept shrouded in mystery into a mathematical object with well-defined semantics and well-founded logic. Paradoxes and controversies have been resolved, slippery concepts have been explicated, and practical problems relying on causal information that long were regarded as either metaphysical or unmanageable can now be solved using elementary mathematics. Put simply, causality has been mathematized.” 
“Despite heroic efforts by the geneticist Sewall Wright (1889–1988), causal vocabulary was virtually prohibited for more than half a century…Because of this prohibition, mathematical tools to manage causal questions were deemed unnecessary, and statistics focused exclusively on how to summarize data, not on how to interpret it.” 
“…some statisticians to this day find it extremely hard to understand why some knowledge lies outside the province of statistics and why data alone cannot make up for lack of scientific knowledge.” 
“Even if we choose them at random, there is always some chance that the proportions measured in the sample are not representative of the proportions in the population at large. Fortunately, the discipline of statistics, empowered by advanced techniques of machine learning, gives us many, many ways to manage this uncertainty—maximum likelihood estimators, propensity scores, confidence intervals, significance tests, and so forth.” 
“…we collect data only after we posit the causal model, after we state the scientific query we wish to answer, and after we derive the estimand. This contrasts with the traditional statistical approach…which does not even have a causal model.” 
“If [Karl] Pearson were alive today, living in the era of Big Data, he would say exactly this: the answers are all in the data.” 
“Statisticians have been immensely confused about what variables should and should not be controlled for, so the default practice has been to control for everything one can measure. The vast majority of studies conducted in this day and age subscribe to this practice. It is a convenient, simple procedure to follow, but it is both wasteful and ridden with errors. A key achievement of the Causal Revolution has been to bring an end to this confusion. At the same time, statisticians greatly underrate controlling in the sense that they are loath to talk about causality at all, even if the controlling has been done correctly. This too stands contrary to the message of this chapter: if you have identified a sufficient set of deconfounders in your diagram, gathered data on them, and properly adjusted for them, then you have every right to say that you have computed the causal effect X->Y (provided, of course, that you can defend your causal diagram on scientific grounds).” 
“…until recently the generations of statisticians who followed Fisher could not prove that what they got from the RCT [Randomized Control Trials] was indeed what they sought to obtain. They did not have a language to write down what they were looking for—namely, the causal effect of X on Y.” 
“The very people who should care the most about ‘Why?’ questions—namely, scientists—were laboring under a statistical culture that denied them the right to ask those questions.” 
“…statistical estimation is not trivial when the number of variables is large, and only big-data and modern machine-learning techniques can help us to overcome the curse of dimensionality.” 
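
The adjustment Pearl describes in the quotation above about deconfounders can be illustrated with a small simulation. This is only a minimal sketch under assumptions of my own, not an example from the book: a single confounder z drives both x and y, all relationships are linear, and the true effect of x on y is set to 2.0. The function names and coefficients are hypothetical.

```python
import random


def simulate(n=20000, seed=0):
    """Draw (z, x, y) rows from an assumed linear model with one confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                          # confounder
        x = 1.5 * z + rng.gauss(0.0, 1.0)                # x depends on z
        y = 2.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)      # true effect of x is 2.0
        rows.append((z, x, y))
    return rows


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def naive_slope(rows):
    """Slope from regressing y on x alone, ignoring the confounder."""
    _, x, y = zip(*rows)
    return cov(x, y) / cov(x, x)


def adjusted_slope(rows):
    """Coefficient on x from least squares of y on both x and z."""
    z, x, y = zip(*rows)
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


rows = simulate()
print("naive slope:   ", round(naive_slope(rows), 2))
print("adjusted slope:", round(adjusted_slope(rows), 2))
```

In this setup the unadjusted slope overstates the effect, while the slope adjusted for z lands close to the value built into the simulation. The adjusted figure can only be read as causal because the diagram was fixed in advance, which is the proviso Pearl himself attaches.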

Pearl also has some harsh words for data science:

“We live in an era that presumes Big Data to be the solution to all our problems. Courses in ‘data science’ are proliferating in our universities, and jobs for ‘data scientists’ are lucrative in the companies that participate in the ‘data economy.’ But I hope with this book to convince you that data are profoundly dumb. Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why.” 
“…many researchers in artificial intelligence would like to skip the hard step of constructing or acquiring a causal model and rely solely on data for all cognitive tasks. The hope—and at present, it is usually a silent one—is that the data themselves will guide us to the right answers whenever causal questions come up.” 
“Another advantage causal models have that data mining and deep learning lack is adaptability.” 
“Like the prisoners in Plato’s famous cave, deep-learning systems explore the shadows on the cave wall and learn to accurately predict their movements. They lack the understanding that the observed shadows are mere projections of three-dimensional objects moving in a three-dimensional space. Strong AI requires this understanding.” 
“…the subjective component in causal information does not necessarily diminish over time, even as the amount of data increases. Two people who believe in two different causal diagrams can analyze the same data and may never come to the same conclusion, regardless of how ‘big’ the data are.” 

Though he can come across as an Ivory Tower academic whose arguments at times are muddled and contradictory, I suspect few seriously interested in the subject of causation consider Judea Pearl or his work bland or irrelevant. He is always thought-provoking and has much to say that should be heeded, and has provided statisticians and researchers with another set of useful tools for causal analysis. He should also be read because of his influence. In particular, I would recommend his Causality: Models, Reasoning and Inference to researchers and statisticians, though The Book of Why is a gentler introduction to his thinking.


He has not revolutionized the analysis of causation, though, and as noted, many statisticians will probably find at least some of his opinions out of sync with their own perceptions of statistics and their professional brethren, as well as the history of their discipline. He makes many generalizations regarding what “all or most statisticians” would do in a given situation, and then shows us that this would be wrong. He offers no evidence in support of these generalizations, and many of the practices he describes strike me as ones a competent statistician would avoid. Likewise, some of what he claims is new or even radical thinking may cause some statisticians to scratch their heads since it’s what they’ve done for years, though perhaps under a different name or no name at all.

His charge that statisticians are focused on summarizing data, ignore the data generating process, and are uninterested in theory and causal analysis is particularly amusing in light of the at times acrimonious discussions between statisticians and data scientists from other backgrounds. He also disregards the myriad complex experimental designs and analyses of data obtained through these designs via ANOVA and MANOVA – which also can become quite complex – that explicitly consider a causal framework. These designs as well as ANOVA and MANOVA have been in use for many decades. Related to this, historically, statisticians have specialized in particular disciplines, such as agriculture, economics, pharmacology and psychology, because subject matter expertise is necessary for them to be effective – statistics is not just mathematics.

More fundamentally, not all statisticians are equally competent, nor would they all define competence in the same way. There are also academic statisticians and applied statisticians, and often wide gulfs between the two. All of this is certainly true of just about any profession. Not all lawyers are ambulance chasers, and practicing attorneys are not clones of their former law school professors.

There is also the sensitive matter that the advice of statisticians has often been ignored by researchers in many fields, and that this frequently is the reason for dubious practices in these disciplines, not deficiencies of statistics itself. For example, statisticians have frequently advised researchers against drawing causal implications based on correlations alone in the absence of sound theory and a causal model based on this theory, often to no avail. Note that this is very different from claiming correlation is irrelevant or that it means no causation is present. It differs starkly from believing causation is immaterial and should not be explored. Statisticians also contribute to the design of research, and lengthy discussions about causation are not unusual. Most researchers are interested in the why, and statisticians who aren’t are an endangered species.

Furthermore, practitioners, who enormously outnumber Ivory Tower statisticians, are sometimes given data with little background information and asked to find something “interesting.” In effect, they are ordered to data dredge. In these circumstances, any number of ad hoc causal models created manually or with automated software may fit the data about equally well but suggest very different courses of action for decision-makers. This can turn into a tightrope walk. More commonly, they may be given a set of cross tabulations and asked for their interpretation. It could be that something in the data does not make sense to the researcher or that s/he wants a second opinion.
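
The point about competing models fitting equally well can be shown with a toy case. The sketch below is my own illustration with hypothetical names and coefficients, and it uses R-squared as the measure of fit purely for simplicity: with two variables, a straight-line model of y on x and one of x on y explain the same share of variance, yet they suggest different things about which variable to act on.

```python
import random


def simulate(n=5000, seed=1):
    """Generate paired data; the generating direction here is x -> y (an assumption)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [0.8 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(pred, target):
    """Least-squares line of target on pred; returns (slope, r_squared)."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    sxy = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sxx = sum((p - mp) ** 2 for p in pred)
    slope = sxy / sxx
    intercept = mt - slope * mp
    sse = sum((t - (intercept + slope * p)) ** 2 for p, t in zip(pred, target))
    sst = sum((t - mt) ** 2 for t in target)
    return slope, 1.0 - sse / sst


xs, ys = simulate()
slope_yx, r2_yx = fit_line(xs, ys)   # model "x drives y"
slope_xy, r2_xy = fit_line(ys, xs)   # model "y drives x"
print("y on x: slope", round(slope_yx, 3), "R^2", round(r2_yx, 4))
print("x on y: slope", round(slope_xy, 3), "R^2", round(r2_xy, 4))
```

The two fits are indistinguishable on this measure, so the data alone cannot say which variable a decision-maker should try to change. That choice has to come from background knowledge, which is exactly what the practitioner in this situation has not been given.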

This is a snapshot of the real world of a statistician as I see it, and it is a very different world from the one Pearl sees.

My comments should not be interpreted as suggesting that none of Pearl’s criticisms of statistics and statisticians have merit, though I do find many bizarre. (To be clear, he does hold some statisticians in high regard.) Like Pearl himself, statisticians shouldn’t be deified, and surely a few decades from now some of what is widely accepted as good statistical practice will be widely regarded as boneheaded. Like any field, statistics evolves, often slowly and unsteadily.

The Great Depression, WWII, the Korean War and the Cold War surely had some impact on its historical development. Let’s also not forget that it was not that long ago when calculations were done manually, statisticians had limited empirical data to work with and were unable to conduct Monte Carlo studies, now essential for many statisticians. Still, I agree with Pearl that statistics and causal analysis would have progressed more rapidly if Sewall Wright’s contributions, especially path analysis, had received the attention they merited.

To sum up, there is much real wisdom in Pearl’s writings and, for what my opinion is worth, I would urge statisticians and researchers to read him. In The Book of Why, especially, he provides many vivid examples of how to do research wrong and how to do it better. Apart from his strange opinions about statistics and statisticians, my main criticism of Pearl is that he overstates his case. For example, in my experience path diagrams and DAGs (directed acyclic graphs) are aids to causal analysis, but not essential. Some people find them confusing. He also appears to have little interest in sampling and data quality, or in the possibility that dissimilar causal models may operate in different latent classes.

There are many other books on or related to the analysis of causation. Experimental and Quasi-Experimental Designs (Shadish et al.), Explanation in Causal Inference (VanderWeele), Counterfactuals and Causal Inference (Morgan and Winship), Causal Inference (Imbens and Rubin) and Methods of Meta-Analysis (Schmidt and Hunter) are some others I can recommend. All are quite technical. For the philosophically inclined, The Oxford Handbook of Causation (Beebee et al.) may be the first port of call. Statisticians and others with a good background in statistics may also be interested in a debate that appeared in The American Statistician in 2014.

In closing, a general observation I would offer about the analysis of causation is really a reminder that theories are often constructed (or taken apart) in small bits and pieces over many years through the hard work of many independent researchers. Frequently, these small bits and pieces can only be tested through experimentation since the requisite observational data does not exist.

Most of these experiments are only reported in academic journals and go unnoticed except to specialists working in that area. Many are quite simple and do not require sophisticated mathematics and fancy software. They do not test grand theories that have direct and sweeping implications for public health and welfare. They are the regular guys of research and the unsung heroes of causal analysis.  
