Why does science get things wrong?

September 28, 2020

Many widely held scientific theories were later shown to be badly wrong, as this brief article shows. How can this happen? The answer is in some ways simple and in other ways not so simple. First, science is still evolving and our understanding of many basic phenomena remains far from complete.

Another reason is that science – at least on our planet – is conducted by humans and we humans have many foibles. Biases of various sorts, funding conflicts, egos, and sheer incompetence are some of the very human things that can undermine research.

Scientists sometimes get it right, but those reporting it get it wrong. Few journalists have worked as scientists, and most have had no more training in science than most of their readership. To be fair, though, many scientists have had limited coursework in research methods and statistics, as I point out in Statistical Mistakes Even Scientists Make.

Peer review can sometimes resemble chum review and, moreover, some studies make the front page without having been peer reviewed at all. Even when they have been, few editors and reviewers of scientific publications are statisticians – there aren’t many statisticians – and they have their day jobs, as well. In “softer” fields standards are arguably even less rigorous.

I am neither a scientist by the usual definition nor a scholar by any definition, but as an applied statistician who, to paraphrase John Tukey, has played in many backyards for many years, I can suggest a few reasons why science can go wrong.

Besides what I’ve previously mentioned, topping my list is Null Hypothesis Significance Testing (NHST). Many of you will remember this from an introductory statistics course. While on the surface it may seem straightforward, NHST is widely misunderstood and misused. The American Statistician has devoted a full open-access issue to this and associated topics.

Put simply, one important concern is that findings with p values greater than .05 are less likely to be accepted for publication, or even submitted for publication, than those with statistically significant findings. This is known as publication bias or the file drawer problem.
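To see why the file drawer matters, here is a minimal simulation sketch using only the Python standard library. It assumes a hypothetical true standardized effect of 0.2, 30 subjects per group, a known standard deviation of 1 (so a simple z-test stands in for the usual t-test), and a "journal" that publishes only results with p < .05. The function name and every parameter value are illustrative choices, not taken from any real study.

```python
import random
from statistics import NormalDist, mean

def simulate_file_drawer(true_effect=0.2, n_per_group=30, n_studies=2000, alpha=0.05, seed=1):
    """Run many identical two-group studies and compare the average effect estimate
    over all of them with the average over only the significant ('published') ones.
    Assumes a known standard deviation of 1 in both groups, so a z-test is used."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = (2 / n_per_group) ** 0.5                   # SE of a difference in means when sd = 1
    all_estimates, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        estimate = mean(treated) - mean(control)
        all_estimates.append(estimate)
        if abs(estimate) / se > z_crit:             # p < alpha: leaves the file drawer
            published.append(estimate)
    pub_mean = mean(published) if published else float("nan")
    return mean(all_estimates), pub_mean, len(published) / n_studies

all_mean, pub_mean, share = simulate_file_drawer()
print(f"all studies: {all_mean:.2f}; published only: {pub_mean:.2f}; share published: {share:.0%}")
```

With these settings only a minority of studies reach significance, and the average of the "published" estimates lands well above the true value of 0.2, while the average over all studies stays close to it. A literature assembled from the significant studies alone would overstate the effect.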

Common sense, one would think, should tell us that negative findings are just as important as statistically significant results, but many potentially important research results apparently never see the light of day. Since accumulation of knowledge is the essence of science, this is a serious problem that has only recently been getting the attention many statisticians have long felt it warranted. Statistical significance is not the same as decision significance, either.

A second reason is that small-sample studies are common in many fields. While a gigantic sample does not automatically imply that findings can be trusted, estimates of effect sizes are much more variable from study to study when samples are small. Conversely, trivial effect sizes with little clinical or business significance may be statistically significant when sample sizes are large and, in some instances, receive extensive publicity.
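Both halves of that point follow from how the standard error of a difference in means shrinks with the square root of the sample size. The sketch below assumes two equal-sized groups with a common, known standard deviation of 1 and uses a normal approximation; the helper names are hypothetical.

```python
from statistics import NormalDist

def ci95_halfwidth(n_per_group, sd=1.0):
    """Half-width of a normal-approximation 95% confidence interval for a
    difference in two group means, assuming a common, known standard deviation."""
    return NormalDist().inv_cdf(0.975) * sd * (2 / n_per_group) ** 0.5

def two_sided_p(observed_diff, n_per_group, sd=1.0):
    """Two-sided z-test p value for an observed difference in means."""
    se = sd * (2 / n_per_group) ** 0.5
    z = abs(observed_diff) / se
    return 2 * (1 - NormalDist().cdf(z))

# Small vs large samples: the interval narrows with the square root of n
for n in (20, 200, 2000):
    print(f"n = {n:>5} per group: estimate +/- {ci95_halfwidth(n):.2f} SD")

# A trivial difference of 0.02 SD, at a modest and at a very large sample size
print(two_sided_p(0.02, 100), two_sided_p(0.02, 100_000))
```

At 20 per group the interval is roughly ±0.62 SD, so two honest studies of the same effect can easily disagree; at 2,000 per group it is about ±0.06. The same 0.02 SD difference is nowhere near significant with 100 per group but has a tiny p value with 100,000 per group, although its practical importance has not changed.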

Non-experimental (observational) research is prevalent in many disciplines and in the era of big data seems to be experiencing a boom. While it is true that randomized experiments are not always feasible or ethical, this does not mean that non-experimental research is therefore sufficient. I summarize some of these issues in Propensity Scores: What they are and what they do and Meta-analysis and Marketing Research. Put simply, effect size estimates are generally more variable – less reliable – in non-experimental research. Back to publication bias again…
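A minimal sketch of why observational comparisons need care: in the simulation below the "treatment" has no effect at all, but a hypothetical binary background variable `z` makes people both more likely to be treated and more likely to have higher outcomes. The naive treated-versus-untreated difference is then badly biased, while comparing within levels of `z` and averaging brings the estimate back near zero. Stratifying on a single known confounder is a simplified stand-in for propensity-score methods, it works here only because `z` is measured, and the data-generating process is invented for illustration.

```python
import random
from statistics import mean

def naive_and_stratified(n=20_000, seed=2):
    """True treatment effect is zero. A binary confounder z raises both the
    probability of treatment and the outcome (hypothetical process)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        treated = rng.random() < (0.8 if z else 0.2)
        y = 1.0 * z + rng.gauss(0, 1)        # no treatment term: true effect is 0
        rows.append((z, treated, y))

    def diff(subset):
        t = [y for _, tr, y in subset if tr]
        c = [y for _, tr, y in subset if not tr]
        return mean(t) - mean(c)

    naive = diff(rows)
    # Compare treated and untreated within each level of z, weight by stratum size
    strata = [[r for r in rows if r[0] == level] for level in (False, True)]
    stratified = sum(len(s) * diff(s) for s in strata) / n
    return naive, stratified

naive, stratified = naive_and_stratified()
print(f"naive difference: {naive:.2f}; stratified on z: {stratified:.2f}")
```

In a randomized experiment, treatment would be assigned independently of `z` and the naive comparison would already be unbiased; when the relevant background variables are unmeasured, no amount of stratification can rescue an observational comparison.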

“Mine until you find” seems to be the motto of some researchers, and presenting whatever turns up as though it had been hypothesized in advance is a particularly dangerous form of malpractice known as HARKing (Hypothesizing After the Results are Known). Observational research, unfortunately, is quite often abused in that way.
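The arithmetic behind "mine until you find" is easy to sketch. Assuming 20 independent tests at α = .05 on data in which every null hypothesis is true, the chance that at least one comes out "significant" is 1 - 0.95^20, or about 64%. The simulation below checks that figure with pure-noise z statistics; the number of tests and the independence assumption are illustrative. HARKing then amounts to writing up the lucky hit as if it had been the hypothesis all along.

```python
import random
from statistics import NormalDist

def chance_of_a_finding(n_tests=20, alpha=0.05, n_sims=5000, seed=3):
    """Share of simulated 'studies' in which at least one of n_tests
    independent tests is significant although every null is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        # Each test statistic is pure noise: standard normal under the null
        if any(abs(rng.gauss(0, 1)) > z_crit for _ in range(n_tests)):
            hits += 1
    return hits / n_sims

analytic = 1 - (1 - 0.05) ** 20
simulated = chance_of_a_finding()
print(f"analytic: {analytic:.3f}; simulated: {simulated:.3f}")
```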

That said, laboratory experiments can be highly artificial and may not generalize to real-world conditions. Why Experiment? summarizes some of the pros and cons of randomized experiments.

Misuse of statistics is not unheard of even in “hard” science, as noted earlier. This may reflect poor training or poor ethics. One manifestation is the way covariates (independent variables) are used, particularly in observational studies. In some circumstances, effect size estimates can change substantially depending on which “control” variables are used and how they are used.
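As one illustration of how the choice of "control" variables can move an estimate, the sketch below simulates a hypothetical exposure that affects an outcome both directly and through a mediator. Regressing the outcome on the exposure alone estimates the total effect (about 0.3 + 0.5 × 1.0 = 0.8); adding the mediator as a covariate estimates only the part that does not pass through it (about 0.3). The two-predictor coefficient is computed with the Frisch-Waugh-Lovell device (regress residuals on residuals) so that only plain Python is needed. Variable names and coefficients are invented for the example.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    b = ols_slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(4)
n = 20_000
# Hypothetical process: exposure -> mediator -> outcome, plus a direct path
exposure = [rng.gauss(0, 1) for _ in range(n)]
mediator = [0.5 * e + rng.gauss(0, 1) for e in exposure]
outcome = [0.3 * e + 1.0 * m + rng.gauss(0, 1) for e, m in zip(exposure, mediator)]

# No covariates: estimates the total effect of exposure
total = ols_slope(exposure, outcome)
# Exposure coefficient in outcome ~ exposure + mediator, via Frisch-Waugh-Lovell
direct = ols_slope(residualize(exposure, mediator), residualize(outcome, mediator))
print(f"exposure coefficient, no controls: {total:.2f}; controlling for mediator: {direct:.2f}")
```

Neither number is wrong in itself; they answer different questions. The trouble starts when covariates are added or dropped without saying which question the analysis is meant to answer.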

t-tests and ANOVA are widely used in both experimental and non-experimental settings. Both make quite strong assumptions about the data and many papers do not indicate that these assumptions have even been examined. Psychometrician Rand Wilcox and others have pointed out in great detail the risks of this form of statistical malpractice. Applying an improper type of statistical model is another popular sin, e.g., OLS linear regression when the data are time series or when the outcome (dependent variable) is a count.
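A minimal sketch of what an unchecked assumption can cost: below, two groups have identical means but unequal variances and unequal sizes (the smaller group is the more variable one). The classic Student's t statistic pools the two variances, which assumes they are equal; Welch's statistic keeps them separate. Because the standard library has no t distribution, the sketch uses the normal critical value as a large-sample approximation, and all sample sizes and standard deviations are illustrative.

```python
import random
from statistics import NormalDist

def sample_var(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def rejection_rates(n1=50, sd1=3.0, n2=200, sd2=1.0, n_sims=2000, alpha=0.05, seed=5):
    """Both groups share the same mean, so every rejection is a false positive."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to the t critical value
    pooled_hits = unpooled_hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, sd1) for _ in range(n1)]
        b = [rng.gauss(0, sd2) for _ in range(n2)]
        diff = sum(a) / n1 - sum(b) / n2
        va, vb = sample_var(a), sample_var(b)
        # Student's t: pooled variance, assumes equal population variances
        sp2 = ((n1 - 1) * va + (n2 - 1) * vb) / (n1 + n2 - 2)
        se_pooled = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
        # Welch's statistic: separate variances, no equal-variance assumption
        se_unpooled = (va / n1 + vb / n2) ** 0.5
        pooled_hits += abs(diff) / se_pooled > crit
        unpooled_hits += abs(diff) / se_unpooled > crit
    return pooled_hits / n_sims, unpooled_hits / n_sims

pooled_rate, unpooled_rate = rejection_rates()
print(f"false-positive rate, pooled: {pooled_rate:.3f}; unpooled: {unpooled_rate:.3f}")
```

Under these settings the pooled test rejects a true null far more often than the nominal 5%, while the unpooled version stays close to it. If the larger group were the more variable one, the pooled test would instead become overly conservative, so the direction of the error depends on the design.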

Tens of thousands of studies are conducted each year around the world, which means there would be a lot of “bad” science even if standards were uniformly high.
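A back-of-the-envelope sketch of why volume alone produces errors: the function below counts how many "significant" findings would be false positives given a number of studies, the share of tested hypotheses that are actually true, statistical power, and α. All of the inputs in the example call are hypothetical, chosen only to show the arithmetic.

```python
def false_discovery_share(n_studies, prior_true, power, alpha):
    """Share of significant results that are false positives, assuming each study
    tests one hypothesis that is true with probability prior_true."""
    true_hits = n_studies * prior_true * power          # real effects detected
    false_hits = n_studies * (1 - prior_true) * alpha   # null effects that pass p < alpha
    return false_hits / (true_hits + false_hits)

# Hypothetical inputs: 10,000 studies, 10% of hypotheses true, 80% power, alpha = .05
share_false = false_discovery_share(10_000, 0.10, 0.80, 0.05)
print(round(share_false, 2))  # 0.36: 450 false hits out of 1,250 significant results
```

Even with respectable power and no misconduct at all, about a third of the significant results in this hypothetical literature would be false, simply because many tested hypotheses are not true.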

One reason scientists get things wrong is very simple: Science is hard.

I hope you’ve found this interesting and helpful!
