In his catchy tune The Purpose of Existence Is? the former Doors organist poses some fundamental questions to which definitive answers remain elusive. Happily, the purpose of statistics is easier to fathom, since humans created it. Put simply, it is to enhance decision making.
“Do you think the purpose of existence is to pass out of existence is the purpose of existence?” – Ray Manzarek
These decisions could be those made by scientists, businesspeople, politicians and other government officials, by medical and legal professionals, or even by religious authorities. In informal ways, ordinary folks also use statistics to help make better decisions.
How does it do this?
One way is by providing basic information, such as how many, how much and how often. The stat in statistics is derived from the word state, as in nation state, and as statistics emerged as a formal discipline, describing nations quantitatively (e.g., population size, number of citizens working in manufacturing) became a fundamental purpose. Frequencies, means, medians and standard deviations are now familiar to most people.
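As a rough illustration of these basic summaries, here is a minimal sketch using only Python's standard library. The purchase counts and the describe helper are made up for the example and are not taken from any real survey.

```python
import statistics
from collections import Counter

# Hypothetical data: weekly snack purchases reported by ten households.
purchases = [2, 3, 3, 5, 1, 3, 4, 2, 6, 3]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "frequencies": dict(Counter(values)),  # how often each value occurs
    }


summary = describe(purchases)
print(summary)
```

The frequencies entry answers "how often", while the mean, median and standard deviation summarize "how much" and how spread out the answers are.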
Often we must rely on samples to make inferences about our population of interest. From a consumer survey, for example, we might estimate mean annual household expenditures on snack foods. This is known as inferential statistics, and confidence intervals will be familiar to anyone who has taken an introductory course in statistics.
So will methods such as t-tests and chi-squared tests which can be used to make population inferences about groups (e.g., are males more likely than females to eat pretzels?).
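To make these two ideas a little more concrete, here is a hedged sketch in plain Python. It assumes a normal approximation for the confidence interval (with small samples a t-based interval would usually be preferred) and computes a Pearson chi-squared statistic for a 2x2 table without a continuity correction. The data and function names are invented for illustration.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical annual snack spending (in dollars) for twelve surveyed households.
spending = [410, 385, 520, 460, 300, 495, 430, 375, 540, 415, 390, 480]
low, high = mean_confidence_interval(spending)
print(f"95% CI for mean spending: {low:.1f} to {high:.1f}")

# Hypothetical counts: rows are males and females,
# columns are "eats pretzels" and "does not eat pretzels".
pretzels = [[30, 20], [20, 30]]
chi2, p_value = chi_squared_2x2(pretzels)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
```

For the made-up pretzel table, every expected count is 25, so the statistic works out to 4.0 with a p-value of about 0.046.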
Another way statistics helps us make decisions is by exploring relationships among variables through the use of cross-tabulations, correlations and data visualizations. Exploratory data analysis (EDA) can also take on more complex forms and draw upon methods such as principal components analysis, regression and cluster analysis. EDA is often used to develop hypotheses which will be assessed more rigorously in subsequent research.
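As a small example of exploring a relationship between two variables, the sketch below computes a Pearson correlation coefficient by hand. The variable names and numbers are hypothetical, and in practice a statistics package would normally be used instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours of TV watched per week and snacks eaten per week.
tv_hours = [2, 5, 8, 10, 14, 20]
snacks = [1, 3, 4, 6, 7, 11]
print(round(pearson_r(tv_hours, snacks), 3))
```

A value near +1 or -1 suggests a strong linear association worth a closer look, though on its own it says nothing about cause.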
These hypotheses are often causal in nature, for example, why some people avoid snacks. Randomized experiments are generally considered the best approach in causal analysis but are not always possible or appropriate; see Why experiment? for some more thoughts on this subject.
Hypotheses can also be further developed and refined rather than simply tested through Null Hypothesis Significance Testing, though this has traditionally been frowned upon because it uses the same data for multiple purposes.
Many statisticians are actively involved in designing research, not merely using secondary data. This is a large subject but is briefly summarized in Preaching About Primary Research.
Making classifications, predictions and forecasts is another traditional role of statistics. In a data science context, the first two are often called predictive analytics and employ methods such as random forests and standard (OLS) regression. Forecasting sales for the next year is a different matter and normally requires the use of time-series analysis.
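For readers who like to see the mechanics, here is a minimal sketch of simple OLS regression with one predictor, written in plain Python. The data, the fit_ols and predict helpers, and the choice of household size as the predictor are all assumptions made for the example.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, new_x):
    """Predict the outcome for a new value of the predictor."""
    return intercept + slope * new_x


# Hypothetical data: household size and weekly snack spending in dollars.
household_size = [1, 2, 3, 4, 5]
weekly_spend = [8.0, 11.5, 15.0, 19.5, 22.0]

intercept, slope = fit_ols(household_size, weekly_spend)
print(predict(intercept, slope, 6))
```

This predicts an outcome for a new case from its characteristics. It does not account for trend or seasonality over time, which is why sales forecasting usually calls for time-series methods.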
There is also unsupervised learning, which aims to find previously unknown patterns in unlabeled data. Using K-means clustering to partition consumer survey respondents into segments based on their attitudes is an example of this.
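The idea behind K-means can be sketched in a few lines of plain Python. This is a bare-bones version of the usual assign-and-update loop with random starting centroids, and the attitude scores and function names are invented for the example. Real projects would normally use an established library and try several starting points.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical survey data: (health-consciousness, price-sensitivity) scores.
respondents = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(respondents, k=2)
print(labels)
```

With data this cleanly separated, the first three respondents end up in one segment and the last three in the other. Real attitude data is rarely so tidy.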
Quality control, operations research, what-if simulations and risk assessment are other areas where statistics plays a key role. There are many others, as this page illustrates.
The fuzzy buzzy term analytics is frequently used interchangeably with statistics, an offence to which I also plead guilty.
This has just been a snapshot of statistics, but I hope you have found it useful. For those seeking more information, many books and journals are listed here.
“The best thing about being a statistician is that you get to play in everyone’s backyard.” – John Tukey