Sampling - a snapshot - The Digital Transformation People

July 28, 2020

Sampling is a deceptively complex subject, and one not every marketing researcher, statistician or data scientist can get enthused about. Nevertheless, it is a critical one with applications in any discipline.

Newcomers to the subject are often quite surprised to discover that very small samples – tiny data by today’s standards – are sufficient for many purposes. I remember being one who was shocked.
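To see why small samples can go a long way, here is a rough sketch of the usual margin-of-error calculation for a proportion. It assumes simple random sampling and the normal approximation. The function name, the sample size of 1,000 and the population sizes are illustrative assumptions, not figures from any real study.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion under SRS.

    Uses the normal approximation; z=1.96 corresponds to roughly 95%
    confidence and p=0.5 is the worst case. Pass `population` to apply
    the finite population correction.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: matters only when n is a
        # noticeable fraction of the population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Same sample size, very different (hypothetical) population sizes.
for population in (10_000, 1_000_000, 250_000_000):
    moe = margin_of_error(1_000, population=population)
    print(f"N={population:>11,}  n=1,000  margin of error = {moe:.2%}")
```

With these inputs the margin of error stays at roughly three percentage points whatever the population size, because precision is driven mainly by the sample size rather than by the fraction of the population sampled.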

Sampling is used when collecting data on every member of the population – a census – would either be too costly or take too much time. Surveying all car owners in the US, for instance, would not be feasible for most marketing research agencies.

Building a predictive model using all customers in a data warehouse might be technically possible but could slow down the modeling process considerably. Fortunately, a sample of customers will often suffice.

There are two basic kinds of samples: probability and non-probability. With probability sampling, each unit (e.g., consumer) has a known, non-zero chance of being selected. This is often called random sampling, since some form of random (or systematic) selection mechanism is employed. Fieldwork staff do not choose who participates in the research and who does not.

With non-probability sampling, some elements of the population have no chance of selection, or the probability of their selection cannot be precisely determined. In the case of mall and street intercepts, fieldwork staff do choose who participates in the research and who does not. These are non-probability samples.

Simple random sampling (SRS) and systematic (“every nth”) sampling are the sampling procedures most of us would probably think of when we hear “random sample.” There are also stratified and cluster samples, among many other kinds.
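As a minimal sketch of these two procedures, the snippet below draws both kinds of sample from a list that stands in for a sampling frame. The function names and the frame of 1,000 customer IDs are hypothetical, and it uses only Python's standard library.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick a random start, then take every k-th unit from the frame.

    Assumes n <= len(units). If len(units) is not a multiple of n, the
    units at the very end of the frame cannot be selected in this
    simple version.
    """
    k = len(units) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [units[start + i * k] for i in range(n)]

rng = random.Random(2020)        # seeded only so the sketch is repeatable
frame = list(range(1, 1001))     # hypothetical frame of 1,000 customer IDs

print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
```

Systematic sampling is convenient when the frame is a long list, but if the list has a periodic pattern that lines up with the interval, the sample can be badly unrepresentative.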

In cluster (or multi-stage) sampling, we take samples of smaller units within larger units. For example, we might sample geographic areas, and then housing units and, finally, individuals within these housing units. Stratification entails breaking down the target population into segments, such as age group or gender, before sampling and then taking independent samples within each stratum. Quota sampling is a non-probability method that resembles stratified sampling except that selection of units is at least partly judgmental.
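The difference between the two designs can be sketched in a few lines. This is an illustration under simplifying assumptions: the function names and data are hypothetical, the cluster example has two stages rather than the three described above, and units are selected with equal probability at each stage.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units into strata, then draw an independent SRS within each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        stratum: rng.sample(members, min(n_per_stratum, len(members)))
        for stratum, members in strata.items()
    }

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

rng = random.Random(7)  # seeded only so the sketch is repeatable

# Hypothetical consumers tagged with an age group (the stratification variable).
consumers = [(i, ("18-34", "35-54", "55+")[i % 3]) for i in range(300)]
by_age = stratified_sample(consumers, lambda c: c[1], 10, rng)
print({age: len(picked) for age, picked in by_age.items()})

# Hypothetical geographic areas, each containing 40 housing units.
areas = {f"area_{a}": [f"area_{a}/unit_{h}" for h in range(40)] for a in range(20)}
by_area = two_stage_cluster_sample(areas, 4, 5, rng)
print({area: len(picked) for area, picked in by_area.items()})
```

Every stratum is represented in a stratified sample, whereas a cluster sample covers only the clusters that happen to be drawn, which is why the two designs affect precision in opposite directions.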

There is an important but often overlooked distinction between probability sampling and a probability sample. A research agency may utilize a probability sampling procedure for a consumer survey, for instance, but because many people invited to join the survey refuse to do so, the sample the research agency obtains is not a true probability sample.

If the differences between those who respond and those who refuse to participate are small, then this self-selection won’t matter much and the respondents can be treated as a probability sample with negligible risk. If the research agency had this information, however, there would be no need to conduct the survey in the first place.

Post-survey weight adjustments can be used in many situations to make the actual sample represent the target population more closely. Marketing researchers often weight survey data by age, gender, region or other variables for which national census data are available.

Weighting cannot transform a non-probability sample into a probability sample, though, and we can only try to make our sample more representative of the population. Weighting can be tricky and is not a panacea.
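A minimal sketch of the simplest kind of weight adjustment, post-stratification on a single variable, might look like this. The function name, the age groups and the 50/50 population shares are made-up assumptions for illustration; real projects often weight on several variables at once, which calls for more elaborate methods.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent: population share / sample share of their group."""
    n = len(sample_groups)
    sample_shares = {g: count / n for g, count in Counter(sample_groups).items()}
    return [population_shares[g] / sample_shares[g] for g in sample_groups]

# Hypothetical survey: under-40s are over-represented (70% of respondents)
# relative to an assumed population split of 50/50.
sample_groups = ["under_40"] * 70 + ["40_plus"] * 30
population_shares = {"under_40": 0.5, "40_plus": 0.5}

weights = poststratification_weights(sample_groups, population_shares)

# After weighting, the 40-plus group accounts for its population share.
weighted_share = (
    sum(w for w, g in zip(weights, sample_groups) if g == "40_plus") / sum(weights)
)
print(f"weighted share of 40_plus: {weighted_share:.3f}")
```

The weights fix the age distribution only. If respondents differ from non-respondents within each age group, that bias remains, which is one reason weighting is not a panacea.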

Contrary to what some may believe, samples do not have to be drawn from national populations – the population could be customers in our database, for example. Samples need not be humans, either. They could be widgets in our warehouse, Great Whites in the Atlantic, or statisticians on Zetar.

Most statistical procedures by default assume simple random sampling, which means significance tests and confidence intervals will be incorrect if another sampling method has been used. They will also be invalid in the case of non-probability sampling, which is typical in marketing research.
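To give a feel for the size of the problem, the sketch below applies Kish's well-known approximation for the design effect of a cluster sample, deff = 1 + (m - 1) × rho, where m is the average cluster size and rho is the intraclass correlation. The function names and the numbers plugged in are hypothetical, and the formula is an approximation that assumes roughly equal cluster sizes.

```python
import math

def design_effect(avg_cluster_size, icc):
    """Kish's approximate design effect for cluster sampling."""
    return 1 + (avg_cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

# Hypothetical survey: 1,000 interviews, 10 per cluster, and a modest
# intraclass correlation of 0.05.
n = 1_000
deff = design_effect(avg_cluster_size=10, icc=0.05)

print(f"design effect:          {deff:.2f}")
print(f"effective sample size:  {effective_sample_size(n, deff):.0f}")
print(f"standard error factor:  {math.sqrt(deff):.2f}")
```

In this made-up case, software that assumes simple random sampling would report standard errors about 20% too small, so confidence intervals would be too narrow and significance tests too optimistic.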

Population definition is also critical, and this is an area where questionable practice in marketing research is common. One example is very narrowly defined “target” populations based more on judgment than data.

In this article, Stas Kolenikov gives us a succinct overview of sampling. Below I’ve listed some books for those who’d like an in-depth look at this subject, the first three of which are generally regarded as classics:

Survey Sampling (Kish)
Sampling Techniques (Cochran)
Model Assisted Survey Sampling (Särndal et al.)
Sampling: Design and Analysis (Lohr)
Practical Tools for Designing and Weighting Survey Samples (Valliant)
Survey Weights: A Step-by-step Guide to Calculation (Valliant)
Complex Surveys (Lumley)
Hard-to-Survey Populations (Tourangeau et al.)
Small Area Estimation (Rao and Molina)

Sharon Lohr’s is perhaps the most accessible of those on this list, which is not comprehensive. Once you’re comfortable with the fundamentals of sampling, you may also find these two journals helpful:

Public Opinion Quarterly (AAPOR)
Journal of Survey Statistics and Methodology (AAPOR and ASA)

The Stata software’s Survey Data Reference Manual, which is freely downloadable, may also be of interest even if you use different statistical software.

Though many of its fundamentals were developed decades ago, sampling is by no means a settled topic and is an area of ongoing research. I hope you’ve found this snapshot interesting and useful!
