Random Thoughts for June 1, 2018

June 1, 2018

Qualitative research is often criticized by quants for being fuzzy, on the grounds that different moderators or analysts would interpret the same discussions or interviews differently.

Well, a Deep Learning model is just one of many possible Deep Learning models, and an ensemble of 30 machine learners and statistical models is, well, an ensemble of 30 machine learners and statistical models.

The TOC of Kevin Murphy’s popular book on machine learning is another possible response to this criticism:

https://lnkd.in/d5tEtwk

At times, computer scientist and causality guru Judea Pearl almost appears to suggest that, by drawing a directed acyclic graph (a form of path diagram), we can make reality disappear. Hard-core quants can be questioned, too. 🙂

Statistical modeling typically involves many trade-offs. For instance, the “best” model according to established statistical criteria such as the BIC may be difficult to interpret or even counter-intuitive. In the words of the great statistician George Box:

“Essentially, all models are wrong, but some are useful.”

The next time someone tells you statistics doesn’t work because it assumes normally distributed data, show ’em this. 🙂

Like nonparametric, multilevel, hierarchical, and many other terms, “nonlinear” is often used loosely (and incorrectly) by statisticians and data scientists. Unfortunately, I am not an exception. Here are some of the things it can mean:

Interaction (moderated) effects, in which the relationship between variables depends on other variables
Curvilinear relationships between variables, e.g., the relationship between Y and X is not straight-line
Ordinal variables, i.e., categorical variables in which the categories have a natural order but the differences between pairs of categories cannot be assumed equal
Equations in which the change in an outcome (dependent variable) is not proportional to the change in the predictors (independent variables), and the equation cannot be “linearized”

When neural nets and other machine learning tools were first marketed aggressively in the late ’90s, the claim was often made that statistical methods were unable to cope with “nonlinearity.” I still occasionally hear this claim, which is incorrect.
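To make this concrete, here is a minimal sketch in Python using simulated data (the variables, coefficients, and sample size are all made up for illustration). It fits two ordinary least squares models: one with straight-line terms only, and one that adds a squared term and an interaction term. The second model is still "linear" in the statistical sense, because it is linear in its coefficients, yet it captures both the curve and the interaction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical sample size

# Two simulated predictors.
x = rng.uniform(-2, 2, n)
z = rng.uniform(-2, 2, n)

# Assumed data-generating process: curvilinear in x, plus an x-by-z interaction.
y = 1.0 + 0.5 * x - 0.8 * x**2 + 0.6 * x * z + rng.normal(0, 0.3, n)


def fit_ols(columns, y):
    """Fit least squares with an intercept; return coefficients and R-squared."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - resid.var() / y.var()
    return coef, r2


# Straight-line terms only.
coef_lin, r2_lin = fit_ols([x, z], y)

# Same method, with a squared term and an interaction term added.
coef_full, r2_full = fit_ols([x, z, x**2, x * z], y)

print(f"R-squared, straight-line terms only: {r2_lin:.3f}")
print(f"R-squared, with x^2 and x*z terms:   {r2_full:.3f}")
print("Estimated x^2 and x*z coefficients:", coef_full[3], coef_full[4])
```

The estimation method is the same in both fits; only the columns handed to it change. Splines, generalized additive models, and nonlinear least squares extend the same idea to cases where a few extra terms are not enough.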

“Experience has shown, and a true philosophy will always show, that a vast, perhaps the larger portion of the truth arises from the seemingly irrelevant.” – Edgar Allan Poe, The Mystery of Marie Roget

Automatic screening of cross tabulations as important or unimportant based on statistical significance or other arbitrary criteria is risky. Very small differences between consumer groups or KPI trends may – and often do – have important implications for management.
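A rough sketch of why a significance filter is a poor screen, using invented figures: the same one-point gap between two consumer groups (21% versus 20%) is tested at two hypothetical sample sizes with a standard pooled two-proportion z-test. The function name and all numbers below are assumptions chosen for illustration.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Identical one-point difference, two hypothetical sample sizes per group.
z_small, p_small = two_proportion_z(0.21, 1_000, 0.20, 1_000)
z_large, p_large = two_proportion_z(0.21, 100_000, 0.20, 100_000)

print(f"n = 1,000 per group:   z = {z_small:.2f}, p = {p_small:.3g}")
print(f"n = 100,000 per group: z = {z_large:.2f}, p = {p_large:.3g}")
```

The gap is the same size in both cases, so its importance to management is the same as well. Only the verdict of the test changes with the sample size, which is why an automatic screen could discard the table in one case and flag it in the other.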

Moreover, data can be used in various ways, including to check other data and to create new variables. Subject matter knowledge should play a role when deciding whether or not to delete a variable, or when deciding if and how to re-code it.

Trying to make decisions based on numbers alone will often get you into trouble. I’ve learned the hard way!

“Unless what we do be useful, vain is our glory.” – Phaedrus

Like many fields, marketing research now has access to volumes and varieties of data at a velocity few would have imagined a decade or two ago. Looking at this as a statistician and researcher, I’m quite pleased.

That said, I would like to see less focus in the MR community on DATA, and more interest in how to use it, beyond vague discussions and sales pitches about visualization, AI and machine learning.
