What the Heck is “Data Science” Anyway?

There’s been a lot of chatter about data science the past few years, and even more confusion about what it really is. So, what is data science? Honestly, I don’t really know, even though one may argue I’ve been doing it for more than 30 years!

Broadly speaking, it seems to refer to people who have advanced skills in IT, programming or statistics. Data Science and Data Science Fiction represents my best guess at what it is for most people who call themselves data scientists, and here is Wikipedia’s more scholarly definition, plus a bit of history:

Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.
Data science is a “concept to unify statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.

The term “data science” (originally used interchangeably with “datalogy”) has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960.

As noted in this entry and in many other places, data science and statistics overlap considerably:

Although use of the term “data science” has exploded in business environments, many academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician….Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”

A quick online search will turn up other definitions of data science, such as Investopedia’s. 23 Great Schools with Master’s Programs in Data Science from mastersindatascience.org, with which I have no affiliation, provides considerable detail about data science programs and also may be of interest if you are thinking of a career in the field.

In Data Science and Analytics Demystified and Data Science and Marketing Research I summarize my own thoughts on this topic. Demystifying Predictive Analytics and Making Sense of Machine Learning dig a bit deeper into the analytics technology used in data science.

In the business media and blogosphere, data science seems to be increasingly connected with Artificial Intelligence (“AI”), which I find somewhat misleading. See Artificial Intelligence 101, AI, Big Data and Decisions and Some Things AI Can Do and Some Things It Can’t for synopses of my homework on AI.

The MR Realities discussion Dave McCaughan and I had with Mei Marker, a PhD computer scientist and AI specialist, was extremely enlightening. You can listen to this audio podcast by clicking on AI: Reality, Science Fiction and the Future. (No registration is needed.)

One thing is certain: data science is now big business and, in one way or another, is heavily advertised. Much of what is written or said about it is motivated, directly or indirectly, by commercial interests, and I believe a healthy skepticism is called for.

To sum up, here are a few of my thoughts:

  • Much of data science is unrelated to marketing or marketing research
  • Big data and data science are not entirely new, nor are they merely a rebranding of something old
  • Data analysis and data management are often confused but are actually very different from one another and require different skill sets and, perhaps, different personality types
  • Data warehouses and data marts are not dead
  • Data mining and predictive analytics are still the core of data science
  • Data science is not AI, though AI is an important subset of it
  • You don’t need an M.S. in Data Science to find a data science job
  • After ignoring it for years, academic statisticians have taken a keen interest in data science; they are more assertive than they were a few years ago and are having more impact. Historically, most data scientists were statisticians or had strong backgrounds in statistics, and programmer/statistician was a popular job title for data science occupations.
  • Tensions between statisticians and computer scientists working in data science still exist but may be easing, which I welcome. Data science, like baseball, is a team sport and there are very few “unicorns” able to play every position competently.
  • There is a skills shortage on the analytics side of data science and perhaps on the data management side as well. There may be a time bomb ticking, as many people working in data science-related occupations are inexperienced and poorly trained.
  • Statisticians now need to know much more than they did a decade or so ago. Who will teach and train them and how can all the material – new and old – be learned in the same space of time? Over-specialization and diluted skills are a concern.
  • Conversely, there are legitimate fears that many data science positions will be lost to automation in the coming decade
  • Data science isn’t for everyone. However, more data scientists with backgrounds in the humanities or social and behavioral sciences would be a positive development in my opinion.
  • Analysis of high velocity, streaming data – “real-time analytics” – remains the exception in data science, not the rule
  • The notion that “big” data necessarily has more business meaning than “small” data is misguided; most data have little or no business value
  • Descriptive information and pattern recognition are not insights; insights cannot be automated and result from humans interpreting data and patterns in data. Interpretation is contextual and not possible without at least some subject matter expertise.
  • IoT will have some impact on data science – it already has – but in the near term this will mostly be confined to manufacturing, maintenance and operations
  • If they are politically inconvenient, even the biggest data and most sophisticated analytics will be ignored or merely become additional arrows in the quivers of the politically skillful. Few companies have a management culture that truly understands how data and analytics can be used to enhance decision-making. Often, they are still seen as an irritation or threat.
  • Marketing researchers lose their primary research skills at their peril; seldom will all, or even most, of the answers decision makers need be found in existing data sources.
  • Causal analysis is the next frontier in analytics
  • Is data science really The Sexy Job? In many ways, I believe it is.