Data Science: Spread Too Thin?

An unambiguous and universally accepted definition of “data science” remains elusive but, generally speaking, it refers to some blend of computer science and statistics.

“Data scientist” is even harder to define and can include researchers at government institutes, researchers at private companies and people I simply called “scientists” when I was growing up. The largest group I come into contact with is in the private sector, and its work has some connection with marketing, though often a distant one. Within the section of data science I am most familiar with, there are two main groups of data scientists.

The first, regardless of their academic background, places a heavy emphasis on programming. Let’s call them the “programmers.” The programming they do pertains to data management, data processing, analytics and software development, and many programming languages are usually listed on their profiles.

There are often heated discussions within this group which, at times, resemble sectarian disputes. A few years ago, Hadoop versus Spark was a major theme; recently, it’s “R sucks. Python rules!” versus “Python sucks. R rules!”

There is a second segment that the first is united against, and its members might be called the “point-and-clickers” or “click-and-draggers,” depending on their preference for software. This group is frequently disparaged by the first. Most know little about programming and lean on user-friendly software to get the job done. Some of this software is free with limitations and, at the other extreme, some is very comprehensive and very expensive. There are products in the middle, too. Regardless, the software is usually easy to operate but not necessarily easy to operate intelligently.

I’m in closer touch with the first group than the second, and many of my contacts seem to feel the second are not real data scientists and will go the way of the dinosaurs. The asteroid in this case is the cost of their software.

As I noted, however, not all this software is prohibitively expensive. Though I am less likely to come into contact with them, my sense is that there are still plenty of these dinosaurs around. I think there is a sampling issue here. Many CFOs see these user-friendly tools as productivity enhancers and would rather spend more on software and less on people, especially if they perceive the people as geeky prima donnas with little understanding of business.

There are statisticians in both groups who now seem to be making their presence felt but they are greatly outnumbered. There just aren’t that many statisticians on the planet, especially experienced ones.

A third group I haven’t mentioned are software developers working mostly or exclusively on AI. There are many of them – pretenders included – and this group appears to be growing. Most seem highly specialized and focused on certain areas within AI, such as image analysis. A minority have a strong background in statistics as far as I can tell. Some are geniuses by any reasonable definition.

There are other groups and other trends I haven’t touched upon, including those I’m unaware of. Besides definitional and echo chamber issues, a serious concern I have is that data scientists are required to know a little bit about a great number of things. Often what they learn is wrong or too superficial to be useful. I find it hard to believe that someone who is 28 years old is competent at programming in 10 languages, has a solid grasp of statistics and understands business and other touchy human matters.

Many data scientists, including bloggers I’ve never had direct contact with, appear to know little about statistics and even less about primary research. I’ve seen discussions on social media in which someone asks a question about statistics and everyone gives a wrong answer. The question itself may be nonsensical and suggest a poor understanding of statistics and research. This will all affect the bottom line – if it doesn’t, then there is no point in data science in the first place.

I know, I know. AI and machine learning will come to the rescue. Except that they won’t.
