Records vs. Signals: The Landscape of Digital


Geoffrey Moore


Author, Speaker, Advisor

Everyone gets that data is the new oil in the digital economy, but not everyone gets that there is a critical difference between data as records (data in databases) and data as signals (data from log files, sensors, social media posts, and the like). Let me explain.

Data as records represent verified facts that express the essence of the activity they record, be that in the form of tables, text, graphs, or images. They are the foundation of Systems of Record, upon which rests the integrity of the digital economy and the digital society. Such data feed programmatic decisions that are deterministic, meaning they follow an explicit and transparent decision tree leading to one and only one correct answer.

Data as signals, by contrast, represent unverified facts that testify to the occurrence of the activity they record, be that a phone call, tweet, temperature reading, or website click. They are foundational to Systems of Engagement as well as to the Internet of Things, upon which rests the productivity of the digital economy and the digital society. Such data feed algorithmic decisions that are probabilistic, meaning they are based on the intensity of the signal as a proxy for the present likelihood of a given situation or the future likelihood of success for a given response.
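To make the contrast concrete, here is a minimal Python sketch. The account balance, the click counts, and the smoothing constants are all invented for illustration and do not come from any real system; the point is only that the first function returns a single correct answer from a verified record, while the second returns an estimate whose confidence depends on how much signal has accumulated.

```python
# Hypothetical illustration: a record-driven decision vs. a signal-driven one.

def can_withdraw(balance: float, amount: float) -> bool:
    """Deterministic rule over a verified record: exactly one correct answer."""
    return amount <= balance


def estimated_interest(clicks: int, impressions: int,
                       prior_clicks: float = 1.0,
                       prior_impressions: float = 20.0) -> float:
    """Probabilistic estimate from unverified signals.

    Returns a smoothed click rate, used as a proxy for how likely the
    next visitor is to respond. The prior values are assumptions chosen
    for this example, not recommended settings.
    """
    return (clicks + prior_clicks) / (impressions + prior_impressions)


print(can_withdraw(balance=100.0, amount=80.0))
print(round(estimated_interest(clicks=9, impressions=80), 3))
```

A single click changes the second answer only slightly, which is the sense in which an individual signal carries little weight on its own.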

Anyone who has learned to code is familiar with data as records, but unless you have taken courses in creating machine learning algorithms, you probably are not familiar with data as signals. I am no data scientist either, but I have been learning about these systems from readings in biology and complex systems (most recently Complexity: A Guided Tour, by Melanie Mitchell). It turns out that the immune system, for example, is basically a signals-based machine learning algorithm for detecting and dealing with antigens. Ant colonies operate as signals-based machine learning algorithms for dealing with food discovery and task distribution. And our very metabolism is a signals-based machine learning system for building the right proteins at the right times for each of our cells to function properly.

What all these systems have in common is that data is acquired by sampling, and actions are triggered by concentrations of signal that exceed whatever threshold for activation exists. Thus, for example, ants that find food excrete pheromones on their way back to the colony, and these chemicals signal to other ants the path to follow to get more food. The more ants, the more pheromones, the stronger the signal becomes, until all the food has been harvested, at which point, with few ants and weaker signals, the next troop are led elsewhere.
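The ant example can be sketched as a toy model of threshold activation. This is not a biological simulation: the deposit size, evaporation rate, and threshold below are arbitrary assumptions, and `simulate_trail` is a name made up for this illustration.

```python
# Toy model of threshold activation; all constants are illustrative assumptions.

def simulate_trail(ants_per_step, deposit=1.0, evaporation=0.5, threshold=3.0):
    """Track the pheromone level on one trail and whether it is 'active'.

    Each step, part of the existing pheromone evaporates, then every
    returning ant adds a fixed deposit. The trail recruits other ants
    only while the level is at or above the threshold.
    """
    level = 0.0
    history = []
    for ants in ants_per_step:
        level = level * (1.0 - evaporation) + ants * deposit
        history.append((round(level, 3), level >= threshold))
    return history


# Food is found, traffic builds, then the food runs out.
for level, active in simulate_trail([1, 2, 4, 4, 1, 0, 0]):
    print(level, "follow" if active else "ignore")
```

With these made-up numbers the trail switches on once traffic builds and switches off again a couple of steps after the ants stop returning, because evaporation pulls the level back under the threshold.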

Digital marketers take the same approach to website clicks when they are looking where to place digital ads. Algorithmic traders take the same approach to high-frequency trading. Security software takes the same approach to cyber-attacks. Predictive maintenance takes the same approach to sensor readings. It is all about letting signals that are concentrated in location and time operate as proxies for a given state that implies a given response.
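As a rough sketch of what "concentrated in location and time" might mean in practice, the following flags any source that produces several events within a short window. The event format, the 60-second window, and the threshold of three are assumptions for illustration; real ad, trading, security, or maintenance systems are far more elaborate.

```python
from collections import defaultdict, deque


def concentrated_sources(events, window=60, threshold=3):
    """Return the set of sources whose events cluster in time.

    events: iterable of (timestamp_seconds, source) pairs, sorted by time.
    A source is flagged once it has at least `threshold` events inside
    any span shorter than `window` seconds.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, source in events:
        q = recent[source]
        q.append(ts)
        # Drop events that have fallen out of the window.
        while ts - q[0] >= window:
            q.popleft()
        if len(q) >= threshold:
            flagged.add(source)
    return flagged


# Hypothetical events: source "a" bursts, source "b" is spread out.
events = [(0, "a"), (10, "b"), (20, "a"), (45, "a"), (200, "b"), (400, "b")]
print(concentrated_sources(events))  # {'a'}
```

Both sources produce three events, but only the burst from "a" crosses the threshold, so the response is triggered by concentration rather than by any single occurrence.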

One key to such systems is that they do not understand. This is the big difference between AI and Machine Learning. AI does understand, or tries to. Machine learning doesn’t. It just operates, and it lets the feedback of natural selection guide its development. The more shots on goal, the better natural selection works, which is why machine learning algorithms all hunger for more data as signals.

As our digital economy evolves, one can see that AI and Machine Learning will interoperate more and more, much the way our conscious cerebral cortex interoperates with our cerebellum and autonomic nervous system. That is, we will begin to take AI stabs at understanding why Machine Learning algorithms are succeeding, and the more we are able to understand, the better our future strategic decision-making will be. Right now, machine learning is in the lead, at least here in Silicon Valley, but one can imagine a future in which each discipline takes turns pulling the other forward.

That’s what I think. What do you think?

Finally, when it comes to setting public policy about data, or when it comes to estimating the economic value of data, it is critical that we distinguish between data as records and data as signals. The former are individually valuable and normally proprietary, so they need to be secured, and they warrant the protection of law. By contrast, data as signals are only collectively valuable and are normally not proprietary, so they warrant a different treatment. There is still some level of privacy risk to account for, but there is virtually no economic value at the level of the individual occurrence, and as a result it is important not to impose a data-as-records regulatory regime onto a data-as-signals enterprise.

All in all, referring to signals as data just creates confusion, so I am hoping going forward we can use the data versus signals distinction to keep our thoughts straight and our policies reasonable.

