Riding the waves of Big Data

The data management waves of the past fifty years have culminated in where we are today: “The Era of Big Data”. To really understand what big data is all about, you have to understand how we have moved from one wave to the next, and that we have never thrown anything away as we have moved forward; each wave has added tools, technologies and practices for addressing different types of problems.

The First “Big” Wave – Manageable Data Structures

When computing moved into the commercial market in the late 1960s, data was stored in flat files that imposed no structure. When companies needed a detailed understanding of their customers, they had to apply brute-force methods, including very detailed programming, to create any value. Then, in the 1970s, things changed with the invention of the relational data model and the relational database management system (RDBMS), which imposed structure and provided a method for improving performance. More importantly, the relational model added a level of abstraction (the structured query language [SQL], report generators, and data management tools) that made it easier for programmers to satisfy the growing business demand to extract value from data.
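
As a rough illustration of that abstraction, the sketch below uses plain Python with its built-in sqlite3 module; the table, columns and rows are invented for the example. It shows a question about customers expressed declaratively in SQL rather than as a hand-written scan over a flat file.

```python
import sqlite3

# Purely illustrative table and data, created in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers (name, region) VALUES (?, ?)",
    [("Acme Ltd", "EMEA"), ("Globex", "APAC"), ("Initech", "EMEA")],
)

# With a flat file, finding the EMEA customers meant writing code to scan every
# record; with SQL, the question is stated declaratively and the database
# engine decides how to answer it.
for (name,) in conn.execute("SELECT name FROM customers WHERE region = ?", ("EMEA",)):
    print(name)
```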

The relational model offered an ecosystem of tools from a large number of emerging software companies. It filled a growing need to help companies better organise their data and compare transactions from one geography to another. In addition, it helped business managers who wanted to examine information such as inventory and compare it to customer order information for decision-making purposes. However, a problem emerged from this exploding demand for answers: storing this growing volume of data was expensive, and accessing it was slow. To make matters worse, huge amounts of data duplication existed, and the actual business value of that data was hard to measure.

At this stage, an urgent need existed for a new set of technologies to support the relational model. The Entity-Relationship (ER) model emerged, adding a further level of abstraction to increase the usability of the data. In this model, each item was defined independently of its use, so developers could create new relationships between data sources without complex programming. It was a huge advance at the time, and it enabled developers to push the boundaries of the technology and create more complex models that required sophisticated techniques for joining entities together. The market for relational databases exploded and remains vibrant today; it is especially important for the transactional management of highly structured data.

When the volume of data that organisations needed to manage grew out of control, the data warehouse provided a solution. The data warehouse enabled the IT organisation to select a subset of the data being stored so that it would be easier for the business to gain insights. It was intended to help companies deal with increasingly large amounts of structured data that they needed to analyse, by reducing the volume of data to something smaller and more focused on a particular area of the business. It also filled the need to separate day-to-day operational processing from decision support, for performance reasons. In addition, warehouses often store data from prior years for understanding organisational performance, identifying trends, and helping to expose patterns of behaviour, and they provide an integrated source of information, drawn from various data sources, that can be used for analysis. Data warehouses were commercialised in the 1990s, and today both content management systems and data warehouses are able to take advantage of improvements in the scalability of hardware, virtualisation technologies, and the ability to create integrated hardware and software systems, also known as appliances.
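
The sketch below is a minimal illustration of both ideas, using invented customer and order tables: two independently defined entities are related through a simple declared join, and a summarised, analysis-ready view of the transactions is produced in one query, much like the subset a warehouse or mart would hold.

```python
import sqlite3

# Two entities defined independently of how they will be used; the relationship
# between them lives in the customer_id key, not in custom code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            amount REAL, order_date TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme Ltd", "EMEA"), (2, "Globex", "APAC")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, 1, 120.0, "2015-01-10"), (2, 1, 80.0, "2015-02-03"),
                  (3, 2, 200.0, "2015-02-14")])

# A warehouse-style summary: order value per region, produced by a declared
# join and aggregation rather than bespoke programming.
for region, total in conn.execute("""
        SELECT c.region, SUM(o.amount)
        FROM orders o JOIN customers c ON c.id = o.customer_id
        GROUP BY c.region"""):
    print(region, total)
```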

Sometimes these data warehouses were themselves too complex and large, and didn’t offer the speed and agility the business required. The answer was a further refinement of the data being managed, through data marts. Data marts were focused on specific business issues, were much more streamlined than the massive data warehouses, and better supported the business need for speedy queries. Like every wave of data management, the data warehouse has evolved to support emerging technologies such as integrated systems and data appliances. Data warehouses and data marts solved many problems for companies needing a consistent way to manage massive transactional data (from systems such as SAP, for example).

Unfortunately, when it came to managing huge volumes of unstructured or semi-structured data, the warehouse was not able to evolve enough to meet changing demands. To complicate matters, data warehouses are typically fed in batch intervals, usually weekly or daily. This is fine for planning, financial reporting, and traditional marketing campaigns, but it is too slow for increasingly real-time business and consumer environments. How would companies be able to transform their traditional data management approaches to handle the expanding volume of unstructured data? The solution did not emerge overnight. As companies began to store unstructured data, vendors added capabilities such as BLOBs (binary large objects). In essence, an unstructured data element would be stored in a relational database as one contiguous chunk of data. The object could be labelled (for example, as a customer inquiry), but you couldn’t see what was inside it. Clearly, this wasn’t going to solve changing customer or business needs.
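
The following sketch (again with invented data) shows what that BLOB approach amounts to in practice: the unstructured item can be filed and retrieved by its label, but its contents remain opaque to the database.

```python
import sqlite3

# An unstructured item stored as one opaque chunk with a label attached.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, label TEXT, body BLOB)")

inquiry = "Customer asks whether the X200 ships to Ireland.".encode("utf-8")
conn.execute("INSERT INTO documents (label, body) VALUES (?, ?)",
             ("customer inquiry", inquiry))

# The row can be found by its label, but the engine sees the body only as
# bytes; understanding what the inquiry actually says still requires code
# outside the database.
label, body = conn.execute("SELECT label, body FROM documents").fetchone()
print(label, len(body), "bytes")
```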

Enter the object database management system (ODBMS). The object database stored the BLOB as an addressable set of pieces, so that it was possible to see what was inside. Unlike the BLOB, which was an independent unit appended to a traditional relational database, the object database provided a unified approach for dealing with unstructured data. Object databases include a programming language and a structure for the data elements, so that it is easier to manipulate various data objects without hand-coded conversions and complex joins. Object databases introduced a level of innovation that helped lead to the second wave of data management.
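
The sketch below is only an analogy in plain Python, not an actual ODBMS product, but it captures the shift: the same customer inquiry is stored as named, addressable pieces rather than one opaque chunk of bytes.

```python
from dataclasses import dataclass

# The stored item now has named, addressable pieces instead of being a single
# blob; the field names here are invented for the example.
@dataclass
class CustomerInquiry:
    customer: str
    product: str
    question: str

inquiry = CustomerInquiry(
    customer="Acme Ltd",
    product="X200",
    question="Does the X200 ship to Ireland?",
)

# Each piece can be reached directly, without decoding raw bytes or writing a
# complex join.
print(inquiry.product)
print(inquiry.question)
```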

The Second “Big” Wave – Web, Unstructured Data and Content Management

It is no secret that most of the data available in the world today is unstructured. Paradoxically, companies have focused their investments on the systems holding structured data that were most closely associated with revenue: line-of-business transactional systems. Enterprise Content Management (ECM) systems (for example, OpenText) evolved in the 1980s to give businesses the capability to better manage unstructured data, mostly documents. In the 1990s, with the rise of the web, organisations wanted to move beyond documents and store and manage web content, images, audio, and video. The market evolved from a set of disconnected solutions to a more unified model that brought these elements together into a platform incorporating business process management, version control, information recognition, text management, and collaboration. This new generation of systems added metadata: information about the organisation and characteristics of the stored information. These solutions remain incredibly important for companies that need to manage all this data in a logical manner. At the same time, however, a new generation of requirements has begun to emerge and is driving us toward the next wave. These requirements are driven, in large part, by a convergence of factors including the web, virtualisation, and cloud computing. In this new wave, organisations are beginning to understand that they need to manage a new generation of data sources, with an unprecedented amount and variety of data that needs to be processed at unheard-of speed (consider, for example, SAP S/4 HANA).
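
As a small, hypothetical illustration of that metadata idea (the fields shown are invented, not tied to any particular ECM product), the sketch below keeps descriptive information alongside an unstructured item so the item can be organised and found.

```python
# The unstructured payload itself, plus information *about* that payload.
content_item = {
    "content": b"<video bytes>",
    "metadata": {
        "title": "Product launch keynote",
        "type": "video",
        "owner": "Marketing",
        "created": "1999-06-01",
        "tags": ["launch", "keynote"],
    },
}

# Searching on the metadata rather than the raw content is what makes the
# item manageable.
if "keynote" in content_item["metadata"]["tags"]:
    print(content_item["metadata"]["title"])
```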

The Third “Big” Wave – Managing “BIG” Data

Big data is not really new; it is an evolution in the data management journey. As with the other waves, big data is built on top of the evolution of data management practices over the past five decades. What is new is that, for the first time, the cost of computing cycles and storage has reached a tipping point. Why is this important? Only a few years ago, organisations typically compromised by storing snapshots or subsets of important information, because storage costs and processing limitations prohibited them from storing everything they wanted to analyse.

In many situations, this compromise worked fine. For example, a manufacturing company might have collected machine data every two minutes to determine the health of its systems. However, there could be situations where the snapshot did not contain information about a new type of defect, which might then go unnoticed for months.

With big data, it is now possible to virtualise data so that it can be stored efficiently and, using cloud-based storage, more cost-effectively as well. In addition, improvements in network speed and reliability have removed other physical limitations on managing massive amounts of data at an acceptable pace. Add to this the impact of changes in the price and sophistication of computer memory. With all these technology transitions, it is now possible to imagine ways that companies can leverage data that would have been inconceivable only five years ago.

Is there a Fourth “Big” Wave? – Evolution and IoT

Currently, we are still at an early stage of leveraging huge volumes of data to gain a 360-degree view of the business and anticipate shifts in customer expectations. The technologies required to get the answers the business needs are still isolated from each other. To get to the desired end state, the technologies from all three waves will have to come together. Big data is not simply about one tool or one technology; it is about how all of these technologies come together to give the right insights, at the right time, based on the right data, whether that data is generated by people, machines, or the web.
