Making contact with your gravity-defying data

Can you hear me, Major Tom?

While there is little certainty in debates over whether life exists on Mars, or indeed even after Bowie, there is certainty about the existence of space debris. There is also now certainty about how our modern, digitally driven world has littered our planet with data debris.

Some businesses, the IBMs and Clouderas of this world along with the platforms built on open-source projects such as Hadoop, have made leaps and bounds into this astronomical junkyard. They offer the rest of us mere mortals data management solutions and visualisation tools to handle our big data problems and to locate hidden gems of valuable data amongst the noise.

For small and medium-sized enterprises (SMEs), big data providers offer data handling solutions and visualisation models that are attractive enough to set Pavlov's[1] dogs salivating. Alas, these can often be financially or technically out of reach in-house. The alternative that remains is the headache of securing databases, which are increasingly expensive and difficult to manage under data protection law. This is notably the case following the recent invalidation of the Safe Harbour framework, which governed transfers of personal data from one jurisdiction to another, namely from the EU to the US.

In addition to this new legal expense are the demands of keeping pace with the technical advances required to protect our databases from data breach. This might include managing the security of data sets in databases using the latest cyber security and encryption tools. Even then, those data sets may remain vulnerable to hacking by ruthless criminals, competitors or other actors. On top of this, keeping up with technical advances to manage and make sense of our data may come down to being able to fund the bill for the latest data analytics software as a service (SaaS).

Compounding all this, the usual costly threats continue to exist. There is the bored teenager in their bedroom, looking for the quick thrill of extracting confidential information from your business database and sharing it on social media to show off their tech savvy to their mates. There is the revenge-seeking former or current employee. There is also the opportunist criminal gang seeking to exploit any employee oversight or a slow update to a vulnerable web payment system.

The media targets for data breach

There is a great deal of talk in the media about what could have been done better to protect data, in light of the latest high-profile companies to fall victim to data breach. One example is Ashley Madison, where personal data was stolen from a dating company by a criminal gang and used for blackmail. The company was considered to have created the risk for itself by not deleting obsolete personal and sensitive data, including payment card details, from its website.
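The Ashley Madison lesson is that data you no longer need is still data you can lose. That lesson can be turned into a routine retention job. The sketch below assumes a hypothetical `customers` table with a `last_active` date column and an illustrative two-year retention period. The right period and the right tables are for your own legal advice and data audit to decide.

```python
import sqlite3
from datetime import date, timedelta

# Illustrative assumption: keep inactive customer records for two years only.
RETENTION_DAYS = 365 * 2

def purge_obsolete_customers(conn: sqlite3.Connection, today: date) -> int:
    """Delete rows from the hypothetical `customers` table whose ISO-8601
    `last_active` date falls before the retention cutoff. Returns rows deleted."""
    cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
    with conn:  # commits on success, rolls back if the DELETE fails
        cur = conn.execute("DELETE FROM customers WHERE last_active < ?", (cutoff,))
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, last_active TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, last_active) VALUES (?, ?)",
        [("Ada", "2015-12-01"), ("Bob", "2012-03-15"), ("Cy", "2013-11-30")],
    )
    # Cutoff is 2014-01-31, so Bob and Cy are removed.
    print(purge_obsolete_customers(conn, today=date(2016, 1, 31)))  # 2
```

ISO-8601 date strings sort lexicographically in date order, which is why a plain string comparison works for the cutoff.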

The nature of TalkTalk's data breach differed from that of Ashley Madison. Here, an SQL database password was not adequately encrypted, giving attackers access to the personal and sensitive data of the company's customers. The breach was discovered when customers started to receive spam because their data had been posted online.
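Whatever the precise route in, the general practice for stored passwords is to hash them with a slow, salted, one-way function. They should not be kept in plain text or reversibly encrypted, so that a leaked table does not directly reveal them. The minimal sketch below uses Python's standard `hashlib.pbkdf2_hmac`. The iteration count is an illustrative assumption, and this is not a description of TalkTalk's actual systems.

```python
import hashlib
import hmac
import os

# Illustrative work factor; tune to your hardware and current guidance.
ITERATIONS = 200_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random per-user salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

if __name__ == "__main__":
    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))  # True
    print(verify_password("guess123", salt, digest))  # False
```

Only the salt and digest are stored. Because each user gets a fresh random salt, two identical passwords produce different digests.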

As of 6 November 2015, The Register[2] reported that TalkTalk's data breach had affected 157,000 of its customers. The breach was costly for TalkTalk, not only because the company had to compensate customers but also because its share price took a significant nosedive.

2016 has continued to see high-profile cyber attacks leading to breaches of customer information. On 21 January 2016, Asda was reported to have serious vulnerabilities on its website. These allowed a hacker to log into its database without a password and to extract customer payment data, through what has been described as a cross-site scripting (XSS) and cross-site request forgery (CSRF/XSRF) based attack. The Register reported that this approach to illegally obtaining customer data is more complicated to achieve than the TalkTalk attack, but that it has nonetheless demonstrated the same effect.[3]
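The two attack types named above have well-known baseline defences. Against XSS, untrusted input is escaped before it is rendered as HTML. Against CSRF, every state-changing request must carry an unpredictable per-session token. The sketch below shows both with the Python standard library. The function names are hypothetical and this is not a reconstruction of Asda's site; a mature web framework would normally provide these protections for you.

```python
import hmac
import html
import secrets

def render_comment(user_text: str) -> str:
    """Escape user-supplied text before placing it in HTML, so any markup is shown, not executed."""
    return "<p>" + html.escape(user_text, quote=True) + "</p>"

def new_csrf_token() -> str:
    """Generate an unpredictable per-session token to embed in each form."""
    return secrets.token_urlsafe(32)

def csrf_token_valid(session_token: str, submitted_token: str) -> bool:
    """Accept a state-changing request only if the submitted token matches the session's.
    Comparing bytes with compare_digest avoids timing leaks and non-ASCII errors."""
    return hmac.compare_digest(session_token.encode("utf-8"), submitted_token.encode("utf-8"))

if __name__ == "__main__":
    print(render_comment("<script>alert(1)</script>"))
    # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
    token = new_csrf_token()
    print(csrf_token_valid(token, token), csrf_token_valid(token, "forged"))  # True False
```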

To compound these nightmarish real-world scenarios, business owners are left to manage the asymmetric encryption keys to their databases. They must do so alongside increasingly complex passwords, European legal provisos and authentication procedures which, if they are honest, are frustratingly opaque and increasingly require technical skills not available in-house. Knowing where to begin making sense of this noisy cyber junkyard, and how to find workable rather than otherworldly solutions, can initially seem like a gravity-defying task.

Now, one month into 2016, the data breach theme shows no sign of losing its appeal to the news-reading public. This is especially so if staggering figures are to be believed, such as the 382,000 employees said to be affected in the Austrian case involving a supplier to Airbus and Boeing. In that case, the financial accounting wing of the business admitted to 50 million euros stolen as a cash outflow resulting from "cyber fraud".[4] The fraud is believed to have been committed by an employee who tampered with in-house payment systems to extract the sums involved, but it remains subject to further forensic investigation.

Defining data and data sets as distinct from databases

One starting point for making sense of the data dog's breakfast found in cyberspace is to understand what exactly is meant by "data" and "data set". It also helps to understand why these categories have value to begin with, and why businesses should protect them through technical and legal controls.

What is meant by data?

Data can be understood as a range of different things: knowledge, a record of observations of the stars or of business stocktakes, meeting minutes or profit and loss reports. Information, code, statistics and facts are all forms in which data might be apprehended, encrypted or legally protected.

Definition of data sets

Data sets, by contrast, can be apprehended in two technical ways. Firstly, data sets can be structured, for example where they take the form of tables or another arranged format. SQL (Structured Query Language), the standard language for querying relational databases, is the most famous name associated with structured data sets held in databases.
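To make the idea of a structured data set concrete, here is a hypothetical stock-take table in an in-memory SQLite database, using Python's built-in `sqlite3` module. Because every row shares the same named, typed columns, a question such as "what is out of stock?" becomes a one-line SQL query.

```python
import sqlite3

# Hypothetical stock-take table: every row has the same named, typed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (sku TEXT PRIMARY KEY, item TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [("A100", "Telescope", 4), ("B200", "Star chart", 25), ("C300", "Tripod", 0)],
)

# Because the structure is known in advance, questions can be asked declaratively.
out_of_stock = [
    row[0]
    for row in conn.execute("SELECT item FROM stock WHERE quantity = ? ORDER BY item", (0,))
]
total_units = conn.execute("SELECT SUM(quantity) FROM stock").fetchone()[0]
print(out_of_stock, total_units)  # ['Tripod'] 29
```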

Secondly, in contrast to structured data sets, there are unstructured data sets. A well-known home for such data is the NoSQL database, a family of non-relational databases that do not require data to fit a fixed table schema. Here is data which is not organised into tables, models or statistics in a neat and arranged manner. Unstructured data is often described as dirty, and as requiring a hip new and worryingly expensive breed of data scientists or "data wranglers" to manage. Because it has not been put into tables, models or charts, further technical work must be carried out on it to represent its meaning more clearly or neatly, or in ways that its sheer scale would otherwise prevent. Such data is, however, ripe for data scientists to extract, cut up, parse and manipulate. They do so through bespoke tools and through approved activities such as data or text mining, repackaging the data into a more digestible format.
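By contrast, unstructured data has to be parsed before it can be queried. The sketch below turns hypothetical free-text notes into structured records with regular expressions. The note format and the fields extracted are assumptions for illustration, and real text mining tools are considerably more robust.

```python
import re

# Hypothetical free-text notes, as they might sit in a document store.
NOTES = [
    "Called Jane Smith on 2016-01-12 about invoice 4471, owes GBP 250.",
    "Meeting w/ Raj Patel 2016-01-20. Invoice 4502 settled.",
    "No date recorded; chased invoice 4390.",
]

DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
INVOICE_RE = re.compile(r"\binvoice\s+(\d+)", re.IGNORECASE)

def to_record(note: str) -> dict:
    """Pull a date and an invoice number out of one note (None where absent)."""
    date_match = DATE_RE.search(note)
    invoice_match = INVOICE_RE.search(note)
    return {
        "date": date_match.group(1) if date_match else None,
        "invoice": int(invoice_match.group(1)) if invoice_match else None,
        "text": note,  # keep the original text alongside the extracted fields
    }

records = [to_record(n) for n in NOTES]
for r in records:
    print(r["date"], r["invoice"])
```

Once the fields are extracted, the records could be loaded into a structured table and queried like any other.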

Unstructured data is now often subjected to bespoke new data and text mining tools. Some of these data set extraction services are big-data focused. They are offered by companies such as Cloudera, Hortonworks and Tableau, with Cloudera and Hortonworks building on the open-source Apache Hadoop framework and its MapReduce programming model.
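MapReduce splits a job into three steps. A map step emits key/value pairs, a shuffle groups those values by key, and a reduce step combines each group. The single-process Python sketch below counts words to illustrate the model only. It does not use Hadoop's API; in Hadoop these phases run as tasks distributed across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str) -> list[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["data debris data", "Space debris"]))
# {'data': 2, 'debris': 2, 'space': 1}
```

The map and reduce functions are independent of one another, so a framework can run many copies of each in parallel over different chunks of data. That is what makes the model suit very large data sets.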

These big data companies offer varying solutions that can improve a business's ability to extract, parse or analyse large data sets at speed. They can also help it represent that data in new tabular or other spectacular visualisation formats to gain insight.

Extracting unstructured data from databases is proving lucrative for businesses. An individual or business might, for example, want to find correlations in its own data that could lead to new cost efficiencies relating to its customers or employees. One example of unstructured data use has been to create a visualisation of an archive of email conversations in order to represent pertinent relationships.
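A first step towards such an email visualisation can be sketched as follows. Counting who writes to whom across an archive produces weighted sender-to-recipient edges that a charting tool could draw as a network. The archive below is hypothetical, and the sketch uses only Python's standard `email` package.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical archive of raw messages (minimal headers, for brevity).
ARCHIVE = [
    "From: ann@example.com\nTo: bob@example.com\nSubject: Q1 stock\n\nbody",
    "From: ann@example.com\nTo: bob@example.com, cat@example.com\nSubject: Re: Q1\n\nbody",
    "From: bob@example.com\nTo: ann@example.com\nSubject: Re: Re: Q1\n\nbody",
]

def edge_counts(raw_messages) -> Counter:
    """Count (sender, recipient) pairs; each count can weight an edge in a network chart."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = getaddresses([msg.get("From", "")])[0][1]
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        for _, addr in recipients:
            edges[(sender, addr)] += 1
    return edges

for (src, dst), n in edge_counts(ARCHIVE).most_common():
    print(f"{src} -> {dst}: {n}")
```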

What all this has to do with data breach is that, now more than ever, data is a commodity that can be easily and quickly manipulated, extracted and reused for financial gain.

Realising the value in your data

One way to start to apprehend your data, therefore, is to carry out an audit of it. What data do you have that is classifiable as "personal" or "sensitive"? Who are your data controllers, and when did they start and finish their terms of office? What data do you have that is classifiable as intellectual property, such as copyright? An audit of the types of data your business holds will allow you to take stock of its value. You can then weigh the cost of losing that data against the cost of investing in measures to protect it. The Single Loss Expectancy (SLE) model is helpful here: it estimates the monetary loss expected from a single occurrence of an event, which can then be weighed against the cost of investing to protect against that event.
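As a worked illustration, SLE is conventionally calculated as asset value (AV) multiplied by exposure factor (EF), the fraction of that value lost in one incident. Multiplying SLE by the annualised rate of occurrence (ARO) gives the annualised loss expectancy (ALE), which can be compared with the yearly cost of a control. All figures below are hypothetical.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = AV x EF, where EF is the fraction of the asset's value lost in one incident."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    return asset_value * exposure_factor

def annualised_loss_expectancy(sle: float, annual_rate: float) -> float:
    """ALE = SLE x ARO, where ARO is the expected number of incidents per year."""
    return sle * annual_rate

def control_is_worthwhile(ale_before: float, ale_after: float, annual_control_cost: float) -> bool:
    """A control pays for itself if it cuts expected annual loss by more than it costs per year."""
    return (ale_before - ale_after) > annual_control_cost

# Hypothetical figures for a customer database.
sle = single_loss_expectancy(asset_value=200_000, exposure_factor=0.4)  # 80,000
ale_before = annualised_loss_expectancy(sle, annual_rate=0.25)          # 20,000
ale_after = annualised_loss_expectancy(sle, annual_rate=0.05)           # 4,000 with the control
print(control_is_worthwhile(ale_before, ale_after, annual_control_cost=12_000))  # True
```

Here, a control costing 12,000 a year that cuts the expected annual loss by 16,000 would be worth the investment on these assumed numbers.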

Business data, like space debris, is everywhere and seemingly out of control globally. Taking stock of what data belongs to your business, and which data sets bring it value, returns control to your organisation. A data audit is time prudently spent and need not be as difficult to tackle as putting a man on the moon.
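A data audit need not start with special software. Even a simple inventory that records where each data set lives, who is accountable for it and how it is classified shows where protection matters most. The categories below follow the article ("personal", "sensitive", intellectual property). The field names and valuations are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    PERSONAL = "personal"
    SENSITIVE = "sensitive"
    INTELLECTUAL_PROPERTY = "intellectual property"
    OTHER = "other"

@dataclass
class DataAsset:
    name: str
    location: str
    category: Category
    owner: str              # person or role accountable for the asset
    estimated_value: float  # hypothetical valuation, e.g. an input to an SLE estimate

def summarise(inventory: list[DataAsset]) -> dict[Category, float]:
    """Total estimated value held in each category, including empty ones."""
    totals = {c: 0.0 for c in Category}
    for asset in inventory:
        totals[asset.category] += asset.estimated_value
    return totals

inventory = [
    DataAsset("Customer contacts", "CRM", Category.PERSONAL, "Sales lead", 50_000),
    DataAsset("Payroll records", "HR server", Category.SENSITIVE, "HR manager", 30_000),
    DataAsset("Product designs", "File share", Category.INTELLECTUAL_PROPERTY, "CTO", 120_000),
]

for category, total in summarise(inventory).items():
    print(f"{category.value}: {total:,.0f}")
```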

In future, the UK is likely to see more data breaches as data becomes widely recognised as a currency and a real asset to exchange and trade, both nationally and across international jurisdictions.

Realising the value of the data you have now through an audit, implementing ISO/IEC 27001 controls and understanding how that data can be manipulated and best positioned is critical to future business success. It is also a cost-effective strategy ahead of the amended European laws and UK policies[5] being ushered in to protect data, privacy and intellectual property. Forward-thinking businesses that grasp the future direction of data security are already putting their thinking into practice through practical tools to minimise potential future risks and legal costs.



[1] Ivan Pavlov (born 1849) was a famous Russian physiologist who studied digestion in dogs by examining their external digestive pouches, and who noticed that they salivated before, not after, food arrived. Pavlov is best known for his research on the conditioned reflex. The term describes how, when a bell was rung or a tuning fork struck, dogs could be made to salivate through repeated conditioning that associated the noise with imminent feeding.




[5] Ian Hargreaves, Digital Opportunity: A Review of Intellectual Property and Growth (2011).
