Data centre design in a multi cloud world

IT Transformation projects are usually driven by the need to reduce complexity, improve agility, simplify systems, contain costs, manage ever-growing data and provide more efficient operational management.

Arguably, for seasoned IT professionals, there is nothing new about the drivers for transformational change; it’s the velocity and scale of transformation today that’s the big challenge.

IT functions have always been constrained by the fact that the majority of their costs are incurred in the day-to-day operation of the business, which limits the funding available for more innovative activities.

The innovation issue has been further exacerbated by the fact that, traditionally, when the business has requested new services or solutions, the IT department has invariably begun by asking how to create them within the existing architecture.  IT has often seen itself, and in many cases still does, as the governance component of solution design and delivery.

How many times have we heard the phrase “we work with the business to help them define their technical requirements to business problems”?  In most cases, this is seen as a statement of positive intent, with IT creating a world where they arbitrate business requirement against infrastructure constraint and manage any compromise required between the two.  This management of compromise is achieved with greater or lesser degrees of success and is often the factor that defines whether IT is seen as an enabler or a blocker to the business.

The fundamental problem though is that, however efficiently IT arbitrates these often competing requirements, it becomes a constraint.  There is a long overdue need for IT to make a fundamental shift from infrastructure-based decisions to business value decisions and for IT leaders to build their strategy on the application portfolio rather than where it is located.

Some organisations have seen the cloud as a solution to this problem and have adopted an approach where new initiatives are considered in terms of the available platforms and so infrastructure plans are made on business need at the application or workload level and not just on the physical infrastructure.

But this is still constraining as, although the various cloud offerings provide more flexible infrastructure options, few IT organisations are in favour of too many cloud platforms as they see this as an ungoverned free for all.  Thus, although the cloud creates more choices, the need for IT to feel capable of governing the estate invariably leads to a palette of limited choice.

There are other significant factors at play here as well.  The world of technology is changing and it is bursting out from the traditional ‘everything accessed from the centre’ approach of Data Centres to a highly distributed range of devices and systems.

The Internet of Things, Edge Computing and the need to deliver exceptional customer experience with outwardly facing applications are all creating waves of disruption through the traditional closely managed, governed and centralised Data Centre that has been the heartbeat of many organisations for many years.

There are distinct and rational decisions being made by organisations to re-consider where they locate their technology.  These decisions may be based on technology constraints such as network latency, the need to place some core technology close to clusters of customers or even compliance-driven requirements to maintain data and systems in certain geographies to comply with legal constraints such as the GDPR.

As organisations, we are also suffering to a degree from the legacy of the first wave of as a Service adoption.  There was, and still is, an established principle in many organisations to try and release technical debt through the migration of legacy environments to a variety of as a Service solutions.  Many of these migrations have seen a shift to Software as a Service but this has not always created the expected efficiencies.

It has shown a willingness of IT functions to relinquish control of the infrastructure underlying applications and to accept that the business will consume a service managed and delivered by a third party.  It is a positive first step towards seeing the world as driven by business need and unconstrained by infrastructure but the downside has been that organisations have ended up with information silos with data trapped within the third party solutions.

Having a homogeneous infrastructure historically allowed organisations to move data almost seamlessly around their estate and to create multi-dimensional views of, for example, their customers.

So we have arrived at a place where the pressure is building to enable a more flexible approach to deploying solutions across an organisation but where we also have a level of legacy sprawl as a consequence of some early adoption decisions that were made in good faith at the time.  Add into the mix the increasing choice of platforms available to us and we are arriving at a point where some longer-term strategic decisions need to be made with regard to how we design our Data Centres going forward.

In essence, we are now having to make decisions that will create our new Digital Infrastructure.  This new approach means taking a more holistic view that allows organisations to step away from the traditional notion of a Data Centre but we also need to ensure that we have some other elements clearly set in our minds as we start to look differently at our infrastructure landscape.

Organisations that have not already done so need to create a full and structured overview of which workloads belong where and this needs to encompass all the workloads.  ‘Cloud first’ or ‘all in cloud’ are not useful statements, despite often being touted as ‘a strategy’.  What is needed is proper granularity and clear and unambiguous logic applied to all of the workloads.  It is possible to start with the ideal world and then rationalise this.

This approach may be to apply a default rule that says ‘everything will go here unless ….’.  How the roadmap is achieved is less important than having the complete map clear because this is the foundation for the next two critical stages.  These two stages ideally need to be executed before any workloads are moved as they form the basis of the new Digital Infrastructure design.
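The 'everything will go here unless ….' rule can be sketched in a few lines of Python.  This is only an illustration: the venue names, the workload attributes and the order in which the exceptions are checked are all assumptions made for the example, not a recommended policy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical default venue; every workload lands here unless an exception applies.
DEFAULT_VENUE = "public-cloud"


@dataclass
class Workload:
    name: str
    data_residency: Optional[str] = None  # e.g. "EU" if data must stay in a geography
    latency_sensitive: bool = False       # must sit close to users or devices
    holds_core_ip: bool = False           # the "crown jewels"


def place(workload: Workload) -> Tuple[str, str]:
    """Return (venue, reason): the default venue unless an exception applies."""
    if workload.holds_core_ip:
        return "on-premise", "holds core intellectual property"
    if workload.data_residency is not None:
        venue = "colocation-" + workload.data_residency.lower()
        return venue, "data residency requirement"
    if workload.latency_sensitive:
        return "edge", "latency sensitive"
    return DEFAULT_VENUE, "default rule"


def placement_map(workloads: Iterable[Workload]) -> Dict[str, Tuple[str, str]]:
    """Build the complete map: every workload gets a venue and a stated reason."""
    return {w.name: place(w) for w in workloads}
```

Recording the reason alongside the venue keeps the logic for each workload unambiguous, which is the point of having the complete map before anything is moved.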

Organisations will need an eco-system of partners.  Having spent the previous few decades building, managing and maintaining Data Centres operated in house, organisations are likely to require assistance and this is likely to be sourced from a variety of partners.

The ecosystem is likely to include public cloud providers, network providers, co-location and interconnectivity providers but it should also include partners who can assist with ensuring organisations know exactly where all their workloads and data now reside.

For some, creating this eco-system will be a whole new venture and for some, it will be examining the existing eco-system and testing to ensure it is fit for purpose in the world of the Digital Infrastructure.  There is a great danger in simply continuing with the existing eco-system on the assumption that relationships that have served well in the past will continue to do so.

There is also the need to find a way to manage the new environment, which could be spread across multiple locations and geographies on many platforms.  Effective monitoring has been on the IT Operations agenda for many years and has been achieved with varying levels of success.  The scope of monitoring now needs to expand dramatically.  Investment in infrastructure monitoring, the one area in which many have succeeded, now covers only a small part of what is needed.

Much of the underlying core infrastructure that will be used in a Digital Infrastructure is no longer the responsibility of the internal IT function but is managed by a third party under a Service Level Agreement.  Internal IT need to be aware of issues but, more importantly, they need to be aware of the impact of issues.

This need to understand the impact means organisations need business relevant monitoring.  They need to monitor the performance of functions and services and to clearly understand the impact of suboptimal performance of these systems to users and end customers.  In short, they now need performance-based monitoring and they need it to operate effectively across multiple infrastructure venues.
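One way to picture business relevant monitoring is as a lookup from degraded components to the business services that depend on them.  The sketch below assumes a hand-maintained dependency map; the service and component names are hypothetical, and a real estate would source this from a service catalogue or CMDB rather than a dictionary.

```python
# Hypothetical service map: which components each business service depends on.
SERVICE_DEPENDENCIES = {
    "online checkout": {"payments-saas", "web-frontend", "orders-db"},
    "customer support portal": {"crm-saas", "web-frontend"},
    "monthly reporting": {"data-warehouse"},
}


def impacted_services(degraded_components, dependencies=SERVICE_DEPENDENCIES):
    """Return the business services affected by degraded components.

    The result maps each affected service to the sorted list of degraded
    components it depends on, wherever those components happen to be hosted.
    """
    degraded = set(degraded_components)
    impact = {}
    for service, components in dependencies.items():
        hit = components & degraded
        if hit:
            impact[service] = sorted(hit)
    return impact
```

With this map, a single alert on `web-frontend` is reported as an impact on two business services rather than as an isolated infrastructure event.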

Although this type of monitoring is possibly one of the most complex projects IT operations teams will undertake, let’s assume for a moment that it is complete.  IT teams now need to ask what they do when there is an issue.

They may face a situation where identifying the owner of the root cause is not straightforward and with increasing criticality of the systems, there is little time for debate and discussion between different stakeholders.  What is required is deep insight and it is needed quickly and it must be underpinned with effective and efficient co-ordinated responses.

These responses require highly skilled people to respond.  There is much being written at present on the skills shortage that is currently being suffered within IT departments.  One area that is less prevalent in these discussions is infrastructure as there seems to be an underlying assumption that infrastructure is infrastructure wherever it is located.

This is a misconception because the Digital Infrastructure needs as much feeding and watering as a traditional Data Centre but it needs a different set of skills built upon the strong traditional understanding that exists within the current IT Operations team.

What is needed is an IT Operations model and skills set that transcends the multiple boundaries that exist in a hybrid model.  Hybrid operating models are often regarded as a temporary state as organisations seek to shift to a new operating paradigm but it is likely that hybrid or a Distributed Digital Infrastructure is going to be a normal operating state for the next few years at least.

This model means that organisations will place their workloads on the most appropriate platform for the business, which is most likely to be across multiple providers, multiple platform types and multiple locations.

Data Centre Design in a Multi Cloud World is not about the design of the Data Centre.  The location of the workloads will be determined by business-centric requirements.  It is likely that most organisations will retain some element of their own environment within their own management, either in an on-premise Data Centre or in a co-location facility, and equally likely they will have both SaaS and IaaS solutions with third-party providers.

Data Centre design is, therefore, all about the management of the new Digital Infrastructure.  Over time many organisations have become great at managing the various elements of their infrastructure.  They have great tooling and processes for compute, networks, storage and virtualisation but therein lies the problem.  Few organisations have tooling and processes that span these elements.  Management is undertaken in silos but the new Data Centre does not function in silos.

The new Data Centre also evolves and develops at a faster rate than previously and in a way that is not always within the direct control of the IT operational team.  Patching, application upgrades, functional changes and code releases can often be made by the vendor with minimal or no interaction with the customer and organisations need to be able to deal with this.

Internal development teams will release more code at a faster rate than previously and will be introducing constant change into the environment, while the business is likely to be acquiring its own solutions, at least some of which IT Operations will only find out about when they break.

Governance is often used as the big stick to address the inevitable issues that arise from these multiple sources of change but we must be careful about how we use governance.  As Digital Infrastructure emerges, so internal IT functions must develop new approaches to governance, management, monitoring and response to meet the emerging needs.

Governance is a wide-ranging topic but many of the areas will need to change to reflect the Digital Infrastructure.  One of the key changes is that governance becomes a shared activity, with responsibility now split between the IT function and the Cloud Provider, and for that to work there needs to be a very clear demarcation of responsibility and also very clearly defined communication mechanisms between the various parties.  We have all been in a position where, having experienced a failure, no party will effectively own the resolution of the issue, and so the failure is compounded by the failure of the incident response.

To mitigate the potential for these issues to arise, it is worth exploring a number of key areas in detail to ensure that the risks of hybrid operations are identified and appropriately addressed.  As a minimum, every organisation needs to look at the risks over and above the interaction between the internal IT function and the service provider or, more likely, the providers.

Here is a list of risks to consider as you move into a hybrid model:

Audit and compliance risks: With these new models we need to consider data jurisdiction, data access control, and maintaining an audit trail.  We need the same level of verifiability across the whole digital estate and that can become a highly complex situation to manage.  We also need to manage this process with complete transparency between ourselves and our eco-system of partners.

Security risks: Security risks have always been front of mind when cloud is discussed and the majority of cloud solutions these days can be viewed as being secure in the traditional sense of preventing unauthorised access to systems but we need to ensure we also carefully manage data integrity and data confidentiality and privacy across the platforms.  For example, receiving a request for deletion under GDPR needs to be something that is relatively straightforward to enact across multiple systems, platforms and providers but must also be comprehensive and effective.

Other information risks: Compliance and security risks are reasonably well defined and bounded, with a number of regulations providing a useful framework, but we must also be cognisant of other information risks.  Are we protecting our intellectual property effectively?  Do we know where it is stored and to what jurisdictional rules it may be subject?  Many organisations adopt the simple approach of keeping the crown jewels in their own Data Centre.  There is nothing wrong with this cautious approach, and it shouldn’t preclude using the digital infrastructure if that generates business benefit, but we do need to have strong, well-managed processes in place to address this.

Performance and availability risks: When we have total control of the infrastructure in a traditional Data Centre, we have a single view of the systems we manage.  That enables us to rapidly respond to incidents and to maintain a view of the performance and the availability of systems.  With our Digital Infrastructure, we need to carefully examine the level of availability and performance our business requires to successfully operate.  This includes, for example, alerts, notifications and provider business continuity plans.  We need to define these carefully as there are significant commercial implications in many cases, whereas in our traditional world it is simpler and often less cost sensitive to make everything bulletproof.

This may sound obvious, but many Digital Infrastructures are created without due consideration to the true and relative sensitivity to the business of levels of availability.  We need to create designs and architectures that align with the actual sensitivity of the systems but that also means mitigating infrastructure failures of third party providers with careful design.

There is little point in signing a contract against a solution that promises 100% uptime and pays service credits if the system in question is your business critical system.  No Board is going to accept service credits in lieu of half a day of lost trading but, likewise, questions will be asked if the costs of a system that is not business critical suddenly increase because the ‘make it all bulletproof’ approach has been adopted.
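The gap between a service credit and the cost of lost trading is easy to show with simple arithmetic.  The function below is a sketch only: the idea of a credit expressed as a percentage of the monthly fee is an assumption about how a contract might be structured, and the figures in the usage note are invented for illustration.

```python
def outage_exposure(monthly_fee, credit_percent, hourly_revenue, outage_hours):
    """Compare the service credit for an outage with the trading lost during it.

    credit_percent is the share of the monthly fee refunded as a credit
    (a hypothetical contract term, not taken from any real agreement).
    """
    service_credit = monthly_fee * credit_percent / 100
    lost_trading = hourly_revenue * outage_hours
    return {
        "service_credit": service_credit,
        "lost_trading": lost_trading,
        "uncovered_loss": lost_trading - service_credit,
    }
```

For a hypothetical system costing 10,000 a month with a 10% credit, a four-hour outage on a business taking 25,000 an hour yields a credit of 1,000 against 100,000 of lost trading.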

The other question that is often overlooked when assessing risks of this nature is whether your partner has a proper and efficient forensic analysis capability available for when something goes wrong.

Undertaking a full Root Cause Analysis can be tricky at the best of times, and attempting it across multiple environments exacerbates the issue, so ensuring partners have this as a core capability is essential.

Interoperability risks: A service composed of multiple underlying services carries an inherent risk, because those systems must be able to interoperate effectively.  Establishing this at the outset is important but the one area that is sometimes overlooked is ensuring it on an ongoing basis.  What happens if one element of one service changes because the vendor determines the change is necessary?  Will the systems still inter-operate effectively?

Do we have sufficient systems in place to understand a change is being made and the impact of the change and are suitable systems in place to allow us to test these changes?  It’s not an option to accept that elements of our environment can be changed without notice and without consultation but a surprising number of vendors will take this approach as it is their key to being able to drive their own operational costs down.

These risks are also exacerbated when we choose to customise systems provided by third parties as we then need custom updates and this can become expensive.  It becomes more important as we build our Digital Infrastructures to adopt a commodity approach.  We need to examine the business and the possible technology solutions and be prepared, for some standard systems, to adapt our business processes to the mainstream software rather than insisting on customising the software to our own quirks.

That may sound counter-intuitive in this world of rapid development and custom code but it is far easier to manage a standard system running against a standard process than to customise a third party solution simply because our business tells us it won’t adapt its processes.

Contract risks: It is inevitable that, as we build our Digital Infrastructures, we will end up with multiple contracts for the various elements and it is critical that we learn how to read these contracts effectively.  Procurement services often take the lead in contract negotiations but we in IT need to step up and make an effective contribution to the assessment and recommendations on contract structures, especially where there are multiple contracts that need to mesh as effectively as the systems themselves.

We need to be satisfied that what is often a patchwork of contracts all line up neatly with no gaps in the different services and, equally importantly, no overlaps that could cause confusion.  Constructing detailed RACI matrices is a highly effective approach as it is a clear visual mechanism for spotting gaps and overlaps.
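Checking a RACI matrix for gaps and overlaps can also be done mechanically.  The sketch below assumes a simplified matrix in which each party holds at most one role per activity; it treats an activity with no Accountable or no Responsible party as a gap and an activity with more than one Accountable party as an overlap.  Those definitions, and the party and activity names in the usage note, are assumptions for illustration.

```python
def raci_issues(matrix):
    """Find gaps and overlaps in a RACI matrix.

    matrix maps activity -> {party: role}, where role is "R", "A", "C" or "I".
    Returns (gaps, overlaps): gaps is a list of activities missing an
    Accountable or a Responsible party; overlaps is a list of
    (activity, accountable_parties) where more than one party is Accountable.
    """
    gaps, overlaps = [], []
    for activity, assignments in matrix.items():
        accountable = sorted(p for p, role in assignments.items() if role == "A")
        responsible = [p for p, role in assignments.items() if role == "R"]
        if not accountable or not responsible:
            gaps.append(activity)
        if len(accountable) > 1:
            overlaps.append((activity, accountable))
    return gaps, overlaps
```

Given a matrix in which "backup restore" has only Consulted and Informed parties, and "incident comms" has both internal IT and the cloud provider marked Accountable, the first is reported as a gap and the second as an overlap.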

We also need a clear understanding of the costs of exiting from contracts.  The fast pace of change that we are all facing means we may need to reconstruct our working models from time to time, and locking ourselves into rigid long term contracts is often the wrong approach.  Longer term contracts may drive cost benefits but they impact the value of the environment if they also act as a brake on our ability to execute change within our Digital Infrastructure when needed.

This ability to exit should also be viewed from a technical perspective.  Is the data in a form that is transportable and will we be provided with that data in a usable form?  It is also worth ensuring that data ownership is very clear, as some contracts are less than specific on such matters.

Finally, we have Billing risks: It may sound obvious but we need to make sure we understand how we are billed and be able to quickly and easily verify each bill we get and, perhaps more importantly, spot trends that allow us to shut down systems that have the potential to cause abnormally high billing before we incur the costs.
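Spotting a billing trend before the cost is incurred can start with something as simple as a run-rate projection.  The sketch below assumes daily cost figures are available per service and uses a plain linear projection to month end; the service names, budgets and the 30-day month are hypothetical, and real billing data would need a more careful model.

```python
def projected_month_end(daily_costs, days_in_month):
    """Project month-end spend from the average daily cost so far."""
    if not daily_costs:
        return 0.0
    run_rate = sum(daily_costs) / len(daily_costs)
    return run_rate * days_in_month


def services_over_budget(spend_by_service, budgets, days_in_month=30):
    """Return {service: projected spend} for services projected to exceed budget.

    Services without a budget entry are never flagged in this sketch.
    """
    flagged = {}
    for service, daily_costs in spend_by_service.items():
        projection = projected_month_end(daily_costs, days_in_month)
        if projection > budgets.get(service, float("inf")):
            flagged[service] = round(projection, 2)
    return flagged
```

Run daily, a check like this gives early warning that a system is heading for an abnormally high bill, while there is still time to scale it back or shut it down.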

As we are seeing, the incremental risks that we have to deal with on top of all the risk assessments we have to make for normal infrastructure functions are significant and many of them are in areas that we haven’t traditionally had to deal with.  This is where we in IT need to consider other skills and where we can find them.  To be able to begin the search for help, it is worth considering how effective governance will work in a Digital Infrastructure.

Effective governance will be based on people, process and technology, in that order.  Effective management of a Digital Infrastructure is therefore a three-part solution.

Part 1 – we need to establish a cross-functional governance body.  It can be seen from the high-level risks posited earlier that we need to engage different aspects of the business to gain a comprehensive view of governance and to establish effective processes to enable enforcement of the governance.  This Governance Board will have full responsibility for the oversight and will be the connector to the various elements of the business.

Many of these boards are established within IT and struggle because they remain inward looking to the IT function.  Just as we now need to ensure that our Digital Infrastructure is validated as optimal by the business, we must adopt a similar approach with the Governance Board.

The CIO should own the Governance Board and there will be technical representatives but it must also include end users, procurement and/or legal, IT Service Management, Finance and other operational staff to ensure that a fully rounded view of all risks and how they are mitigated is understood at a macro level.  The Board will also develop the best practices that will be adopted throughout the Digital Infrastructure.

Some of the most effective Governance Boards have the equivalent of non-Exec Directors.  They include some of their strategic partners in the process and also include completely independent advisors who are able to perform the role of a critical friend, challenging assumptions and testing hypotheses.  The construct of this Governance Board is critical and its components are often not obvious.

Part 2 of the effective management is to establish smaller but effective groups that deal specifically with standards in the Digital Infrastructure and also deal with shared infrastructure issues.  These groups will set and enforce a minimum level of standards and work to reduce risk in inter-operability particularly.

Part 3 sounds simple but is often the most challenging to actually get right.  Technology that can provision standard environments quickly and efficiently and then monitor and manage across the Digital Platform in a meaningful and effective way is absolutely essential.  A single pane of glass is a phrase that has existed for many years and seems to be the holy grail of IT operations.

There are as many monitoring and management tools as there are platforms and solutions and picking the one or two elements that we need from this mass of options is, in itself, a huge task.  Designing, implementing, configuring and operating this nirvana solution is even more challenging.

Organisations that have separated the creation and optimisation of the monitoring and management function from day-to-day IT operations have been far more successful in implementing effective solutions.

This separation can take one of several forms: some organisations simply outsource the whole solution to a third party that has invariably invested huge amounts of time and money already in the core systems and processes and can simply deliver them as a service.

Some organisations opt to build out their own monitoring and management platforms for the Digital Infrastructure but keep it within a distinct and separate group so the effort in creating and maintaining the environment is not diluted with day to day operational issues.

Whichever approach is taken, it is clear that the monitoring and management of a Digital Platform is rarely a simple iteration of an existing solution. It is usually a radical transformation and requires planning and significant time, effort and financial investment.  In the early days of a Digital Infrastructure, this is likely to be the highest initial cost.

So, it is clear that the Data Centre of the Digital Platform is really more about the elements on the outside.  The actual servers, networks, storage and similar elements will be in multiple places and operated by different people, including some by the internal IT function in the traditional way.  That is not that difficult to accept and understand but managing it effectively means we need to think differently.

If we look back at the reasons we have for creating a Digital Infrastructure in the first place we realise that we are adopting this approach to improve the performance of our business.  The question is how do we measure whether the adoption of the Digital Infrastructure itself is a success?

We can measure business performance by comparing production, sales, revenue, share price, dividends or customer satisfaction with our goals. We can measure IT performance by comparing server, application and network uptime, or service resolution time, budget performance or project completion, with our goals. We already use some or all of these measures to rate our performance compared with that of our competitors or the expectations of our customers, partners or shareholders.

The problem we now have is that we need to measure the effect of IT performance on the performance of the business which is a complicated process and one that is further complicated by the fact that we need to apply this measurement beyond our IT function into the eco-system of partners and providers to allow us to assess the effectiveness of each of them as well.

Despite the complexity of our eco-systems we need to present a unified approach to the business and so we need to assume the role of Service Provider rather than IT Department.  Our services are constructed within our Digital Infrastructure and delivered from multiple locations by multiple vendors but we need to aggregate this within our Service Provider function and show this back to the business as a simple Service Catalogue.

Too many IT Departments complicate this by showing the business all of the components and exposing the business to the various different partners and vendors.  The end users become frustrated because they don’t know (and usually don’t care) which system has broken.  They just know they can’t do the one thing they need to.  The Digital Infrastructure needs to be displayed to the business as a unified single entity and be capable of being managed as such.

What we have to change is how we govern, monitor and manage the Digital Infrastructure and that is what differentiates the Data Centre in a Multi Cloud world.  These are not small changes we need to make to existing operational procedures.  This is a sea change we are facing if we are to get this right.  The organisations that have got it right are those that tore up the IT operations playbook and re-imagined how they work.

Now is the time to step back, take a breath and look long and hard at IT Operations.  Effective Data Centre Design in a Multi Cloud World depends on us realising that we need a new approach and not an improved approach.

