The Value of a Data Center Risk Assessment for Your Next Project
For data centers, risk doesn’t just mean unplanned downtime—risk can equal lost revenue, damaged equipment, upset customers and even lives in danger.
Before the start of a project, a data center risk assessment and analysis can show how much risk you actually face. It identifies potential data center risks, how “risky” they truly are (how much they will impact business performance if they play out) and ways to reduce the likelihood that they occur.
Eliminating every potential cause of downtime is impossible: human error, cybersecurity threats, natural disasters, server failure and the list goes on. It’s much more practical to conduct a data center risk assessment and then take steps to decrease the most prevalent risks that could lead to downtime.
In new data center projects, these threats to uptime can be reduced through choices made during the data center design and planning process. It starts with a determination of the data center’s correct availability class. Understanding that will guide the minimum performance and availability requirements for your project.
A data center’s “availability” refers to its ability to perform its intended function. System availability can be expressed mathematically as:
Uptime / (uptime + scheduled downtime + unscheduled downtime) = availability
In this equation:
- Uptime and downtime are measured in units of time within a specific period
- Scheduled downtime includes things like preventive maintenance, equipment setup, upgrades, testing and optimization
- Unscheduled downtime includes things like repair due to failure, maintenance delays and facility-related failures or outages
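As a rough illustration, the equation above can be worked through in a few lines of Python. This is a minimal sketch, not a standard tool: the function name and the sample hours are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def availability(uptime_hours, scheduled_downtime_hours, unscheduled_downtime_hours):
    """Return availability as a fraction between 0 and 1.

    All three inputs must use the same unit of time and cover the same period.
    """
    total = uptime_hours + scheduled_downtime_hours + unscheduled_downtime_hours
    if total <= 0:
        raise ValueError("the measurement period must be longer than zero")
    return uptime_hours / total


# Hypothetical figures for one 365-day year (8,760 hours).
scheduled = 8.0      # e.g. preventive maintenance and upgrades
unscheduled = 0.76   # e.g. repair after a failure
uptime = 8760.0 - scheduled - unscheduled

result = availability(uptime, scheduled, unscheduled)
print(f"Availability: {result:.4%}")
```

Because scheduled downtime sits in the denominator, planned maintenance windows lower availability just as unplanned outages do.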
ANSI/TIA-942 classifies data centers into four levels using telecommunications, electrical, architectural and mechanical ratings based on how the data center should be designed and constructed:
- Rated 1: Single-capacity components and a single, non-redundant distribution path
- Rated 2: Redundant capacity components and a single, non-redundant distribution path
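- Rated 3: Redundant capacity components and multiple independent distribution paths, so the site is concurrently maintainable
- Rated 4: Redundant capacity components and multiple independent, simultaneously active distribution paths, so the site is fault tolerant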
The Uptime Institute classifies data centers using a four-tiered approach that indicates the level of necessary resiliency:
- Tier 1: Basic capacity level requiring only an uninterruptible power supply (UPS) for outages, an area for IT systems, dedicated cooling and an engine generator
- Tier 2: Redundant capacity components for power and cooling
- Tier 3: Concurrently maintainable with redundant components
- Tier 4: Independent and physically isolated systems that act as redundant capacity components and distribution paths
To determine the proper availability class for your data center project, there are three questions to answer.
1. What are the data center’s operational requirements?
Consider the time available to conduct planned maintenance shutdowns. This includes the time available to shut systems down and work through maintenance issues and concerns. If the data center must function 24/7 and can’t withstand any hours of planned maintenance shutdown, then it’s likely a Tier 4 data center that needs built-in redundancy for every component.
2. What is the data center’s operational availability?
Determine the data center’s operational availability requirements (the total amount of time the data center must be capable of offering support without disruption).
This is where the phrase “nines of availability” comes into play. Availability is normally expressed in 9s. For example, “five-nines uptime” translates to 99.999% uptime—or an average of less than six minutes of downtime per year.
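To see what each additional nine buys you, the conversion from nines to a yearly downtime budget can be sketched as below. The function name is hypothetical, and the calculation assumes a 365-day year, so treat the output as an approximation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(nines):
    """Return the yearly downtime budget, in minutes, for a number of nines.

    Two nines means 99% availability, three nines means 99.9%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be 1 or greater")
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability_pct = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({availability_pct:.3f}%): "
          f"{allowed_downtime_minutes(n):,.1f} minutes of downtime per year")
```

Each added nine cuts the downtime budget by a factor of ten, which is why the jump from four nines to five is a design decision and not a rounding detail.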
3. What is the impact of data center downtime?
The third and final question is how data center downtime affects the organization. How much will it impact business? What are the consequences?
Not all downtime is equal. For example, 15 minutes of downtime for an insurance or media company may not be nearly as impactful as 15 minutes of downtime for a hospital or manufacturing plant.
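One simple way to compare impact is to multiply the length of an outage by an estimated cost per minute. The sketch below does exactly that. The organization labels and the cost figures are hypothetical placeholders, not industry data, so substitute estimates from your own business.

```python
def downtime_cost(outage_minutes, cost_per_minute):
    """Return the estimated financial impact of a single outage."""
    if outage_minutes < 0 or cost_per_minute < 0:
        raise ValueError("inputs must not be negative")
    return outage_minutes * cost_per_minute


outage_minutes = 15

# Hypothetical cost-per-minute estimates for two different organizations.
cost_per_minute = {
    "organization A": 500.0,
    "organization B": 5000.0,
}

impact = {
    name: downtime_cost(outage_minutes, rate)
    for name, rate in cost_per_minute.items()
}

for name, cost in impact.items():
    print(f"{name}: {outage_minutes} minutes of downtime costs about ${cost:,.0f}")
```

A financial estimate like this captures only part of the picture. Consequences such as patient safety or damaged equipment need to be weighed separately.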
Don’t Forget: Data Center Types & the Cloud
Once these questions are answered, then it’s time to consider two more factors for your data center risk assessment: the type of data center and the presence of a cloud environment. Both considerations play a role in determining risk level or tolerance.
For example, availability in a multi-tenant or colocation data center that provides services to financial institutions, healthcare, IT, manufacturing, government and retail is crucial. Customers pay for a certain level of redundancy and reassurance that systems will always be available. Unplanned downtime could take all customers’ businesses down.
In addition to the type of data center you’re working with, there’s also the cloud to consider when conducting a data center risk assessment.
The public cloud offers computing services over the internet through a third-party provider, which is also responsible for managing and maintaining it. A private cloud is an on-premises data center infrastructure that contains server, storage, memory and networking capacity. The hybrid cloud combines private and public cloud. Each has its own requirements and tolerances for downtime.
Planning for Potential Incidents
The final step in a new data center risk assessment is establishing a plan for business continuity or disaster recovery to make sure information can be recovered quickly in the event of a disaster.
Because each IT environment is unique, there isn’t a one-size-fits-all process that works for all data centers, but there are three things each plan should include:
- Preventive measures that attempt to avoid a disaster by recognizing and reducing risk. These measures can include software backup plans, uninterruptible power supplies, generators and routine maintenance and inspection.
- Detective measures that discover unwanted events so they can be eliminated. These can include fire and security systems, antivirus software, backup software and employee training.
- Corrective or reactive measures that minimize the amount of downtime or loss. These can include:
  - Disaster recovery as a service (DRaaS), a geographically dispersed mirroring solution that lets you recover data, including up to a certain point in time, and keep operating if the main data center fails or goes down.
  - A second data center, either run passively until needed or run actively alongside the primary data center to pick up processing and storage in the case of a disaster or catastrophe.
If you need help completing a data center risk assessment for your next project, our in-house experts are here to help.
To learn more about data center risk assessment and many other data center topics, explore our new Introducing Data Center Essentials Level 1 Training Course, available on demand as part of the Belden Academy. Earn three CECs, explore the characteristics that make data center projects unique and uncover tips for handling data center projects correctly from start to finish.