Many things can cause data center failure – some are common causes that almost everyone faces (such as human error) while other causes are very rare. But no matter what causes it, the outcome is the same: lost revenue, lost productivity and lost customers. Last year, the average cost of unplanned data center failure was $8,850 per minute.
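
To put that per-minute figure in perspective, here is a minimal Python sketch that multiplies it by the length of an outage. The function name and the sample durations are illustrative assumptions; only the $8,850 figure comes from this article, and real costs vary widely by organization.

```python
# Average cost of unplanned downtime per minute (USD), as quoted in this article.
COST_PER_MINUTE_USD = 8_850


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the cost of an outage lasting `minutes` minutes (simple linear model)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Hypothetical outage durations, chosen only for illustration.
    for minutes in (5, 30, 90):
        print(f"{minutes:>3}-minute outage: about ${outage_cost(minutes):,.0f}")
```

At the average rate, a 30-minute outage works out to roughly $265,500.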

We’ve compiled a short list of some of the most common causes of data center failure. Let us know in the comments below what you would add!

Human Error

Whether we’re talking about mistakes made during design, installation or maintenance, people are often to blame for data center failure. The Uptime Institute reports that nearly 70% of data center outages can be attributed to human error.

Many aspects of a data center invite mistakes, whether through illogical layout, poor (or missing) labeling, lack of maintenance or inadequate training. Even the simplest oversight can result in a serious downtime event that may be difficult and costly to recover from. Some of the common mistakes that result in data center failure include:

  • Activation of the emergency power-off (EPO) switch
  • Switching the temperature setting from Fahrenheit to Celsius
  • Pulling power cords out of equipment
  • Overloading a circuit
  • Not following standard policies or procedures

To minimize the opportunity for human error, there are several things you can do:

  • Make time for adequate training, engagement and documentation
  • Define ownership and specific tasks
  • Practice for downtime
  • Standardize solutions when possible
  • Keep areas neat, clean and labeled

Learn more about how to implement these practices by downloading our e-book: 9 Tips to Improve System Uptime.

Cooling Failure

Overheating can bring down a data center. When equipment gets too hot, it shuts down to protect itself, causing data center failure. Overheating can occur when:

  • Not enough cold air is being sent to the cold aisle in a cold-aisle containment system
  • There is a lack of airflow – or uneven airflow – through cabinets
  • Cooling system redundancy is lost

To make sure you don’t suffer from a cooling failure, periodically check cooling equipment and make sure that everything is operating as intended. You can also use computational fluid dynamics (CFD) modeling to test your cooling system under different failure scenarios to see what would happen – and how quickly.

If you haven’t already, consider investing in an environmental monitoring system that will alert you as soon as temperatures begin to shift in an unsafe direction. Also make sure that your cabinets support adequate and even airflow.
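
As a rough illustration of what such an alert rule might look like, the Python sketch below flags cabinet inlet readings that exceed a ceiling or climb too quickly. The `Reading` type, the 27 °C ceiling and the 0.5 °C-per-minute rise limit are assumptions made for this example; a real monitoring system will have its own sensors, thresholds and alerting interface.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    minute: int          # minutes since monitoring started
    inlet_temp_c: float  # cabinet inlet temperature in degrees Celsius


def check_readings(readings, max_temp_c=27.0, max_rise_c_per_min=0.5):
    """Return alert messages for readings that are too hot or rising too fast.

    Both limits are illustrative assumptions, not vendor recommendations.
    """
    alerts = []
    for i, curr in enumerate(readings):
        if curr.inlet_temp_c > max_temp_c:
            # Absolute ceiling exceeded: report this and skip the trend check.
            alerts.append(
                f"minute {curr.minute}: {curr.inlet_temp_c:.1f} C exceeds {max_temp_c:.1f} C"
            )
            continue
        if i == 0:
            continue  # no earlier reading to compare against
        prev = readings[i - 1]
        elapsed = curr.minute - prev.minute
        if elapsed <= 0:
            continue  # ignore out-of-order or duplicate timestamps
        rise = (curr.inlet_temp_c - prev.inlet_temp_c) / elapsed
        if rise > max_rise_c_per_min:
            # Still below the ceiling, but heading in an unsafe direction.
            alerts.append(f"minute {curr.minute}: rising {rise:.2f} C/min")
    return alerts


if __name__ == "__main__":
    sample = [Reading(0, 22.0), Reading(1, 22.2), Reading(2, 23.5), Reading(3, 27.5)]
    for alert in check_readings(sample):
        print(alert)
```

Checking the rate of rise as well as the absolute temperature means a warning can fire while equipment is still inside its safe range, which leaves more time to react.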

Cabling Problems

Cabling is the groundwork for a high-performance, high-functioning data center. If the cabling system experiences a failure, the data center may be at risk as well. Potential cabling problems could include:

  • High-density, tightly packed cable bundles
  • Kinks or bends in the cable
  • Poorly constructed cable with poor return loss or near-end crosstalk performance
  • Using the wrong cable for the application

Make sure you’re following cable management best practices to avoid potential damage to cables. And when it’s time to select new data center cabling, weigh your options and make sure you’re investing in a high-performance cabling solution that will give you what you need now and support the technology you’ll need in the future.

Security Issues

Cybersecurity issues are a growing cause of data center outages. These issues could be internal (an employee accidentally falling prey to a phishing scheme) or external (an outside party trying to hack into your network).

Make sure your IT assets are protected at all levels, from the perimeter to the port. Intelligent monitoring and reporting can also help you track flow and movement inside the data center so you know who accesses what – and when they accessed it.
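
To make the "who accesses what, and when" idea concrete, here is a small Python sketch that groups access events by asset in time order. The event format (timestamp, person, asset) and the names used are hypothetical; an actual monitoring product will expose its own log format and reporting tools.

```python
from collections import defaultdict
from datetime import datetime


def build_access_history(events):
    """Group (timestamp, person, asset) events by asset, oldest first.

    The tuple layout is an assumption made for this example.
    """
    history = defaultdict(list)
    for timestamp, person, asset in sorted(events):
        history[asset].append((timestamp, person))
    return dict(history)


if __name__ == "__main__":
    # Hypothetical events, for illustration only.
    events = [
        (datetime(2024, 1, 2, 9, 0), "alice", "rack-12"),
        (datetime(2024, 1, 1, 22, 30), "bob", "rack-12"),
        (datetime(2024, 1, 2, 10, 0), "alice", "rack-07"),
    ]
    for asset, visits in build_access_history(events).items():
        for timestamp, person in visits:
            print(f"{asset}: {person} at {timestamp:%Y-%m-%d %H:%M}")
```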

Belden can help you design a data center that resists failure, from reducing the potential for human error to making sure systems are running as they should. Learn more about how your data center can be designed to maximize uptime.