Solutions for Minimizing Server Downtime
April 11, 2018

Chris Adams
Park Place Technologies

As we've seen, hardware is at the root of a large proportion of data center outages, and the costs and consequences are often exacerbated when VMs are affected. The best answer, therefore, is for IT pros to get back to basics.

Start with Part 1: Complacency Kills Uptime in Virtualized Environments

Just as drivers wearing seatbelts should still use turn signals (even though many don't), data center managers should continue to take the usual precautions to protect against equipment-related outages. Put simply:

Attend to the hardware

In the rush to implement the latest technologies, don't overlook the fundamentals, such as routine server maintenance, UPS tests and upgrades, and facility checks for hotspots, airflow problems, and other issues.

Integrate monitoring and response

Only about half of IT organizations rely on their monitoring tool or ticketing system to activate a response team. This is a lost opportunity to accelerate break/fix. So is the failure to use newer AI-driven hardware monitoring technologies, which are becoming highly accessible.
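As a rough illustration of what tying monitoring to response can look like, here is a minimal Python sketch. It assumes a monitoring tool can hand each hardware alert to a small dispatcher; the names HardwareAlert, Dispatcher, and page_on_call are hypothetical and do not refer to any particular product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HardwareAlert:
    """Hypothetical alert record emitted by a monitoring tool."""
    host: str
    component: str
    severity: str  # assumed levels: "info", "warning", "critical"


@dataclass
class Dispatcher:
    """Routes alerts so that no one has to notice a dashboard by hand."""
    page_on_call: Callable[[str], None]  # assumed hook into a paging system
    tickets: List[str] = field(default_factory=list)

    def handle(self, alert: HardwareAlert) -> str:
        summary = f"{alert.severity.upper()}: {alert.component} on {alert.host}"
        if alert.severity == "info":
            return "logged"  # informational only, no ticket
        self.tickets.append(summary)  # warnings and criticals get a ticket
        if alert.severity == "critical":
            self.page_on_call(summary)  # criticals also wake the response team
            return "paged"
        return "ticketed"


# Example run with an in-memory list standing in for the paging system.
pages: List[str] = []
dispatcher = Dispatcher(page_on_call=pages.append)
results = [
    dispatcher.handle(HardwareAlert("db-01", "PSU 2", "critical")),
    dispatcher.handle(HardwareAlert("web-03", "fan 1", "warning")),
    dispatcher.handle(HardwareAlert("web-04", "firmware", "info")),
]
```

In this sketch, the critical alert opens a ticket and pages the on-call engineer in one step, which is the handoff that is often left to a person watching a console.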

Have parts on standby

It's no good to go searching for spares after a hardware failure occurs. Spare parts should be on site for mission-critical systems or available for quick delivery in other cases.

Invest in expertise

Having the right people with the right skills is essential. Unfortunately, today's tight IT labor market is making it difficult to find and afford talent. Data center managers should consider whether they have the budget to build comprehensive engineering capabilities or whether they are better off sourcing them from a partner.

It can be hard to manage these tasks in addition to the many responsibilities that have been piled on data center personnel over the past decade. In many cases, the easiest and most affordable option is to hand off the bulk of the hardware to-do list to a third-party provider specializing in IT support. That way, someone else can address the risk associated with hardware through 24/7 monitoring, spares management, and immediate Level 3 support while the business gets back to business.

Chris Adams is President and COO of Park Place Technologies