
5 IT Operations Challenges – and 1 Main Cause

A recent survey of IT Operations executives found the following to be the most impactful challenges facing their teams:

1. Too much time spent resolving business-impacting application outages or slowdowns

IT Operations teams spend too many work hours resolving application performance problems. The biggest single opportunity to reduce this resource drain is to streamline the process of localizing the problem – in short, find the actual problem faster.

The Mean Time to Resolve issues (MTTR) is the primary measure that Operations teams use to determine their effectiveness in dealing with problems. Whenever Operations teams speak about management projects, the ultimate goal is to reduce MTTR.

Forrester Research’s Eveline Oehrlich breaks down MTTR as the sum of four sub-components:

- The time it takes to detect a problem (measured by the MTTI or “Mean Time to Identify”)

- The time it takes to isolate the problem (measured by the MTTK or “Mean Time to Know”)

- The time it takes to implement the fix (measured by MTTF or “Mean Time to Fix”)

- The time it takes to verify that the fix is working (measured by MTTV or “Mean Time to Verify”)


Image removed.

As shown in the above graphic, the best opportunity to reduce the overall time to resolve issues is to cut the time spent isolating the root cause of the problem. Because quickly finding where a problem occurred has the most potential for improvement, the focus should be on tools that isolate issues across the whole infrastructure, rather than on tools that solve more issues on specific platforms.
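The breakdown above is simple arithmetic: MTTR = MTTI + MTTK + MTTF + MTTV. A minimal sketch of that sum follows. The `Incident` class, the `mttr_breakdown` helper, and every duration in it are hypothetical illustrations, not figures from the survey or from Forrester.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    # Minutes spent in each phase of one incident (hypothetical values).
    identify: float  # detect that a problem exists
    know: float      # isolate where the problem is
    fix: float       # implement the fix
    verify: float    # confirm the fix works


def mttr_breakdown(incidents):
    """Return the mean of each phase and their sum (MTTR)."""
    parts = {
        "MTTI": mean(i.identify for i in incidents),
        "MTTK": mean(i.know for i in incidents),
        "MTTF": mean(i.fix for i in incidents),
        "MTTV": mean(i.verify for i in incidents),
    }
    # MTTR is the sum of the four sub-components.
    parts["MTTR"] = sum(parts.values())
    return parts


# Made-up sample data for illustration only.
incidents = [
    Incident(identify=10, know=120, fix=20, verify=10),
    Incident(identify=20, know=80, fix=30, verify=10),
]

breakdown = mttr_breakdown(incidents)
# The phase with the largest mean is the best candidate for improvement.
largest = max(("MTTI", "MTTK", "MTTF", "MTTV"), key=breakdown.get)
print(breakdown)
print("Largest component:", largest)
```

With sample data shaped like this, the isolation phase (MTTK) dominates, which is the pattern the article describes. Real incident records would be needed to confirm it for any particular team.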

2. Too many Experts are required to help with application incidents

Having too many people spending time on bridge calls is a well-documented complaint. But really it is a symptom of a process problem. Bridge calls are attended by specialists – experts from the web tier, the database, the application server, etc. They are all on the same call to facilitate a broken process – that of using people to isolate the source of an application problem.

Management tools that provide a holistic view of application performance can eliminate the need to assemble the large team, and instead allow a single individual to isolate the problem component. With the source of the problem isolated, that individual can then engage only with the appropriate expert.
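As a rough illustration of how one person might isolate the problem component from a cross-tier view, the sketch below compares each tier's observed response time with its normal baseline and flags the tier that deviates most. The tier names, latency values, `threshold` parameter, and the `isolate_slow_tier` function are all assumptions made for this example. It does not represent any particular management tool.

```python
def isolate_slow_tier(observed_ms, baseline_ms, threshold=2.0):
    """Return the tier whose latency most exceeds its baseline, or None.

    A tier is flagged only if it is at least `threshold` times slower
    than its baseline (the threshold is an arbitrary illustrative value).
    """
    ratios = {
        tier: observed_ms[tier] / baseline_ms[tier]
        for tier in observed_ms
        if tier in baseline_ms and baseline_ms[tier] > 0
    }
    if not ratios:
        return None
    worst = max(ratios, key=ratios.get)
    return worst if ratios[worst] >= threshold else None


# Hypothetical per-tier response times in milliseconds.
baseline = {"web": 40.0, "app_server": 120.0, "database": 60.0}
observed = {"web": 45.0, "app_server": 130.0, "database": 480.0}

suspect = isolate_slow_tier(observed, baseline)
print("Engage the expert for:", suspect)
```

Here only the database specialist would need to be called, rather than the full bridge call. A real environment would need per-transaction measurements across every hop, which is the holistic view described above.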

3. Problems are only discovered when users complain about them

As an Operations executive, I have one rule for my team: I should never be surprised. I’m also realistic enough to know that things will sometimes go wrong; in an IT environment, outages will happen. The key is not to be surprised by them. That means proactive monitoring of applications – not just at the individual component level, but across the entire application infrastructure.

In today’s complex environments, relying on resource monitoring of individual servers is a sure-fire path to unpleasant surprises.

4. Management Tools can’t support the Mix of Technologies that Make up Apps

Management tools are built to address specific problems. Many of the APM tools in the field today were built to handle Java and .NET application code manipulation. These capabilities are important, but they don’t address problems that originate outside the application code. Enterprise applications of today are complicated animals – often consisting of several different discrete component types. IT Operations teams need to ensure that their management tools can identify and address the most common sources of application failures.

5. Implementation of mass virtualization, Private, and Hybrid Cloud

Virtualization and Private Cloud have become mainstream components of today’s application environments, and Hybrid Cloud is expected to be very big in 2013. Disconnecting the applications from the infrastructure specification creates a management visibility gap as to how systems are working together to deliver the business services. IT Operations teams need tools that can provide a complete application view across all these environments to be able to avoid the management blind spots.

One Major Cause: Application Complexity

The key difference between today’s applications and those running even 3 years ago is growing complexity. New technologies allow for more sophisticated enterprise applications – the days of applications being made up of a web server and a database are gone. Today’s applications feature a broad mixture of technologies and platforms, all with specialized functions and specialized management tools.

It’s this massive complexity that creates the other challenges:

- Today’s applications employ infrastructure components that alter the transaction paths and topology layout of applications on the fly, making it difficult or impossible to understand how transactions, applications and infrastructure work together.

- The resulting visibility gaps require experts for each platform to be available, just to TRY to understand (as a team) where applications and transactions go.

- Rapid change means that management tools struggle to keep up with new technologies. In some cases, the platforms are so new that traditional management tools have no effective way of seeing even basic information.

- Finally, it is this dynamic complexity that makes it difficult for either IT Operations team members or full SWAT teams to solve problems when they occur. There are simply too many moving parts, too many new technologies, all put together in an unknown way (to the Operations Team) to deliver the desired business service.

The best way to deal with these challenges is to take a service-oriented approach to application service delivery. IT Operations teams that focus on how infrastructure performance impacts end-user service levels and use tools that manage transactions, applications, and infrastructure together will be able to overcome the 5 major challenges – cutting through those management gaps to provide true service management.

ABOUT Vic Nyman

Vic Nyman is the co-founder and COO of BlueStripe Software. Nyman has over 20 years of experience in systems management and APM and has held leadership positions at Wily Technology, IBM Tivoli, and Relicore/Symantec.

