
Top 5 Service Performance Challenges in the Cloud

What do Amazon EC2, Microsoft Azure, and Google Apps have in common? They’re all cloud computing services, of course. But they share something else: each of these clouds has experienced outages and slowdowns, impacting businesses worldwide that increasingly rely on the cloud for critical operations. And while there’s a great deal of publicity when these prominent public clouds suffer outages, it’s no less damaging to the business when an IT department’s private cloud goes offline, even if it doesn’t make the news. It’s no wonder that, according to analyst firm IDC, two of the top three concerns that CIOs have about cloud computing are performance and availability.

Moving services to the cloud promises to deliver increased agility at a lower cost, but there are many risks along the way and greater complexity to manage when you get there. The following are five critical hurdles that you may face when implementing and operating a private or hybrid cloud, and how you can overcome them.

1. Will it work? How can you tell which applications are suitable for the cloud and plan a successful migration?

Not every application is suitable for the cloud. And sometimes one part of an application is cloud-ready while other components are not. You need to identify the most suitable applications and components for migration, identify potential problems such as chattiness and latency that are amplified in the cloud, and create a performance baseline that you can test against after migration. With a clear picture of service dependencies and infrastructure usage, you can create a checklist that will ensure a complete and successful migration.
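The baseline comparison described above can be sketched in a few lines. This is only an illustration, not a prescribed method: the function names, the 20% tolerance, and the choice of the 95th percentile are assumptions made for the example, and the response-time samples would come from whatever monitoring you already have.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of response times (ms)."""
    # quantiles(..., n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def regressions(baseline, after, tolerance=0.20):
    """Return transactions whose p95 latency grew by more than `tolerance`.

    `baseline` and `after` map a transaction name to a list of response
    times collected before and after migration (hypothetical inputs).
    """
    flagged = {}
    for name, before_samples in baseline.items():
        if name not in after:
            continue  # transaction not exercised after migration
        before, now = p95(before_samples), p95(after[name])
        if now > before * (1 + tolerance):
            flagged[name] = (before, now)
    return flagged


if __name__ == "__main__":
    baseline = {"checkout": [100] * 20, "search": [50] * 20}
    after = {"checkout": [150] * 20, "search": [52] * 20}
    print(regressions(baseline, after))
```

Comparing percentiles rather than averages keeps a handful of slow, chatty transactions from being hidden by many fast ones.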

2. Performance – If you don’t know which physical servers your application is running on, how do you find server-related root causes when performance issues arise?

In fully dedicated environments, we can sometimes use infrastructure metrics and events to diagnose performance issues. But inferring application performance from tier-based statistics becomes challenging, if not impossible, when applications share dynamically allocated physical resources. To manage application performance in the cloud, you need a real-time topological map of service delivery across all tiers. Since the landscape is always changing, it’s essential that the dependency map is dynamically generated and automatically updated for every single transaction and service instance.
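As a rough sketch of what "dynamically generated" can mean in practice, a dependency map can be rebuilt from observed transaction hops rather than from a static inventory. The tuple layout and the component names below are hypothetical; a real tracing tool would supply its own record format.

```python
from collections import defaultdict


def build_dependency_map(hops):
    """Build a caller -> callees map from observed transaction hops.

    `hops` is an iterable of (transaction_id, caller, callee) tuples,
    an assumed record shape for this example.
    """
    edges = defaultdict(set)
    for _txn_id, caller, callee in hops:
        edges[caller].add(callee)
    # Sort callees so the output is stable between runs.
    return {caller: sorted(callees) for caller, callees in edges.items()}


if __name__ == "__main__":
    observed = [
        ("t1", "web", "app-1"),
        ("t1", "app-1", "db"),
        ("t2", "web", "app-2"),  # a new instance appears; the map follows
        ("t2", "app-2", "db"),
    ]
    print(build_dependency_map(observed))
```

Because the map is derived from the transactions themselves, re-running it over a recent window of hops reflects instances that were added or moved since the last run.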

3. Chargeback – How do you know how much CPU your application is consuming in order to choose an appropriate chargeback model or verify your bills?

IT needs a new paradigm for assessing resource consumption in order to transition from a resource-focused cost center to a business-service-focused profit center. But traditional chargeback and APM tools do not collect resource utilization per transaction, which is what business-aligned costing and chargeback require. For the cloud, you need a solution that monitors consumption for every service across multiple applications and tiers, so you can accurately cost services, decide on appropriate chargeback schemes, and tune applications and infrastructure for better resource utilization and lower cost.
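To make the idea concrete, here is a minimal sketch of one possible chargeback scheme: splitting a shared bill in proportion to the CPU seconds each service consumed across its transactions. The proportional model, the record shape, and the figures are assumptions for illustration; other schemes (flat rate, tiered, reserved capacity) may suit a given organization better.

```python
from collections import defaultdict


def chargeback(records, total_cost):
    """Split `total_cost` across services in proportion to CPU seconds used.

    `records` is an iterable of (service, cpu_seconds) pairs, one per
    transaction (a hypothetical per-transaction measurement).
    """
    usage = defaultdict(float)
    for service, cpu_seconds in records:
        usage[service] += cpu_seconds
    total = sum(usage.values())
    if total == 0:
        return {service: 0.0 for service in usage}
    return {service: total_cost * used / total for service, used in usage.items()}


if __name__ == "__main__":
    records = [("billing", 30.0), ("search", 10.0), ("billing", 60.0)]
    print(chargeback(records, total_cost=1000.0))
```

The same per-service totals can be checked against a provider's invoice, which is the "verify your bills" half of the question.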

4. Not aligned with the business – How do you ensure that services are allocated according to business priority?

Clouds offer us new levels of dynamic resource allocation. However, to ensure that SLAs in the cloud are met, you must be able to prioritize the allocation of resources based on measurements of real end-user performance and an accurate view of where additional resources can truly alleviate SLA risks. To make that possible, you need a clear picture of resource consumption at the transaction level and business intelligence about the impact of each infrastructure tier on performance. Provisioning based on business priorities becomes even more critical as cloud architectures transition to a dynamic auto-provisioning model.
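One simple way to express "allocate by business priority" is to rank services that are close to or past their SLA by a priority-weighted risk score. The scoring formula, the 80% warning threshold, and the field names here are assumptions chosen for the sketch, not an established model.

```python
def allocation_order(services, warn_ratio=0.8):
    """Return names of at-risk services, most urgent first.

    Each service is a dict with `name`, `priority` (higher means more
    business-critical), `response_ms` (measured end-user response time)
    and `sla_ms` (target). All fields are hypothetical.
    """
    def risk(service):
        # A ratio above 1.0 means the SLA is currently being missed.
        return service["priority"] * (service["response_ms"] / service["sla_ms"])

    at_risk = [s for s in services if s["response_ms"] > warn_ratio * s["sla_ms"]]
    return [s["name"] for s in sorted(at_risk, key=risk, reverse=True)]


if __name__ == "__main__":
    services = [
        {"name": "checkout", "priority": 3, "response_ms": 900, "sla_ms": 1000},
        {"name": "reports", "priority": 1, "response_ms": 1500, "sla_ms": 1000},
        {"name": "search", "priority": 2, "response_ms": 200, "sla_ms": 1000},
    ]
    print(allocation_order(services))
```

Here a high-priority service approaching its SLA is ranked ahead of a low-priority one that has already breached, which is the kind of trade-off an auto-provisioning policy has to make explicit.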

5. Over-provisioning – How can you right-size capacity and prevent over-provisioning that undercuts ROI?

Sharing IT infrastructure can be more efficient and cost-effective – assuming you have an accurate picture of resource usage for each service, an understanding of how that allocation affects SLA compliance, and the ability to prioritize resource allocation. In the cloud, a complete history of all transaction instances, including precise resource utilization metrics and SLAs, is essential for making intelligent decisions about provisioning. And with an accurate picture of resource consumption for each business transaction, cloud owners can plan future capacity requirements accurately.
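The right-sizing arithmetic can be illustrated with a small sketch: take the peak CPU demand observed in the transaction history, add headroom, and round up to whole instances. The 25% headroom and the parameter names are assumptions; the appropriate margin depends on your SLAs and how quickly capacity can be added.

```python
import math


def recommended_instances(peak_cpu_cores, cores_per_instance, headroom=0.25):
    """Return the instance count needed to cover peak demand plus headroom.

    `peak_cpu_cores` is the highest observed CPU demand for a service,
    taken from its transaction history (a hypothetical input).
    """
    needed_cores = peak_cpu_cores * (1 + headroom)
    # Always keep at least one instance, even for an idle service.
    return max(1, math.ceil(needed_cores / cores_per_instance))


if __name__ == "__main__":
    # Peak of 10 cores on 4-core instances with 25% headroom: 12.5 cores needed.
    print(recommended_instances(10, 4))
```

Comparing this figure with what is currently provisioned shows where capacity exceeds measured demand and can be reclaimed.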

Russell Rothstein is Founder and CEO of IT Central Station.

