In Part One of BSMdigest’s exclusive interview, Kalyan Ramanathan, Director of BSM for HP, talks about BSM and performance management in the cloud.
BSM: What is your view of the importance of BSM in the cloud?
KR: Business Service Management (BSM) solutions typically provide IT with the capability to manage both the performance and the availability of IT applications and resources.
Today, enterprises are adopting cloud delivery models (as either providers or consumers) to provide elastic and agile services to their business counterparts. While the cloud represents a new delivery model, the expectations of IT have not changed dramatically. When IT acts as a producer of services, business customers expect it to manage these mission-critical cloud services with high performance and availability. Equally important, when IT consumes services from public, third-party, or outsourced vendors, IT needs to manage and measure the service level agreements (SLAs) promised by the vendor.
BSM provides the ability to manage all cloud resources – infrastructure, applications, and Software as a Service (SaaS). More importantly, BSM enables enterprises to focus on the SLAs of the committed resources, thus enabling IT to deliver on the promise of the cloud delivery model.
BSM: From a BSM perspective, which applications and/or services should be kept in-house vs. which ones are OK to migrate to public cloud?
KR: While most services are good candidates for cloud environments, few cloud vendors today can guarantee enterprise-grade compliance or security for the service resources. Hence, mission-critical applications with stringent compliance requirements or security policies are not necessarily ideal candidates.
Other characteristics like network and storage bandwidth are also critical determinants in identifying “optimal” cloud services. For example, if a service requires high storage I/O with a locally hosted datastore (a “big data” app), then it may not be economically feasible to deliver the application on a public cloud infrastructure.
BSM: Everyone talks about the security risks of public cloud. What about the performance management risks? Are they just as great?
KR: Yes, in fact recent HP research on service management and the cloud found that performance, not security, is the biggest challenge to achieving the leap forward in IT productivity. Sixty-three percent of respondents cited application performance, followed by the inability to monitor service level agreements as their primary concerns as they move applications to the cloud.
With regard to performance, when resources are deployed remotely it is essential that businesses know whether providers are complying with infrastructure service levels, whether there is sufficient capacity and scalability to serve all end-users — now and as the business grows — and whether the quality of the end-user experience is sufficient to promote customer satisfaction and loyalty.
However, performance is not the only business risk associated with the cloud. Availability is also a primary area of concern. Businesses need to make sure business-critical cloud services are there when they need them. In order to do this, they must have visibility into service uptime and performance. It is also critical to be able to isolate problems based on application performance and to be able to perform trend analysis for business analytics and to predict and prevent failures.
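The availability visibility described above ultimately comes down to measuring uptime against a committed SLA. As a minimal sketch (the check data, function names, and 99.9% target are illustrative, not from any particular monitoring product):

```python
# Hypothetical sketch: computing availability and simple SLA compliance
# from synthetic uptime-check results (1 = check passed, 0 = check failed).

def availability(checks):
    """Fraction of successful health checks."""
    return sum(checks) / len(checks)

def meets_sla(checks, target=0.999):
    """True if measured availability meets the target (e.g. 'three nines')."""
    return availability(checks) >= target

# 1,440 one-minute checks (one day) with 3 failed probes
day = [1] * 1437 + [0] * 3
print(f"availability: {availability(day):.4%}")  # 99.7917%
print("meets 99.9% SLA:", meets_sla(day))        # False
```

Even three failed one-minute probes in a day are enough to miss a 99.9% target, which is why continuous visibility into provider uptime matters.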
Finally, as you mention, security is still a key concern for subscribers to cloud services. In order to make sure that company and customer data are safe from unauthorized access, business processes are safe from intrusion, and applications are guarded from interference, cloud subscribers have to share with their providers the responsibility for cloud-services security.
BSM: In terms of performance management, what should public cloud providers be doing that they are not doing?
KR: Enterprises will only move their mission-critical applications to the cloud if the vendors can raise the confidence in the cloud delivery model. To improve the performance SLAs of cloud services and to provide good visibility to their customers, public cloud vendors need to address the following issues:
Granular performance and SLA metrics: Whether they are delivering infrastructure, applications or SaaS services, public cloud vendors need to expose detailed performance data to their customers, so that customers can build trust in the performance of the resource that they utilize.
Data analysis capabilities: The performance data should be exposed in an easy-to-integrate manner, so that customers can analyze and report on this data.
Standards-based performance metrics: Enterprise IT is evolving to a service broker model, whereby IT can use varied delivery mechanisms (public, private, outsourced, or hybrid) to deliver IT services to its business customers. Enterprise IT needs a consolidated solution for all IT operations; hence, public cloud providers need to expose standard metrics so that enterprise IT can quickly create a single pane of glass for all of its IT service operations.
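The "single pane of glass" above depends on normalizing each provider's metrics into one comparable view. A toy sketch of that idea, where the provider payloads, field names, and SLA thresholds are all invented for illustration:

```python
# Illustrative sketch: evaluating standardized per-provider metrics
# against one set of SLA thresholds. Payload shapes are assumptions,
# not any real cloud vendor's API.

providers = {
    "cloud_a": {"latency_ms": 120, "error_rate": 0.002, "uptime": 0.9995},
    "cloud_b": {"latency_ms": 340, "error_rate": 0.015, "uptime": 0.9990},
}

SLA = {"latency_ms": 250, "error_rate": 0.01, "uptime": 0.999}

def breaches(metrics, sla):
    """Return the metric names that violate the SLA thresholds."""
    bad = []
    for name, threshold in sla.items():
        value = metrics[name]
        # uptime must stay above its floor; the others below their ceiling
        ok = value >= threshold if name == "uptime" else value <= threshold
        if not ok:
            bad.append(name)
    return bad

for provider, metrics in providers.items():
    print(provider, breaches(metrics, SLA) or "within SLA")
```

Because both providers report the same metric names, one comparison function covers them all; that is the practical payoff of standards-based metrics.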
BSM: How does private cloud contribute to IT sprawl, and does this pose a problem for BSM?
KR: Virtualization is at the heart of cloud deployments. While most enterprises have seen significant ROI from virtualizing their infrastructure, many are also struggling with the hidden cost of virtualization sprawl, which typically occurs when virtual machines get provisioned without adequate process control. Sprawl can lead to many unintended IT operations issues (undiscovered machines and assets, poor application performance, suboptimal configuration management, etc.) and also creates compliance challenges for the enterprise.
While private clouds are intended to rein in virtual machine (VM) sprawl by streamlining the process of creating VMs, inadequate control and management of private clouds can also exacerbate VM sprawl issues.
The primary challenge with uncontrolled or unmanaged VMs is that they are not configured or monitored appropriately to ensure resource performance. As a result, services that depend on these resources do not reflect them in their service maps; when a problem occurs, these misconfigured resources are not immediately attended to, and the application's SLAs suffer. To manage service SLAs optimally, it is imperative that adequate process controls be implemented to manage all of the resources supporting the application.
BSM: Can mapping of services to IT assets be fully automated or does it always require manual intervention?
KR: A service map – a topological description of an IT service and the IT assets that support it – is critical for BSM. The map enables an IT operations team to quickly perform impact analysis and evaluate which mission-critical services are affected by operational issues with a particular piece of infrastructure or application; it also enables IT operations to do focused troubleshooting.
While the manual creation of service maps is acceptable in a static environment, to support the hyper-dynamic cloud environment it is imperative that the maps be automatically created and updated in near-real time.
Today, there are some solutions that automatically create and update service maps in near-real time. Many automatically discover all infrastructure (server, network, storage) and applications (database, web server, app server, middleware). These discovery solutions also automatically map the dependencies between these applications and infrastructure. More importantly, as changes are made to the environment (e.g., virtual machines are moved, network elements are added or reconfigured), these changes are automatically reflected in the service maps.
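At its core, the impact analysis a service map enables is a graph traversal: start from a failing asset and walk upward to every service that depends on it. A toy sketch, where the topology and names are invented for illustration:

```python
# Toy service map as a dependency graph. The assets and services here
# are hypothetical; real discovery tools would populate this map.

from collections import defaultdict

# dependents[asset] = the services/assets that depend on that asset
dependents = defaultdict(list)

def add_dependency(service, asset):
    dependents[asset].append(service)

add_dependency("web-tier", "vm-101")
add_dependency("app-tier", "vm-102")
add_dependency("checkout-service", "web-tier")
add_dependency("checkout-service", "app-tier")
add_dependency("reporting-service", "app-tier")

def impacted(asset):
    """All services transitively affected by a failure of `asset`."""
    seen, stack = set(), [asset]
    while stack:
        node = stack.pop()
        for dep in dependents[node]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)

print(impacted("vm-102"))  # ['app-tier', 'checkout-service', 'reporting-service']
```

When an automated discovery tool updates the edges as VMs move or network elements are reconfigured, the same traversal keeps giving a current answer, which is why near-real-time map updates matter in dynamic cloud environments.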
BSM: What is the most important BSM advice that you can give to an organization migrating to the cloud?
KR: The most important piece of advice for businesses migrating to the cloud, with respect to BSM, is to emphasize managing services in a unified manner, regardless of where those services originate.
The prevailing approach to IT operations, unfortunately, continues to focus on the performance of infrastructure components rather than the applications and services they support. This approach depends largely on manual processes and non-integrated tool sets that impede cross-silo collaboration and create huge inefficiencies among the teams responsible for IT operations. In a world of cloud computing, where infrastructure is constantly changing, this only exacerbates the operational challenges that IT faces today.
To serve the business more effectively in an era of cloud computing, IT needs to manage every element of IT operations, spanning physical and virtual infrastructure. This is the only way IT can live up to increasingly demanding and evolving business needs.