In BSMdigest’s exclusive interview, Vikas Aggarwal, founder and CEO of Zyrion, discusses Business Service Management in the cloud, and new BSM technologies and approaches for the modern datacenter.
BSM: What do you see as the main monitoring challenges of private cloud?
VA: Within a private cloud environment, the monitoring approach for applications and services has to account for inter-dependencies and impacts of the shared and virtual infrastructure. A non-cloud infrastructure has applications housed within discrete servers that are connected to the network, and the contention of resources outside of the physical server is limited primarily to the network. Within a private cloud though, applications share the same underlying physical resources and one application can impact performance of a totally unrelated application or service just because of the virtual infrastructure. There are a lot more dimensions that can impact a service, all of which need to be accounted for by the monitoring solution.
BSM: How can technology provide a business-oriented view of the cloud computing infrastructure?
VA: Performance management and monitoring technology has to enable mapping the different components of the cloud to the supported business services. The monitoring approach starts by first looking at the performance and availability of the business services, and then the underlying components within the cloud computing infrastructure. Creating the mapping between cloud components and business services is easier to do within private clouds, but will require well-documented and rich APIs for public clouds.
BSM: Does cloud make BSM a requirement, and if so, why?
VA: Traditional approaches to performance monitoring focus on individual nodes and components in the IT infrastructure, while cloud infrastructure is a shared resource and individual performance indicators in isolation are meaningless. Furthermore, one might not even have access to individual metrics in public clouds. Focusing on the performance of services instead and correlating all the underlying components of the service is the only way IT can support the business.
BSM: How does IT get aligned with the company's business goals? Is it an issue of corporate culture?
VA: Senior managers tend to understand the value of service-oriented IT monitoring. In our customer survey, over 80% of our customers use the BSM features in our product, and in almost all cases, senior managers were using the BSM technology and dashboards on a regular basis. So, it has to be driven from the top and has to become a part of the corporate culture.
BSM: How do you define "real-time visibility"?
VA: Real-time visibility means having real-time or near-real-time metrics and data on the availability and performance of business services and the underlying IT infrastructure. Real-time means having dashboards where you are immediately alerted if a business service is performing poorly because of the underlying IT infrastructure. Real-time means being able to instantly drill down from a BSM dashboard to the packet flow and isolate the root cause impacting a business service. Real-time means no swivel chair management, no waiting for another group to respond and provide answers, no waiting for the database to get the data in the night to churn out a report – all the answers and data are available right away for everyone when they need them.
BSM: What are Business Service Containers?
VA: Zyrion’s Business Service Containers are flexible, automated objects which represent Business Services in an organization. They allow an organization to create logical, business-oriented views of the overall physical and virtualized computing network. You can define different SLAs for different containers, create fault-tolerant redundant models within a container, have nested containers with cascading alarms or create containers that include tests and containers owned by other departments. Our Business Service Containers allow different departments and users to create views of the IT infrastructure that align with their roles, with the full flexibility and access control that are essential for adoption within the enterprise. Most importantly, our Business Container model is overlaid on top of our topology discovery model to reduce alarm floods and enable very rapid root cause discovery of Business Service downtime.
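As an editorial illustration of the nesting and redundancy described in this answer, here is a minimal Python sketch. It is not Zyrion's implementation: the names `Status`, `Check`, `Container`, `redundant` and `rollup` are all hypothetical, and the rollup rule (worst member wins, unless the container is marked redundant, in which case the best member wins) is an assumption chosen to show how a fault-tolerant group can mask a single failure while a nested alarm still cascades upward.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Union


class Status(IntEnum):
    """Severity levels, ordered so that a higher value means a worse state."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Check:
    """Hypothetical single monitored test, e.g. an HTTP probe on one server."""
    name: str
    status: Status = Status.OK


@dataclass
class Container:
    """Hypothetical business service container: holds checks or other containers."""
    name: str
    members: List[Union["Container", Check]] = field(default_factory=list)
    redundant: bool = False  # assumption: members are interchangeable replicas

    def rollup(self) -> Status:
        """Compute this container's status from its members, recursively."""
        statuses = [
            m.rollup() if isinstance(m, Container) else m.status
            for m in self.members
        ]
        if not statuses:
            return Status.OK
        # Redundant group: the service survives while any one replica is healthy.
        # Otherwise: the worst member status cascades up to the parent.
        return min(statuses) if self.redundant else max(statuses)


# Example: one web server is down, but the redundant pool keeps the service up.
web_pool = Container(
    "web-pool",
    [Check("web1-http", Status.CRITICAL), Check("web2-http", Status.OK)],
    redundant=True,
)
database = Container("database", [Check("db-latency", Status.WARNING)])
storefront = Container("storefront", [web_pool, database])
print(storefront.name, storefront.rollup().name)
```

In this sketch the failed web server would still raise an alert for whoever watches `web1-http` directly, while the `storefront` view reflects only the database warning.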
BSM: Why is configuration management important to BSM?
VA: Configuration management enables backup, restore and tracking of changes in network device configurations across the enterprise network. Proper tracking and notification of configuration changes in the network helps prevent unexpected outages and helps correlate undesired changes in network behavior with recent configuration changes. Having configuration management integrated with BSM is important because IT administrators can correlate network outages to configuration changes and understand the corresponding impact on dependent business services.
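The correlation described in this answer can be sketched in a few lines of Python. This is an editorial illustration under stated assumptions, not a description of any product: `ConfigChange`, `Outage`, `suspect_changes`, `impacted_services`, the one-hour default window and the service-to-device dependency map are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass(frozen=True)
class ConfigChange:
    """Hypothetical record of a tracked device configuration change."""
    device: str
    changed_at: datetime
    summary: str


@dataclass(frozen=True)
class Outage:
    """Hypothetical record of an outage detected on a device."""
    device: str
    started_at: datetime


def suspect_changes(
    outage: Outage,
    changes: List[ConfigChange],
    window: timedelta = timedelta(hours=1),  # assumed look-back period
) -> List[ConfigChange]:
    """Return changes on the failed device shortly before the outage, newest first."""
    earliest = outage.started_at - window
    hits = [
        c for c in changes
        if c.device == outage.device and earliest <= c.changed_at <= outage.started_at
    ]
    return sorted(hits, key=lambda c: c.changed_at, reverse=True)


def impacted_services(outage: Outage, dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the business services whose dependency set includes the failed device."""
    return sorted(svc for svc, devices in dependencies.items() if outage.device in devices)
```

Given an outage on a router and a change log, `suspect_changes` narrows the list to recent edits on that router, and `impacted_services` names the business services that depend on it.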
BSM: What is the advantage of having distributed data collection and a distributed database instead of a centralized database?
VA: Zyrion’s Traverse has a unique, patented architecture where all the data is collected and stored in distributed databases – there is no centralized data warehouse, unlike other products. Our business correlation engine presents a unified view across this distributed database in real-time.
Having a distributed database and collection architecture allows the solution to scale to very large environments that were not possible with earlier-generation products. In order to provide a unified BSM solution, the platform has to be able to collect data from the network, servers and applications and correlate, analyze and present it in real time. A large segment of our customer base switched from other products because a centralized data warehouse model did not scale. We have customers monitoring IT infrastructure with over 10,000 servers and routers in multiple datacenters and close to a million metrics every 5 minutes, without requiring a single dedicated engineer to maintain the solution. As customers demand full visibility into their Business and IT services, having a real-time scalable system is a must, and having a distributed database and data collection approach is key to handling the demands of the new IT datacenter.
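To make the idea of a unified view over distributed stores concrete, here is a small editorial sketch in Python. It does not describe the patented Traverse architecture, whose internals are not given in this interview; `RegionalStore` and `unified_query` are hypothetical names, and the approach shown (each location keeps its own data, and a query fans out to every store and merges the results at read time) is only one generic way such a design could work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


class RegionalStore:
    """Hypothetical per-location store: data stays where it was collected."""

    def __init__(self, region: str) -> None:
        self.region = region
        self._series: Dict[str, List[Sample]] = {}

    def record(self, metric: str, timestamp: float, value: float) -> None:
        """Append one sample to the local series for this metric."""
        self._series.setdefault(metric, []).append((timestamp, value))

    def query(self, metric: str, start: float, end: float) -> List[Sample]:
        """Return local samples for the metric within [start, end]."""
        return [(t, v) for t, v in self._series.get(metric, []) if start <= t <= end]


def unified_query(
    stores: List[RegionalStore], metric: str, start: float, end: float
) -> List[Sample]:
    """Fan the query out to every store in parallel and merge results by time."""
    with ThreadPoolExecutor(max_workers=max(1, len(stores))) as pool:
        parts = pool.map(lambda s: s.query(metric, start, end), stores)
        merged = [sample for part in parts for sample in part]
    return sorted(merged)
```

The design choice illustrated is that no sample is copied into a central warehouse; the cost of assembling the combined view is paid only when someone asks for it.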
BSM: Do different stakeholders need different dashboards that speak to their needs?
VA: Yes, that is essential. Within an integrated BSM environment, with technicians, managers and business owners as users, information needs to be presented in a way that is relevant to the user roles within the organization. We have customers with over 200 active users of our product, ranging from CxOs and product managers to database administrators, IT architects and NOC staff. While the CIO is only interested in the status of the key business operation IT containers, the product managers have dashboards to view response time, number of users, transactions and key applications relevant to their products. The database, server and network architects use the performance data for future planning, while the NOC needs the event-driven dashboard to see what problems exist within the network.
Even the reports generated by each group are different – the product managers need reports on online user growth and response time, while the database manager needs trend reports on transactions per second, and the IT operations manager needs uptime SLA reports. An alert on the server administrator dashboard might not show on the product manager dashboard because of a fault-tolerant architecture or redundant network paths. Providing role-specific views becomes even more relevant when using private or public cloud environments.
BSM: How was Zyrion created?
VA: Zyrion is a spin out of a public company, focused on correlating the impact of IT infrastructure on Business Services. We were the first company to integrate packet and flow analysis with BSM, and hence reduce the downtime of Business Services by enabling quicker resolution.
About Vikas Aggarwal
Vikas Aggarwal is founder and CEO of Zyrion Inc., a provider of BSM & IT infrastructure monitoring software solutions. He has been an entrepreneur and senior executive at multiple technology startups over the past 20 years. He was the founder and CEO of Fidelia, a venture-backed IT infrastructure management software company, where he led the company's growth to about 100 customers before its acquisition by Network General. At Network General, he was the VP of Product Management, where he oversaw product strategy through its acquisition by Netscout in late 2007.