APMdigest asked experts from across the IT industry for their opinions on what IT departments should be monitoring to ensure digital performance. Part 3 covers the development side.
Code-level issues are a common cause of application slowness and have fueled the need for distributed transaction tracing, which can help isolate the exact line of code causing an error. This type of monitoring can also be applied effectively in both pre- and post-production environments, enabling us to prevent performance issues before they impact end users as well as to isolate them when they do occur.
When this type of application monitoring is done in the context of infrastructure dependencies, it helps determine whether other issues are affecting application code processing, such as a bottleneck in the application server, long-running database queries, slow third-party calls, or other issues associated with the application ecosystem. Applications are the heart of IT workloads, and application performance monitoring is critical to effectively ensuring the performance of digital services.
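The span-level timing behind distributed transaction tracing can be sketched in a few lines. This is a minimal, hypothetical illustration (the span fields, service names, and timings are assumptions, not any vendor's API); real tracers such as OpenTelemetry additionally propagate trace context across process boundaries so spans from many services join into one trace.

```python
import time
import uuid
from contextlib import contextmanager

# In-process stand-in for a trace collector; a real tracer would
# export spans to a backend for analysis.
SPANS = []

@contextmanager
def span(name, trace_id=None, parent_id=None):
    record = {
        "trace_id": trace_id or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex,
        "parent_id": parent_id,
        "name": name,
        "start": time.perf_counter(),
    }
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - record["start"]) * 1000
        SPANS.append(record)

# A request whose slow database query shows up as the longest
# child span, pointing directly at the bottleneck.
with span("GET /checkout") as root:
    with span("db.query", root["trace_id"], root["span_id"]):
        time.sleep(0.05)   # stand-in for a long-running query
    with span("render", root["trace_id"], root["span_id"]):
        time.sleep(0.005)

# Child spans append before the root; find the slowest child.
slowest = max(SPANS[:-1], key=lambda s: s["duration_ms"])
print(slowest["name"])  # → db.query
```

Sorting child spans by duration is, in essence, how a trace view narrows a slow request down to one dependency.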
Director, Product Marketing, eG Innovations
Digital performance is complex and can be measured in many ways, but one critical consideration is how well the application does what it is supposed to do. Is it meeting a functional performance metric for customer expectations? To ensure this, organizations need to look at the "fingerprint" of each error in code to discern its importance, as well as at the number of critical errors per release. This dictates the overall functional reliability of the code. It also requires you to be code-aware, monitoring from inside the application at runtime, not surrounding it or listening to the exhaust.
CTO and Co-Founder, OverOps
Most people already know to monitor the obvious things, like total latency to response. But my favorite monitor comes from Anatoly Mikhaylov's talk at DASH this year. He spoke about finding massive infrastructure costs hidden in error codes. Adding APM monitoring to the errors in your endpoints can show costs you wouldn't otherwise see.
APM Developer Advocate, Datadog
When automating your application releases, it's important to remember what you need to monitor. This will allow you to go as fast as possible while making sure you are doing it efficiently. Monitoring your lead time, success vs. failure rate, and mean time to recovery will ensure you focus on value rather than on effort.
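The three release metrics named above can be computed directly from deployment records. A minimal sketch, assuming a hypothetical record format (the field names and figures below are made up for illustration):

```python
from datetime import timedelta

# Hypothetical deployment log: commit-to-deploy lead time, whether the
# release failed in production, and how long recovery took if it did.
deployments = [
    {"lead_time": timedelta(hours=4),  "failed": False, "recovery": None},
    {"lead_time": timedelta(hours=30), "failed": True,  "recovery": timedelta(minutes=45)},
    {"lead_time": timedelta(hours=6),  "failed": False, "recovery": None},
    {"lead_time": timedelta(hours=2),  "failed": True,  "recovery": timedelta(minutes=15)},
]

failures = [d for d in deployments if d["failed"]]

avg_lead_time = sum((d["lead_time"] for d in deployments), timedelta()) / len(deployments)
change_failure_rate = len(failures) / len(deployments)
mttr = sum((d["recovery"] for d in failures), timedelta()) / len(failures)

print(avg_lead_time)        # → 10:30:00
print(change_failure_rate)  # → 0.5
print(mttr)                 # → 0:30:00
```

Tracked per release over time, these three numbers show whether automation is actually speeding delivery up without eroding stability.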
Co-Founder and CTO, DBmaestro
One key area to make sure you monitor: API calls. There aren't many applications I come across these days that do not include some 3rd-party API, be it for authentication, analytics, storage, or customer relationship management. Such API calls can so greatly impact digital performance that not monitoring them to identify things such as performance slowdowns and dependencies is a prescription for pain.
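A lightweight way to start monitoring third-party API calls is to wrap each one with latency and failure recording. A hedged sketch; the wrapper function, names, and threshold below are illustrative assumptions, not any particular tool's API:

```python
import time
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("api-monitor")

def monitored_call(name, call, slow_threshold_ms=500):
    """Wrap a third-party API call, logging failures and slowdowns.

    `call` is any zero-argument callable; the name and threshold
    are illustrative choices, not a standard.
    """
    start = time.perf_counter()
    status = "error"
    try:
        result = call()
        status = "ok"
        return result
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if status == "error" or elapsed_ms > slow_threshold_ms:
            log.warning("%s %s in %.0f ms", name, status, elapsed_ms)

# Usage with a stand-in for an authentication provider call:
token = monitored_call("auth.login", lambda: {"token": "abc"})
```

Wrapping every outbound call this way surfaces both the slowdowns and the hidden dependencies the passage warns about, since each external service acquires a named, timed record.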
Senior Consultant and Founder of RootPerformance
Cloud, containers and microservices are creating increasingly ephemeral, modular and volatile IT environments. In these dynamic environments, traditional monitoring approaches fail. A modern monitoring approach is required to provide complete visibility into the applications, containers, hosts and underlying supporting infrastructure. This includes visibility into the performance of, and the data returned by, APIs, which have become a key component of any microservices architecture. A modern monitoring approach also includes the analytics and intelligence to understand how changes might impact the overall user experience, along with flexible monitoring techniques that don't overload the containerized application environment.
Director, Product Marketing, CA Technologies
Finding a tool that fits seamlessly into your workflows, setting performance benchmarks, validating payloads, and getting visibility into the performance of API transactions is critical to helping teams rapidly identify and fix issues in production, so that the delivered digital experience matches the vision for end users.
VP of Product, AlertSite UXM, SmartBear
APIs are the fundamental building block of modern software. While engineering teams have built extensive monitoring systems to check the health of code execution paths, they have little visibility into what's going on with APIs. An API failure can bring down systems and without proper monitoring in place, it can be very hard to debug what's going on.
The nature of modern development means systems spring into existence and back out again often, and this rapid change is normal, which means your monitoring needs to be OK with it too. The ability to monitor containers, ephemeral services, and the like is a must.
Head Geek, SolarWinds
Let's go to the extreme and say you could only monitor one thing — that one thing would be microservice response time. In this brave new world, it's actually quite difficult to understand how well your revenue-critical application is performing. While traditional metrics still matter (CPU, memory, disk, etc), your response time on a microservice-by-microservice basis is the thing that matters the most. This single metric will tell you more about the customer experience than anything else. It will indicate downtime or more subtle performance problems in your application. While this metric alone will not tell you "why" something is going on, it will tell you "what" is happening and allow you to quickly isolate a problem to a handful of services or some set of underlying infrastructure.
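Per-microservice response time is usually tracked as percentiles rather than averages, since the slow tail created by one degraded dependency vanishes in a mean. A small sketch with synthetic numbers (the service names and timings are invented for illustration):

```python
import random
import statistics

# Synthetic response times (ms) for two hypothetical services.
random.seed(1)
timings = {
    "checkout": [random.gauss(120, 20) for _ in range(1000)],
    "catalog":  [random.gauss(40, 10) for _ in range(1000)],
}
# A degraded dependency adds a slow tail to one service only:
timings["checkout"] += [random.gauss(900, 50) for _ in range(60)]

for service, samples in timings.items():
    # quantiles(n=100) returns 99 cut points: index 49 ≈ p50, 94 ≈ p95.
    q = statistics.quantiles(samples, n=100)
    print(f"{service}: p50={q[49]:.0f} ms  p95={q[94]:.0f} ms")
```

The checkout service's median barely moves, but its p95 jumps into the slow tail, which is exactly the "what is happening, and where" signal the passage describes.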
As you evolve and enhance your company's hybrid data center infrastructure to keep pace with your industry, understanding your unique workload I/O DNA is paramount to success. Real-time monitoring of the I/O path – from the virtual server to the storage array – is essential to ensuring digital performance. For mission-critical applications, understanding the performance of each and every transaction is the cornerstone of customer satisfaction and revenue assurance.
CMO, Virtual Instruments
Read Len Rosenthal's new blog on APMdigest: Infrastructure Monitoring for Digital Performance Assurance.
Read What You Should Be Monitoring to Ensure Digital Performance - Part 4, covering the infrastructure, including the cloud and the network.