This blog is an excerpt from DevOps, DBAs, and DBaaS by Mike Cuppet.
Yes, There Really Is a Problem
It is not that we do not believe user-reported information; it is just that experience tells us that other factors can be in play that make it necessary to get the full representation of the problem. One user would complain several times a week about application slowness, which was causing the person's performance metrics to drop. Upon investigation using a packet capture tool, it was determined that the live video streaming to the user's computer was causing the application slowness. This person was advised to stop the streaming and given the heads up that the company could "see" everything. Nothing illegal was happening, but complaining about self-inflicted impaired performance caused by news/entertainment traffic does not boost careers if that information is shared.
Continuing with our hypothetical problem: the user-side investigations recorded slowness consistently in the 5–17 seconds range, with very few outliers, which narrows the actual slowness impact significantly. If you are lucky, the captures you already have point to a single call that represents the majority of the slowness, allowing immediate focus on what is likely the root cause.
As a member of a DevOps IT shop, you know that software releases occur nightly. Unfortunately, the users did not report the problem immediately, making it difficult to establish when the problem was introduced. All they can say is that everything seemed fine a few weeks ago and that the problem occurs at different times of the day; otherwise, performance is acceptable. The release report shows at least five changes that may have impacted this functionality: four were implemented successfully, and one had to be rolled back with no root cause documented. Here, the binary release check has failed the organization. Release success or failure does not communicate the information needed by the business or IT. Code that is successfully deployed with functionality validated by a tester does not tell the entire story (for example, whether performance degradation was introduced). DevOps testing is deliberately designed to give more comprehensive answers. Extensive automated testing vets the software thoroughly, making it feasible to include tests designed to measure performance. Such a test gives the green light only when performance matches or beats a predefined value or the timing of the previous code version.
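The performance check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `performance_gate`, its parameters, and the choice of the median as the summary statistic are all assumptions made for the example.

```python
from statistics import median


def performance_gate(candidate_ms, baseline_ms=None, budget_ms=None, tolerance=0.0):
    """Return True only if the candidate timings pass every limit supplied.

    candidate_ms: timings (milliseconds) measured against the new build.
    baseline_ms:  timings from the previous code version (optional).
    budget_ms:    a predefined maximum, in milliseconds (optional).
    tolerance:    allowed slowdown versus the baseline, e.g. 0.05 for 5%.
    All names here are hypothetical and chosen for illustration.
    """
    if not candidate_ms:
        raise ValueError("no candidate timings supplied")

    limits = []
    if budget_ms is not None:
        limits.append(budget_ms)
    if baseline_ms:
        limits.append(median(baseline_ms) * (1 + tolerance))
    if not limits:
        raise ValueError("need a budget or a baseline to compare against")

    # The build gets the green light only if it meets the strictest limit.
    return median(candidate_ms) <= min(limits)


if __name__ == "__main__":
    previous = [104, 106, 108]   # hypothetical timings, previous version
    current = [120, 130, 125]    # hypothetical timings, new build
    print("green light" if performance_gate(current, baseline_ms=previous) else "blocked")
```

Run as part of the nightly pipeline, a check like this turns "deployed successfully" into "deployed successfully and no slower than before", which is the extra information the binary release report was missing.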
As DevOps teams "shift left" and work in conjunction with business leaders as product managers, IT (now DevOps) truly becomes a partner with the business. The "IT alignment to the business" goal included in the annual IT strategy deck for the last decade becomes obsolete. The perceived (or actual) misalignment arose not only because the business teams did not understand what IT really did, other than spending offensively huge chunks of money to drive business operations. IT also wholly failed to come to the table as a business partner, instead remaining aloof and detached from everything but technology.
Forty years ago, IT, MIS, or data processing (whatever the name) was given the mission of finding ways to complete work faster than teams of people could by having computers do mundane, repeatable tasks. Ironically, DevOps in many ways reaches back 40 years to repeat the tactical execution of having computers do mundane tasks: repetitive code testing, deployments, infrastructure as code, and more. Between then and now, far too many manual steps were added to processes that now need to be remediated. Forty years ago, computer work likely resulted in teams of people losing their jobs, but DevOps does not have the same mandate as in the data processing years. Instead, highly skilled engineers and programmers are freed from repetitive tasks and allowed to partner with the business to generate and implement game-changing technologies and applications.
DevOps wants and needs to shift talented, intelligent, experienced staff into roles that deliver measurable benefits for the company. Repeatable tasks can be done much faster by computers, but computers do not generate ideas. Computers running data analytics programs churn through data millions of times faster than humans, but computers still do not have the capability to find answers in the data, interpret the data, or act on the data like people do. People assimilate varying data points to produce value in new ways. DevOps needs people to create opportunities to help the business leapfrog competitors.
DevOps is not intended to get rid of people; instead, it aims to make people more effective and focused on executing business strategies, not hampered by mundane tasks. Accomplishments have moved from "Designed a new algorithm for . . ." to "Improved customer experience . . . reduced costs . . . implemented a new revenue channel . . ."
DBAs and DevOps teams should take a positive stance and attitude toward the goals of Agile and DevOps, knowing that each person's impact on the organization can make tremendous strides to create better customer experiences and software products, and continually improve business processes, all with prospective top- and bottom-line impacts.
Change management analysis in DevOps extends beyond binary conclusions to business impact statements. Reporting shifts from bare success or failure statuses to informative, customer-centric statuses such as the following:
• "Change 123 implementing function A successfully reduced execution time 40%; now averaging 7 milliseconds per call."
• "The change to reorganize table ABC successfully reduced report execution time, allowing the business to meet contractual requirements."
• "Change 456 failed and was rolled over successfully with change 512. Testing for change 456 did not include a critical data test, which allowed the failure to advance; the gap was later found and tested for change 512. Teams had rectified, tested, and implemented the needed test earlier this week, and change 512 was already in the pipeline. The 512 push completed successfully within the change window, eliminating the risk."
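A status line like the first one in the list can be generated from measured timings rather than written by hand. The sketch below is illustrative only: the function `impact_status` and its parameters are hypothetical, and the "before" timing of 11.67 ms is back-calculated from the 40% and 7 ms figures in the example status, not taken from any real measurement.

```python
def impact_status(change_id, description, before_ms, after_ms):
    """Build a business-impact status line from before/after timings.

    before_ms and after_ms are average execution times in milliseconds.
    The function name and wording template are assumptions for illustration.
    """
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")

    # Percentage change relative to the previous timing.
    change_pct = (before_ms - after_ms) / before_ms * 100
    direction = "reduced" if change_pct >= 0 else "increased"

    return (
        f"Change {change_id} {description} {direction} execution time "
        f"{abs(change_pct):.0f}%; now averaging {after_ms:g} milliseconds per call."
    )


if __name__ == "__main__":
    # Hypothetical timings chosen to reproduce the 40% / 7 ms example.
    print(impact_status(123, "implementing function A", 11.67, 7))
```

Because the message is derived from the same measurements the pipeline already collects, a regression shows up as "increased execution time" in the status itself instead of hiding behind a successful deployment flag.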
DevOps' fail-fast edict can really benefit the company by progressing software products continuously, without laborious rollbacks, rework, retests, and reimplementation. In the third scenario above, the DevOps team knows that a communication was missed because change 456 should never have made it to the release stage, let alone production.
So as change management communications pivot from mundane status updates to business impact updates, opportunities to improve application performance become more apparent. Moving from a message that the code was implemented successfully to a message that the code decreased customer query time by 67% tells a better story. There is a large chasm between code that works and code that works, executes as fast as expected, and generates an audit trail. Adding a new feature that performs poorly is not really a feature; it is a bug and a frustration for customers. Adding a feature that is expected to increase mobile app usage 400% without increasing infrastructure resources is not a feature, but a colossal failure. The DevOps movement provides the needed tactical response with infrastructure as code. When traffic is expected to spike, adding resources to existing virtual hosts or spinning up additional hosts with a button click or two simplifies infrastructure readiness and resiliency.
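To make the capacity point concrete, here is a rough sizing sketch. Everything in it is an assumption for illustration: the function names, the current peak of 1,000 requests per second, the 250 requests per second a single host can serve, the 20% headroom, and the reading of "increase usage 400%" as five times the current load.

```python
def projected_peak(current_peak_rps, increase_pct):
    """Projected peak load after a percentage increase (integer arithmetic)."""
    return current_peak_rps * (100 + increase_pct) // 100


def hosts_needed(expected_peak_rps, rps_per_host, headroom_pct=20):
    """Smallest number of hosts that covers the expected peak plus headroom."""
    if rps_per_host <= 0:
        raise ValueError("rps_per_host must be positive")
    required = expected_peak_rps * (100 + headroom_pct)
    capacity = rps_per_host * 100
    # Ceiling division without floating point.
    return -(-required // capacity)


if __name__ == "__main__":
    current_peak = 1000   # hypothetical current peak, requests per second
    per_host = 250        # hypothetical capacity of one host

    before = hosts_needed(current_peak, per_host)
    after = hosts_needed(projected_peak(current_peak, 400), per_host)
    print(f"hosts today: {before}, hosts after the feature launch: {after}")
```

Under these assumed numbers the host count grows from 5 to 24, which is the kind of gap that needs to be closed through infrastructure as code before the feature ships, not after customers notice.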