Performance monitoring is about understanding what's happening right now. It usually includes dealing with immediate performance problems or collecting data that will be used by the other performance tools (such as capacity planning) to plan for future peak loads.
In performance monitoring you need to know three things:
- The incoming workload
- The resulting resource consumption
- What is normal under this load
Without these three things you can only solve the most obvious performance problems, and you have to rely on tools outside the scientific realm (such as a Ouija board or a Magic 8 Ball) to predict the future.
You need to know the incoming workload (what the users are asking your system to do) because all computers run just fine under no load. Performance problems crop up as the load goes up. These performance problems come in two basic flavors: Expected and Unexpected.
Expected problems are when the users are simply asking the application for more things per second than it can do. You see this during an expected peak in demand like the biggest shopping day of the year. Expected problems are no fun, but they can be foreseen and, depending on the situation, your response might be to endure them, because money is tight or because the fix might introduce too much risk.
Unexpected problems are when the incoming workload should be well within the capabilities of the application, but something is wrong and either the end-user performance is bad or some performance meter makes no sense. Unexpected problems cause much unpleasantness and demand rapid diagnosis and repair.
Know What is Normal
The key to all performance work is to know what is normal. Let me illustrate that with a trip to the grocery store.
One day I was buying three potatoes and an onion for a soup I was making. The new kid behind the cash register looked at me and said: “That will be $22.50.” What surprised me was the total lack of internal error checking at this outrageous price (in 2012) for three potatoes and an onion. This could be a simple case of them not caring about doing a good job, but my more charitable assessment is that he had no idea what “normal” was, so everything the register told him had to be taken at face value. Don't be like that kid.
On any given day you, as the performance person, should have a fairly good idea of how much work the users are asking the system to do and what the major performance meters are showing. If you have a good sense of what is normal for your situation, then any abnormality will jump right out at you in the same way you notice subtle changes in a loved one that a stranger would miss. This can save your bacon: if you spot unexpectedly high utilization before the peak occurs, you have time to find and fix the problem before the system comes under peak load.
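One way to make "normal" concrete is to compare a new meter reading against past readings taken under a similar workload. What follows is only a minimal sketch under assumptions of my own: the function name `is_abnormal`, the sample CPU numbers, and the three-standard-deviation threshold are all hypothetical, and your own definition of normal may need to be smarter than this.

```python
from statistics import mean, stdev


def is_abnormal(history, current, tolerance=3.0):
    """Return True if `current` is far from what `history` says is normal.

    `history` holds past readings of one meter (for example, CPU
    utilization in percent) taken under a similar workload.
    `tolerance` is how many standard deviations away from the mean
    counts as abnormal; 3.0 is an arbitrary, assumed starting point.
    """
    if len(history) < 2:
        raise ValueError("need at least two past readings to define normal")
    normal = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past reading was identical, so any difference is abnormal.
        return current != normal
    return abs(current - normal) > tolerance * spread


# Hypothetical CPU readings (percent) seen at about 100 transactions/second.
cpu_at_100_tps = [41.0, 39.5, 40.2, 42.1, 40.8]

print(is_abnormal(cpu_at_100_tps, 40.5))  # a reading close to the usual range
print(is_abnormal(cpu_at_100_tps, 68.0))  # a reading far above the usual range
```

Note that the history is tied to a workload level. The same 68% reading might be perfectly normal at three times the load, which is why the workload and the meter have to be looked at together.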
There are some challenges in getting this data. For example:
- There is no workload data.
- The only workload data available (e.g., per-day transaction volume) is too coarse to show rapid changes in performance.
- The workload is made up of many different transaction types (buy, sell, etc.), and it's not clear which ones to meter.
With rare exceptions, I've found the lack of easily available workload information to be the single best predictor of how bad the overall situation is performance-wise. Over the years, as I visited company after company, this led me to develop Bob's First Rule of Performance Work: “The less a company knows about the work their system did in the last five minutes, the more deeply screwed up they are.”
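If your application can log one timestamped record per transaction, answering "what work did we do in the last five minutes?" can be quite simple. Here is a rough sketch, assuming the events are already available as (timestamp, transaction type) pairs. The function name `workload_last_minutes` and the sample events are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def workload_last_minutes(events, now, minutes=5):
    """Count transactions per type in the `minutes` leading up to `now`.

    `events` is an iterable of (timestamp, transaction_type) pairs.
    Events older than the window, or later than `now`, are ignored.
    """
    cutoff = now - timedelta(minutes=minutes)
    return Counter(kind for ts, kind in events if cutoff < ts <= now)


# Hypothetical transaction log entries.
now = datetime(2012, 6, 1, 10, 30)
events = [
    (datetime(2012, 6, 1, 10, 20), "buy"),   # outside the five-minute window
    (datetime(2012, 6, 1, 10, 26), "buy"),
    (datetime(2012, 6, 1, 10, 27), "sell"),
    (datetime(2012, 6, 1, 10, 29), "buy"),
]

recent = workload_last_minutes(events, now)
print(dict(recent))
```

Counting per transaction type sidesteps the "what to meter" question at first: keep all the types, and decide later which ones actually drive resource consumption.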
What meters should you collect? Meters fall into three big categories. There are utilization meters that tell you how busy a resource is, there are count meters that count interesting events (some good, some bad), and there are duration meters that tell you how long something took. As the commemorative plate infomercial says: “Collect them all!” Please don't wait for perfection. Start somewhere, collect something and, as you explore and discover, add newly discovered meters to your collection.
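To keep the three categories straight in your own collection, it can help to tag each reading with its kind. This is just one possible sketch. The names `MeterKind`, `Reading`, and `describe`, and the sample meters, are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from enum import Enum


class MeterKind(Enum):
    UTILIZATION = "utilization"  # how busy a resource is, in percent
    COUNT = "count"              # how many interesting events happened
    DURATION = "duration"        # how long something took, in seconds


@dataclass(frozen=True)
class Reading:
    name: str
    kind: MeterKind
    value: float


def describe(reading):
    """Format a reading with the unit implied by its kind."""
    if reading.kind is MeterKind.UTILIZATION:
        return f"{reading.name}: {reading.value:.1f}% busy"
    if reading.kind is MeterKind.COUNT:
        return f"{reading.name}: {int(reading.value)} events"
    return f"{reading.name}: {reading.value:.3f} s"


# Hypothetical readings, one of each kind.
readings = [
    Reading("cpu", MeterKind.UTILIZATION, 40.5),
    Reading("failed_logins", MeterKind.COUNT, 3),
    Reading("checkout_response", MeterKind.DURATION, 0.25),
]
for r in readings:
    print(describe(r))
```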
When should you run the meters? Your meters should be running all the time (like bank security cameras) so that when weird things happen you have a multitude of clues to look at. You will want to search this data by time (What happened at 10:30?), so be sure to include timestamps.
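Here is a small sketch of what "always running, always timestamped" might look like. It assumes you supply your own `read_meter` function that returns the current value of some meter. The names `collect` and `readings_between` are hypothetical, and a real collector would run indefinitely and write to durable storage rather than take a fixed number of samples into a list.

```python
import time
from datetime import datetime, timedelta


def collect(read_meter, samples, interval_seconds=60.0,
            clock=datetime.now, sleep=time.sleep):
    """Take `samples` readings, stamping each one with the time it was taken.

    `clock` and `sleep` are parameters only so the demo below can run
    instantly with a fake clock; by default the real ones are used.
    """
    log = []
    for _ in range(samples):
        log.append((clock(), read_meter()))
        sleep(interval_seconds)
    return log


def readings_between(log, start, end):
    """Return the (timestamp, value) pairs with start <= timestamp < end."""
    return [(ts, value) for ts, value in log if start <= ts < end]


# Demo with a fake clock and made-up utilization values, one per minute.
fake_times = iter(
    datetime(2012, 6, 1, 10, 28) + timedelta(minutes=i) for i in range(5)
)
fake_values = iter([40.1, 40.7, 71.3, 69.8, 41.0])

log = collect(
    read_meter=lambda: next(fake_values),
    samples=5,
    clock=lambda: next(fake_times),
    sleep=lambda seconds: None,  # skip the real waiting in this demo
)

# What happened at 10:30?
at_1030 = readings_between(
    log, datetime(2012, 6, 1, 10, 30), datetime(2012, 6, 1, 10, 31)
)
print(at_1030)
```

Because every reading carries its own timestamp, the "What happened at 10:30?" question becomes a simple filter over the log.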
The data you collect can also be used to predict the future with tools like capacity planning, load testing, and modeling.
ABOUT Bob Wescott
Bob Wescott is the author of The Every Computer Performance Book. Since 1987, Wescott has worked in the field of computer performance, doing professional services work and teaching how to do capacity planning, load testing, simulation modeling and web performance for Gomez/Compuware, HyPerformix/CA and Stratus Computer/Technologies. Now, Wescott is mostly retired, and his job is to give back what he has been given. His latest project is The Every Computer Performance Blog based on the book.