Performance monitoring is about understanding what's happening right now. It usually includes dealing with immediate performance problems or collecting data that will be used by the other performance tools (such as capacity planning) to plan for future peak loads.
In performance monitoring you need to know three things:
- The incoming workload
- The resulting resource consumption
- What is normal under this load
Without these three things you can only solve the most obvious performance problems and have to rely on tools outside the scientific realm (such as a Ouija Board, or a Magic 8 Ball) to predict the future.
You need to know the incoming workload (what the users are asking your system to do) because all computers run just fine under no load. Performance problems crop up as the load goes up. These performance problems come in two basic flavors: Expected and Unexpected.
Expected problems are when the users are simply asking the application for more things per second than it can do. You see this during an expected peak in demand like the biggest shopping day of the year. Expected problems are no fun, but they can be foreseen and, depending on the situation, your response might be to endure them, because money is tight or because the fix might introduce too much risk.
Unexpected problems are when the incoming workload should be well within the capabilities of the application, but something is wrong and either the end-user performance is bad or some performance meter makes no sense. Unexpected problems cause much unpleasantness and demand rapid diagnosis and repair.
Know What is Normal
The key to all performance work is to know what is normal. Let me illustrate that with a trip to the grocery store.
One day I was buying three potatoes and an onion for a soup I was making. The new kid behind the cash register looked at me and said: “That will be $22.50.” What surprised me was the total lack of internal error checking at this outrageous price (in 2012) for three potatoes and an onion. This could be a simple case of them not caring about doing a good job, but my more charitable assessment is that he had no idea what “normal” was, so everything the register told him had to be taken at face value. Don't be like that kid.
On any given day you, as the performance person, should be able to have a fairly good idea of how much work the users are asking the system to do and what the major performance meters are showing. If you have a good sense of what is normal for your situation, then any abnormality will jump right out at you in the same way you notice subtle changes in a loved one that a stranger would miss. This can save your bacon because if you spot the unexpected utilization before the peak occurs, then you have time to find and fix the problem before the system comes under a peak load.
There are some challenges in getting this data. For example:
- There is no workload data.
- The only workload data available (ex: per day transaction volume) is at too low a resolution to be any good for rapid performance changes.
- The workload is made of many different transaction types (buy, sell, etc.) It's not clear what to meter.
With rare exception I've found the lack of easily available workload information to be the single best predictor of how bad the overall situation is performance wise. Over the years as I visited company after company this led me to develop Bob's First Rule of Performance Work: “The less a company knows about the work their system did in the last five minutes, the more deeply screwed up they are.”
What meters should you collect? Meters fall into big categories. There are utilization meters that tell you how busy a resource is, there are count meters that count interesting events (some good, some bad), and there are duration meters that tell you how long something took. As the commemorative plate infomercial says: “Collect them all!” Please don't wait for perfection. Start somewhere, collect something and, as you explore and discover, add newly discovered meters to your collection.
When should you run the meters? Your meters should be running all the time (like bank security cameras) so that when weird things happen you have a multitude of clues to look at. You will want to search this data by time (What happened at 10:30?), so be sure to include timestamps.
The data you collect can also be used to predict the future with tools like: Capacity Planning, Load Testing, and Modeling.
ABOUT Bob Wescott
Bob Wescott is the author of The Every Computer Performance Book. Since 1987, Wescott has worked in the field of computer performance, doing professional services work and teaching how to do capacity planning, load testing, simulation modeling and web performance for Gomez/Compuware, HyPerformix/CA and Stratus Computer/Technologies. Now, Wescott is mostly retired, and his job is to give back what he has been given. His latest project is The Every Computer Performance Blog based on the book.
Enterprises with services operating in the cloud are overspending by millions due to inefficiencies with their apps and runtime environments, according to a poll conducted by Lead to Market, and commissioned by Opsani. 69 Percent of respondents report regularly overspending on their cloud budget by 25 percent or more, leading to a loss of millions on unnecessary cloud spend ...
For IT professionals responsible for upgrading users to Windows 10, it's crunch time. End of regular support for Windows 7 is nearly here (January 14, 2020) but as many as 59% say that only a portion of their users have been migrated to Windows 10 ...
Application performance monitoring (APM) has become one of the key strategies adopted by IT teams and application owners in today’s era of digital business services. Application downtime has always been considered adverse to business productivity. But in today’s digital economy, what is becoming equally dreadful is application slowdown. When an application is slow, the end user’s experience accessing the application is negatively affected leaving a dent on the business in terms of commercial loss and brand damage ...
Useful digital transformation means altering or designing new business processes, and implementing them via the people and technology changes needed to support these new business processes ...
xMatters recently released the results of its Incident Management in the Age of Customer-Centricity research study to better understand the range of various incident management practices and how the increased focus on customer experience has caused roles across an organization to evolve. Findings highlight the ongoing challenges organizations face as they continue to introduce and rapidly evolve digital services ...
The new App Attention Index Report from AppDynamics finds that consumers are using an average 32 digital services every day — more than four times as many as they realize. What's more, their use of digital services has evolved from a conscious decision to carry around a device and use it for a specific task, to an unconscious and automated behavior — a digital reflex. So what does all this mean for the IT teams driving application performance on the backend? Bottom line: delivering seamless and world-class digital experiences is critical if businesses want to stay relevant and ensure long-term customer loyalty. Here are some key considerations for IT leaders and developers to consider ...
Through the adoption of agile technologies, financial firms can begin to use software to both operate more effectively and be faster to market with improvements for customer experiences. Making sure there is the necessary software in place to give customers frictionless everyday activities, like remote deposits, business overdraft services and wealth management, is key for a positive customer experience ...
For the past two years, Couchbase has been digging into enterprises' digital strategies. Can they deliver the experiences and services their end-users need? What pressure are they under to innovate and succeed? And what is driving investments in new technologies? ...
Adapting to new business requirements and technological shifts requires that IT Ops teams adopt a different viewpoint, and along with that, skills and culture. A survey by OpsRamp uncovered some common thinking among IT Operations leaders on how to address talent, budget, and data management pains amid digital disruption ...