APMdigest asked experts from across the Application Performance Management (APM) industry for their opinions on how to best prepare for the challenges of Black Friday, Cyber Monday, and the Holiday Shopping Season. The final set of six technologies presented here includes tools to manage availability, capacity and resources.
13. Service Availability Management
Given the scale of enterprise IT environments and the high frequency of change – especially when retailers are continuously revamping their web and mobile storefronts for the holiday season – it becomes increasingly challenging to meet application and service availability requirements. Unfortunately, downtime is simply not an option: for many retailers, performance over the next few weeks will determine whether they meet their financial expectations for the year. Service Availability Management (SAM) software is the ideal solution, as it serves as a common platform for all relevant IT teams to collaborate and proactively find and correct vulnerabilities before they impact business operations – because, as they say, an ounce of prevention is worth a pound of cure.
CTO, Continuity Software
14. Complex Operations Event Processing
In parallel with doing what you can to prevent failure under the stress of the upcoming holiday shopping season, also plan for failure. If business is good, and the stress on applications and systems is high, then there is a higher chance that failure will occur, no matter what preventative measures you put in place. The focus should then be on tools that can give you a "first responder's view," i.e., the ability to provide early warning of application service-affecting situations as they happen, so you can start the remediation process earlier and restore the application service sooner. To do this well, the tool needs to pull monitoring streams from across the entire IT stack and then correlate application performance with the underlying cloud and data center infrastructure. The tool should be agile and automated, relying on algorithmic event processing in real time, and not on static rules or models. Finally, add socialized workflow on top of the algorithmic incident detection to enable a collaborative remediation process that reduces mean time to restore (MTTR) by letting siloed experts work together, viewing and sharing contextualized situational awareness.
Chairman, CEO and Co-Founder, Moogsoft
15. Capacity Planning and Optimization
For digital businesses to survive and thrive during the holiday season, they need to do more than overprovision capacity and hope for the best. They need to accurately predict the amount of stress that will be placed on their consumer-facing services and plan accordingly. This requires a clear understanding of historical and current resource consumption as well as the ability to account for other services when analyzing the demands of the holiday season. Investing in an advanced capacity optimization solution with strong analytics that allows services to be easily expanded or contracted based on market conditions is a critical part of ensuring consistent service delivery.
President of the Performance and Availability Business at BMC Software
Find your bottlenecks before they find you. Stress test or conduct a "what if" analysis on your IT systems to learn not only which resource bottleneck you will hit first, but also the second and third bottlenecks lurking right behind it. Once you know where your problems will likely come from, have a plan to redistribute or add capacity quickly if needed. A virtualization monitoring and capacity planning tool with a what-if analysis modeling capability can help accomplish much of this.
Director, Systems Management Product Marketing, SolarWinds
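As a rough illustration of the "what if" analysis described above, the following Python sketch ranks resources by how much load growth each can absorb before it saturates. It is a minimal sketch, not any vendor's method: the function name `saturation_order`, the resource names, and the sample figures are all hypothetical, and it assumes that usage of every resource scales linearly with load, which real systems rarely do exactly.

```python
def saturation_order(usage, capacity):
    """Rank resources by the load multiplier at which each one saturates.

    Assumes (hypothetically) that usage grows linearly with load, so a
    resource at 300 of 400 units saturates at roughly 1.33x current load.
    The resource with the smallest multiplier is the first bottleneck.
    """
    multipliers = {}
    for name, used in usage.items():
        if used <= 0:
            continue  # an unused resource never saturates under this model
        multipliers[name] = capacity[name] / used
    # Smallest multiplier first: first, second, third bottleneck, and so on.
    return sorted(multipliers.items(), key=lambda item: item[1])


# Hypothetical measurements taken at current peak load.
usage = {"cpu_cores": 40, "db_connections": 300, "network_mbps": 200}
capacity = {"cpu_cores": 64, "db_connections": 400, "network_mbps": 1000}

for name, multiplier in saturation_order(usage, capacity):
    print(f"{name}: saturates at about {multiplier:.2f}x current load")
```

With these sample numbers the database connection pool would be hit first, then CPU, then network, which is the kind of ordering that tells you where to plan extra capacity.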
During the holiday rush, the best advice is to utilize capacity planning and harness the clouds. With capacity planning, start with historical data to obtain a baseline. Now estimate how and why that will change this year. Reach out for on-demand capacity from a public or private cloud provider and run synthetic transactions using your APM tools to ensure your infrastructure will not fold under pressure. The best gift is to give your customers the consistent experience that they have come to expect from you.
Director, Product Marketing, Zenoss
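The baseline-then-estimate approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `required_servers`, its parameters, the 30% headroom default, and the sample traffic figures are hypothetical and do not come from any of the tools mentioned here.

```python
import math


def required_servers(historical_peaks_rps, expected_growth, per_server_rps,
                     headroom=0.3):
    """Estimate how many servers are needed for this year's peak.

    historical_peaks_rps: peak requests/second from previous seasons (baseline)
    expected_growth:      estimated fractional growth this year (0.25 = 25%)
    per_server_rps:       requests/second one server sustains (assumed measured)
    headroom:             extra safety margin on top of the projection
    """
    baseline = max(historical_peaks_rps)            # worst peak seen so far
    projected = baseline * (1 + expected_growth)    # this year's estimate
    target = projected * (1 + headroom)             # add a safety margin
    return math.ceil(target / per_server_rps)       # round up to whole servers


# Hypothetical figures: three past seasons, 25% expected growth.
servers = required_servers([1200, 1500, 1800], expected_growth=0.25,
                           per_server_rps=250)
print(f"Plan for at least {servers} servers at peak")
```

The gap between this estimate and what you currently run is the on-demand capacity to line up with a cloud provider, and the projected peak is a reasonable target rate for synthetic transaction tests.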
16. The Cloud and Scalable Resources
Consider using a cloud platform provider to supplement internal infrastructure during peak traffic periods.
Technology Analyst, TechTonics
In order to handle dramatic spikes in usage, companies should leverage an IaaS/PaaS environment for temporary application scale. In order to monitor an application partially hosted on IaaS/PaaS and partially in the data center, companies need hybrid APM tools that can monitor across the board from a unified view. The tool should be able to isolate and alert on bottlenecks across the hybrid application stack.
Sr. Product Manager - APM, IBM
Worldwide business-to-consumer e-commerce sales are expected to reach $1.5 trillion in 2014, presenting major revenue opportunities for retailers. If not adequately prepared, businesses are likely not only to miss out on sales on Cyber Monday, but also to lose customer loyalty and potential repeat sales. In this scenario, even a minute of website downtime or lag time is not acceptable. With a scale-out database, businesses can leverage the cloud and easily absorb new customers and increased transaction volume. Additionally, a scale-out database provides more uptime, more capacity and therefore more revenue opportunities for e-commerce merchants, without the worry of the backend infrastructure.
To prepare your applications for the holidays, it's important to make sure they're ready for the increased load. One way to do this is to evaluate your application monitoring platform to ensure it has the resources it needs to dynamically shift load based on project and application demands, and provide a dedicated environment for allocated resources. From there, you can efficiently distribute resources to the necessary applications and dynamically scale your app to project needs.
Founder of Modulus, a Progress company
17. Fast and Reliable Storage
From Black Friday to Cyber Monday to the rest of the whirlwind holiday shopping season, retailers depend on these sales to keep them in the black. Months of strategic planning take place, in the hopes of winning the highly competitive race for savvy consumers. Imagine if, after all of that work, your website went down or your registers stopped operating – how long would customers hang around? Business organizations that wish to be successful this holiday season must have a bullet-proof IT infrastructure in place, the foundation of which needs to be lightning-fast, highly available, and completely reliable data storage.
18. Business Continuity Solutions
Black Friday, Cyber Monday and the ensuing onslaught of shrewd shoppers seeking the best deals are right around the corner. For most retailers, these sales can account for as much as 40% of their annual revenue. Imagine the cost if their systems go down for minutes, hours or even a day. With so much on the line, protecting and optimizing database infrastructure must become a top priority. The most efficient and cost-effective means by which to do this is to deploy a solution that enables workload portability across your database infrastructure – i.e., makes workloads easier to manage, distribute, share and protect. In doing so, when the stampede of shoppers hits the website or store, IT can rest assured that their SLAs for performance and availability will be met.
CEO and Co-Founder, DH2i
Always-on access to applications and information is a 365-day, year-round job. But during the holiday season, when your nearest competitor is just a click away, it's even more critical to keep the lights on. Since occasional downtime is bound to happen with the stress of the holiday online shopping season, knowing you have an online backup and disaster recovery solution with WAN-optimized technologies that enable you to immediately get back to business can make the difference between ringing in the New Year with higher profits and going bust.
VP of Products, Zetta.net