Microsoft Teams Optimization for a Remote Workforce
Synthetic Teams Monitoring for Quality, Availability and Performance
March 08, 2021

Sidharth Kumar
Exoprise

What the Surge?

Microsoft Teams is everywhere. Not surprisingly, the number of daily active Teams users grew to 75 million in 2020 during the pandemic. More and more people are working from home (WFH), and companies are becoming virtual. In-person meetings are fading, and Teams is poised to become the next best collaboration tool. According to a Riverbed study, 64% of US employees are now working from home because of the Covid pandemic. In turn, Microsoft Teams optimization has become a critical topic for Operations and Network personnel.

As work shifts to a new setting, executives report that at least half of their distributed workforce consistently has a poor experience with the SaaS apps they use. The phenomenal growth of Teams has resulted in video call rates increasing by over 1000% in recent months. However, outages continue to plague Teams, causing disruptions during meetings and broadcasts. In this unprecedented era, the continued success of Teams operations relies on monitoring and managing quality metrics in real time.


Monitoring Outside the Control Zone

Optimizing the Microsoft Teams end-user experience for WFH employees requires a comprehensive strategy for measuring four critical parameters: transit quality, bandwidth, capacity, and frequent change. IT needs to quickly determine what affects users, and whether the problem lies with Teams or within their own network (VPN, firewall, proxy, gateways, etc.). However, this is difficult when the monitoring has to happen outside the control zone.

Employees are geographically distributed, and endpoints differ in each home setting. Connecting to Teams or other SaaS services involves multiple networks and protocols, and each connection has a different network configuration. This presents a unique challenge for IT, which has no visibility and zero control. Yet IT is accountable for ensuring that business-critical enterprise applications are up and running and that end users have the best experience.

Connecting to Teams Architecture

As enterprises continue to hire remote employees for diversity, it becomes critical to understand how these employees log in to the Teams infrastructure. A typical Teams meeting has a host and several participants.

Given this, success factors must be evaluated in terms of the locations from which the participants join versus the location from which the host streams audio/video (AV).

When a participant in India joins a meeting hosted in the US, the participant connects to the Microsoft media/relay server nearest to their location, through the nearest front door into the Azure backbone infrastructure. The host starts the video, and this traffic flows back to the participants via the same route. Before optimizing Teams performance, it is critical to establish baseline metrics at the sites where IT can observe the real-time quality of AV.

Even the Best Fail at Times

But can Microsoft's Call Quality Dashboard (CQD) help achieve baseline goals?

CQD can provide a snapshot of quality data during the assessment phase by keeping tabs on each user, each call, and each meeting. However, networks and the underlying infrastructure change over time, and those changes affect the Teams experience for everyone.

Some changes are planned, such as additional capacity, gateways, accelerators, or SD-WAN, while others result from human error. CQD reports only after a change has already improved or degraded quality, so remote workers continue to experience issues in the meantime.

What CQD lacks is the ability to collect data from real Teams sessions in real time from critical vantage points. In the event of an outage, call failure, or poor AV quality, the IT team needs proactive notifications. With instant access to hop-by-hop context data, support teams can quickly identify whether the problem lies with Microsoft, the ISP, or their own network.
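As a rough illustration of how hop-by-hop data can point to a fault domain, the sketch below attributes the latency added along a traced path to whoever owns each hop. It is a minimal sketch, not any product's method: the `Hop` type, the `slowest_segment` function, the owner labels, and the sample addresses are all hypothetical, and it assumes an operator has already labelled each hop.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Hop:
    address: str   # hop IP or hostname as reported by a path trace
    owner: str     # operator-assigned label: "own network", "ISP", "Microsoft"
    rtt_ms: float  # cumulative round-trip time measured to this hop


def slowest_segment(hops: list[Hop]) -> tuple[str, float]:
    """Return the owner whose hops add the most latency, and how much (ms)."""
    if not hops:
        raise ValueError("trace contains no hops")
    added: dict[str, float] = {}
    highest_rtt = 0.0
    for hop in hops:
        # Latency added by this hop is its cumulative RTT minus the highest
        # RTT seen so far. Clamp at zero because a later hop can answer
        # faster than an earlier one.
        delta = max(hop.rtt_ms - highest_rtt, 0.0)
        added[hop.owner] = added.get(hop.owner, 0.0) + delta
        highest_rtt = max(highest_rtt, hop.rtt_ms)
    owner = max(added, key=lambda name: added[name])
    return owner, added[owner]


# Hypothetical trace from a home office to a media relay.
trace = [
    Hop("192.168.1.1", "own network", 2.0),
    Hop("10.20.0.1", "own network", 5.0),
    Hop("isp-edge.example.net", "ISP", 48.0),
    Hop("isp-core.example.net", "ISP", 55.0),
    Hop("relay.example.com", "Microsoft", 61.0),
]
print(slowest_segment(trace))
```

In this made-up trace most of the delay accumulates on the ISP hops, which is the kind of signal a support team could use to decide where to escalate. A real tool would also weigh packet loss and compare against a baseline rather than rely on a single trace.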

Gauging Teams Performance Metrics

Latency, packet loss, jitter, bandwidth, and service quality are a few of the metrics to measure when it comes to Teams. Gaining better control of these metrics can contribute to a superior experience. However, CQD displays evaluation data that is already out of date, which impairs decision making for IT teams that need to know whether their network can deliver these metrics consistently and not just during the assessment.

The fallout is calls from irate customers who have trouble logging in or suffer poor AV quality. IT needs an easier way to quickly determine whether it is meeting its call quality goals. Below are basic benchmarks for delivering an optimal Teams experience.

■ Mean (avg) Audio Jitter

■ Round Trip Latency

■ Packet Loss

■ Audio Bandwidth > 100 Kbps

■ Video Bandwidth > 300 Kbps
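A check against the benchmarks listed above could be sketched as follows. Only the two bandwidth floors come from the list. It gives no numbers for jitter, round trip latency, or packet loss, so this sketch leaves those limits unset and expects the caller to supply them. The `Targets` and `Sample` types and the `violations` function are hypothetical names used for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Targets:
    # Bandwidth floors taken from the benchmark list.
    min_audio_kbps: float = 100.0
    min_video_kbps: float = 300.0
    # No values are given in the list for these three; set your own targets.
    max_jitter_ms: Optional[float] = None
    max_rtt_ms: Optional[float] = None
    max_loss_pct: Optional[float] = None


@dataclass
class Sample:
    jitter_ms: float
    rtt_ms: float
    loss_pct: float
    audio_kbps: float
    video_kbps: float


def violations(s: Sample, t: Targets) -> list[str]:
    """Return the names of the benchmarks this sample fails, in list order."""
    failed = []
    if t.max_jitter_ms is not None and s.jitter_ms > t.max_jitter_ms:
        failed.append("audio jitter")
    if t.max_rtt_ms is not None and s.rtt_ms > t.max_rtt_ms:
        failed.append("round trip latency")
    if t.max_loss_pct is not None and s.loss_pct > t.max_loss_pct:
        failed.append("packet loss")
    # The list states bandwidth must be greater than the floor.
    if s.audio_kbps <= t.min_audio_kbps:
        failed.append("audio bandwidth")
    if s.video_kbps <= t.min_video_kbps:
        failed.append("video bandwidth")
    return failed


# Made-up reading from one test call.
sample = Sample(jitter_ms=12.0, rtt_ms=80.0, loss_pct=0.2,
                audio_kbps=120.0, video_kbps=250.0)
print(violations(sample, Targets()))
```

Keeping the unspecified limits as optional fields means the check never silently enforces a number the organization did not choose.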

Synthetic Monitoring to the Rescue

Synthetic monitoring delivers Teams metrics in real time, enabling IT to easily gauge availability, uptime, and performance. Below is a screenshot that shows the readings for these critical metrics. Companies can now confidently determine whether they are meeting the experience needs of remote, WFH, and branch office employees.

Using sensors at strategic vantage points that run every few minutes, monitoring tools capture real-time, end-to-end communication statistics 24x7, 365 days a year. The number of sensors a company needs to deploy depends on how wide a net it needs to cast to raise the alarm about problems. IT can start by deploying a few sensors in the data center, at headquarters, or for remote employees on any Windows machine on any network, and immediately gain insights. If a problem occurs and no issues are reported from the vantage points, then Microsoft is the likely culprit.


Always Be Monitoring

Testing before changes (for baselines), testing after changes (to confirm whether a network upgrade improved or broke the benchmark), and continuous monitoring in production become a central part of the IT monitoring strategy. For Teams AV, testing under conditions (public or private sites) that reflect exactly what end users experience makes the monitoring and troubleshooting process more valuable. Monitoring Teams WebRTC/AV stats from cloud sites is not realistic while everyone works from home. On the other hand, Teams messaging and availability monitoring can be run from global public vantage sites.
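The before-and-after comparison described above can be sketched as a simple median check. This is a minimal sketch under stated assumptions: the function name `compare_to_baseline`, the 10% tolerance, and the sample readings are all hypothetical, and the metric is assumed to be one where lower is better (such as latency in milliseconds).

```python
from statistics import median


def compare_to_baseline(before, after, tolerance_pct=10.0):
    """Classify a lower-is-better metric after a network change.

    Returns (verdict, percent change of the median, rounded to one decimal).
    """
    base = median(before)
    current = median(after)
    if base == 0:
        raise ValueError("baseline median is zero; percent change is undefined")
    change_pct = (current - base) / base * 100.0
    if change_pct > tolerance_pct:
        verdict = "regressed"
    elif change_pct < -tolerance_pct:
        verdict = "improved"
    else:
        verdict = "unchanged"
    return verdict, round(change_pct, 1)


# Made-up round-trip latency samples (ms) from one vantage point.
before_change = [40, 42, 41, 39, 43]
after_change = [55, 58, 54, 60, 57]
print(compare_to_baseline(before_change, after_change))
```

The median is used instead of the mean so that a single outlier reading does not flip the verdict. The tolerance would need tuning per metric and per site.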

For WFH employees, the combination of synthetic and real-user monitoring provides a complete overview of the user experience along with high-level insight into the health of the network and infrastructure. Because organizations need scale and reliability, using both techniques together can eliminate guesswork, close visibility gaps, reduce MTTR, and establish a solid monitoring foundation.

Sidharth Kumar is Director of Product Marketing at Exoprise