
Virtana announced the launch of Virtana AI Factory Observability (AIFO), a new capability that extends Virtana’s full-stack observability platform to the unique demands of AI infrastructure.
With deep, real-time insights into everything from GPU utilization and training bottlenecks to power consumption and cost drivers, AIFO enables enterprises to turn complex, compute-intensive AI environments into scalable, efficient, and accountable operations.
“AI has the potential to be as transformative as the steam engine or the printing press—but only if enterprises can operationalize it at scale,” said Paul Appleby, CEO of Virtana. “Right now, too many teams are flying blind when it comes to AI infrastructure. Virtana AIFO gives them the visibility and control they need to treat AI not as an experiment, but as a core, strategic part of the business.”
Virtana AIFO helps enterprises treat AI infrastructure with the same level of visibility, discipline, and accountability as traditional IT.
As an official NVIDIA partner, Virtana integrates natively with NVIDIA GPU platforms to deliver in-depth telemetry, including memory utilization, thermal behavior, and power metrics, providing precise, vendor-validated insight into the most performance-critical components of the AI Factory. This deep integration delivers accurate, actionable intelligence at enterprise scale.
“AI workloads introduce an entirely different set of infrastructure challenges—from GPU saturation and training bottlenecks to unpredictable cost spikes,” said Amitkumar Rathi, Senior Vice President of Engineering, Product, and Support at Virtana. “We designed AIFO to address these realities head-on. It gives teams deep, correlated visibility across the full AI stack, enabling them to optimize performance, reduce waste, and scale AI with confidence.”
With this launch, Virtana directly addresses the growing infrastructure challenges that stand in the way of scalable AI success. As enterprises accelerate investments in AI, many are encountering hidden inefficiencies: idle GPUs that inflate costs, training jobs that fail without explanation, and inference pipelines that stall due to underlying storage or network issues. AIFO is purpose-built to solve these problems, delivering real-time visibility and correlated insights across every layer of the AI infrastructure stack. The result is greater control over performance, spend, and scale—turning AI from a high-risk initiative into a high-impact capability.
Virtana AIFO continuously collects telemetry across GPUs, CPUs, memory, network, and storage, then correlates that data with training and inference pipelines to provide clear, actionable insights.
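To make the idea of correlating hardware telemetry with a training pipeline concrete, here is a minimal sketch in Python. It is not Virtana's implementation or API; the `GpuSample` and `TrainStep` types and the `mean_util_per_step` function are hypothetical names invented for illustration. It simply joins GPU utilization samples to training steps by timestamp and averages utilization per step.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One telemetry reading (hypothetical shape, for illustration only)."""
    ts: float        # sample timestamp in seconds
    gpu_id: str
    util_pct: float  # GPU utilization, 0-100


@dataclass
class TrainStep:
    """One training step with its wall-clock window (hypothetical shape)."""
    step: int
    start: float
    end: float


def mean_util_per_step(samples, steps):
    """Return {step number: mean GPU utilization} over each step's time window.

    Steps with no samples in their window are left out of the result.
    """
    ordered = sorted(samples, key=lambda s: s.ts)
    times = [s.ts for s in ordered]
    result = {}
    for st in steps:
        lo = bisect_left(times, st.start)   # first sample at or after start
        hi = bisect_right(times, st.end)    # one past last sample at or before end
        window = ordered[lo:hi]
        if window:
            result[st.step] = sum(s.util_pct for s in window) / len(window)
    return result


samples = [
    GpuSample(0.0, "gpu0", 90.0),
    GpuSample(1.0, "gpu0", 80.0),
    GpuSample(2.0, "gpu0", 10.0),
    GpuSample(3.0, "gpu0", 20.0),
]
steps = [TrainStep(1, 0.0, 1.0), TrainStep(2, 2.0, 3.0), TrainStep(3, 10.0, 11.0)]
per_step = mean_util_per_step(samples, steps)
print(per_step)
```

In this toy data, step 2 runs at much lower utilization than step 1, which is the kind of signal that would prompt a look at storage or network behavior during that window.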
Core capabilities include:
- GPU Performance Monitoring – Tracks per-GPU metrics such as memory, utilization, thermal load, and power draw across multiple vendors.
- Distributed Training Visibility – Identifies bottlenecks, synchronization issues, and stragglers across multi-node jobs.
- Infrastructure-to-AI Mapping – Correlates model-level performance directly to hardware-level behavior, including network and storage dependencies.
- Power and Cost Analytics – Exposes inefficiencies such as thermal throttling, idle GPU time, and overprovisioned resources.
- Root Cause Analysis – Diagnoses training failures and inference slowdowns faster by pinpointing the most likely infrastructure causes.
All capabilities are accessible via Virtana’s Global View dashboard, which unifies telemetry across hybrid and containerized AI environments—on-premises, cloud, or both.
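Two of the capabilities listed above, idle GPU time and straggler detection in multi-node jobs, reduce to simple calculations once the telemetry is in hand. The sketch below is illustrative only and does not reflect how AIFO computes these values; the function names, the 5% idle threshold, and the 1.5x slowdown factor are all assumptions chosen for the example.

```python
from statistics import median


def idle_fraction(util_samples, idle_threshold_pct=5.0):
    """Fraction of utilization samples below the (assumed) idle threshold."""
    if not util_samples:
        return 0.0
    idle = sum(1 for u in util_samples if u < idle_threshold_pct)
    return idle / len(util_samples)


def find_stragglers(step_time_by_node, slowdown_factor=1.5):
    """Nodes whose step time exceeds slowdown_factor times the median step time."""
    typical = median(step_time_by_node.values())
    return sorted(
        node
        for node, seconds in step_time_by_node.items()
        if seconds > slowdown_factor * typical
    )


# Hypothetical readings: utilization percentages for one GPU,
# and per-node seconds for the same training step.
gpu_util = [0.0, 2.0, 50.0, 90.0]
step_times = {"node-a": 1.0, "node-b": 1.1, "node-c": 2.0}

print(idle_fraction(gpu_util))
print(find_stragglers(step_times))
```

Using the median rather than the mean as the baseline keeps a single very slow node from inflating the threshold it is being compared against.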
AIFO is already delivering measurable results in production AI environments across multiple industries. Operational outcomes include:
- 40% reduction in idle GPU time, improving resource utilization and reducing infrastructure costs.
- 60% faster mean time to resolution (MTTR) for AI-related incidents.
- 50% decrease in false alerts, reducing operational noise and accelerating response.
- 15% improvement in power efficiency, supporting sustainability goals.
Virtana AIFO is now generally available as a fully integrated capability within the Virtana Platform. Built for modern AI infrastructure, AIFO scales from early-stage test environments to enterprise-grade AI factories. This launch, together with Virtana’s recent acquisition of Zenoss, further extends the company’s leadership in delivering the deepest and broadest observability platform across applications, infrastructure, and AI workloads in hybrid and multi-cloud environments.
The Zenoss acquisition also expands the platform’s event intelligence and service-centric observability capabilities, allowing customers to correlate AI model performance with broader application behavior and infrastructure health. Together, these advancements deepen Virtana’s ability to help enterprises manage the full complexity of AI operations in the most demanding environments.