Virtana AI Factory Observability Released

Virtana announced the launch of Virtana AI Factory Observability (AIFO), a new capability that extends Virtana’s full-stack observability platform to the unique demands of AI infrastructure. 

With deep, real-time insights into everything from GPU utilization and training bottlenecks to power consumption and cost drivers, AIFO enables enterprises to turn complex, compute-intensive AI environments into scalable, efficient, and accountable operations.

“AI has the potential to be as transformative as the steam engine or the printing press—but only if enterprises can operationalize it at scale,” said Paul Appleby, CEO of Virtana. “Right now, too many teams are flying blind when it comes to AI infrastructure. Virtana AIFO gives them the visibility and control they need to treat AI not as an experiment, but as a core, strategic part of the business.”

Virtana AIFO helps enterprises treat AI infrastructure with the same level of visibility, discipline, and accountability as traditional IT.

As an official NVIDIA partner, Virtana integrates natively with NVIDIA GPU platforms to deliver in-depth telemetry, including memory utilization, thermal behavior, and power metrics, providing precise, vendor-validated insight into the most performance-critical components of the AI Factory. This deep integration delivers accurate, actionable intelligence at enterprise scale.

“AI workloads introduce an entirely different set of infrastructure challenges—from GPU saturation and training bottlenecks to unpredictable cost spikes,” said Amitkumar Rathi, Senior Vice President of Engineering, Product, and Support at Virtana. “We designed AIFO to address these realities head-on. It gives teams deep, correlated visibility across the full AI stack, enabling them to optimize performance, reduce waste, and scale AI with confidence.”

With this launch, Virtana directly addresses the growing infrastructure challenges that stand in the way of scalable AI success. As enterprises accelerate investments in AI, many are encountering hidden inefficiencies: idle GPUs that inflate costs, training jobs that fail without explanation, and inference pipelines that stall due to underlying storage or network issues. AIFO is purpose-built to solve these problems, delivering real-time visibility and correlated insights across every layer of the AI infrastructure stack. The result is greater control over performance, spend, and scale—turning AI from a high-risk initiative into a high-impact capability.

To meet the demands of AI operations, Virtana AIFO continuously collects telemetry across GPUs, CPUs, memory, network, and storage, then correlates that data with training and inference pipelines to provide clear, actionable insights.

Core capabilities include:

  • GPU Performance Monitoring – Tracks per-GPU metrics such as memory, utilization, thermal load, and power draw across multiple vendors.
  • Distributed Training Visibility – Identifies bottlenecks, synchronization issues, and stragglers across multi-node jobs.
  • Infrastructure-to-AI Mapping – Correlates model-level performance directly to hardware-level behavior, including network and storage dependencies.
  • Power and Cost Analytics – Exposes inefficiencies such as thermal throttling, idle GPU time, and overprovisioned resources.
  • Root Cause Analysis – Diagnoses training failures and inference slowdowns faster by pinpointing the most likely infrastructure causes.
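
As a rough illustration of what two of the capabilities above measure, the following Python sketch computes an idle-time fraction from GPU utilization samples and flags straggling workers in a multi-node job. It is not Virtana's implementation or API: the function names, the 5% idle threshold, the 1.5x-median straggler rule, and the sample data are all assumptions made up for this example.

```python
from statistics import median


def idle_fraction(utilization_samples, idle_threshold=5.0):
    """Fraction of samples at or below idle_threshold (percent utilization).

    The 5% default is an illustrative assumption, not a vendor-defined value.
    """
    if not utilization_samples:
        return 0.0
    idle = sum(1 for u in utilization_samples if u <= idle_threshold)
    return idle / len(utilization_samples)


def find_stragglers(step_times, factor=1.5):
    """Return workers whose step time exceeds factor times the median.

    step_times maps a (hypothetical) worker name to seconds per training step.
    """
    if not step_times:
        return []
    cutoff = factor * median(step_times.values())
    return sorted(w for w, t in step_times.items() if t > cutoff)


# Hypothetical utilization samples (percent) for one GPU.
samples = [0.0, 2.0, 85.0, 90.0, 3.0, 95.0, 88.0, 1.0]
print(idle_fraction(samples))  # 4 of 8 samples are idle -> 0.5

# Hypothetical per-worker step times (seconds) for one distributed job.
steps = {"node-0": 1.0, "node-1": 1.1, "node-2": 2.4, "node-3": 0.9}
print(find_stragglers(steps))  # ['node-2']
```

A production system would presumably work on streaming telemetry and correlate it with job metadata; the sketch only shows the underlying arithmetic.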

All capabilities are accessible via Virtana’s Global View dashboard, which unifies telemetry across hybrid and containerized AI environments—on-premises, cloud, or both.

AIFO is already delivering measurable results in production AI environments across multiple industries. Operational outcomes include:

  • 40% reduction in idle GPU time, improving resource utilization and reducing infrastructure costs.
  • 60% faster mean time to resolution (MTTR) for AI-related incidents.
  • 50% decrease in false alerts, reducing operational noise and accelerating response.
  • 15% improvement in power efficiency, supporting sustainability goals.

Virtana AIFO is now generally available as a fully integrated capability within the Virtana Platform. Purpose-built for the demands of modern AI infrastructure, AIFO scales from early-stage test environments to enterprise-grade AI factories. This launch, together with Virtana’s recent acquisition of Zenoss, further extends the company’s leadership in delivering the deepest and broadest observability platform across applications, infrastructure, and AI workloads in hybrid and multi-cloud environments.

The Zenoss acquisition also expands the platform’s event intelligence and service-centric observability capabilities, allowing customers to correlate AI model performance with broader application behavior and infrastructure health. Together, these advancements deepen Virtana’s ability to help enterprises manage the full complexity of AI operations in the most demanding environments.

