Arize Introduces Open Source LLM Evals Library and Support for Traces and Spans

Arize Phoenix rolled out several capabilities in its latest release.

Phoenix's new support for LLM traces and spans gives AI engineers and developers span-level visibility, so they can see exactly where an app breaks and analyze each step rather than just the end result.

This capability is particularly useful for developers in the early stages of building an app because it doesn't require them to send data to a SaaS platform to perform LLM evaluation and troubleshooting. Instead, the open-source solution provides pre-deployment LLM observability directly from their local machine. Phoenix supports all common spans and has native integrations with LlamaIndex and LangChain.
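To illustrate what span-level visibility means in practice, here is a minimal sketch in plain Python. It does not use the Phoenix API: the `Span` class, its fields, and the `find_failing_spans` helper are hypothetical names chosen for illustration. The sketch models a trace as a tree of spans and walks it to report which step recorded an error.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One step of an LLM app (hypothetical structure, not Phoenix's schema)."""
    name: str
    kind: str                      # e.g. "chain", "retriever", "llm"
    duration_ms: float
    error: Optional[str] = None    # None means the step succeeded
    children: List["Span"] = field(default_factory=list)


def find_failing_spans(span: Span, path: str = "") -> List[str]:
    """Walk the span tree depth-first and list every span that recorded an error."""
    here = f"{path}/{span.name}" if path else span.name
    failures = [f"{here}: {span.error}"] if span.error else []
    for child in span.children:
        failures.extend(find_failing_spans(child, here))
    return failures


# A made-up trace for one request: retrieval succeeds, generation fails.
trace = Span("query", "chain", 1840.0, children=[
    Span("retrieve", "retriever", 120.0),
    Span("generate", "llm", 1700.0, error="context length exceeded"),
])

failing = find_failing_spans(trace)
print(failing)
```

Looking only at the end result would show a failed request. The span tree shows that retrieval succeeded and the generation step is where the request broke.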

The new Phoenix LLM evals library is designed for fast and accurate LLM-assisted evaluations and makes an evaluation LLM easy to put into practice. Applying data science rigor to the testing of model and template combinations, Phoenix offers proven LLM evals for common use cases: retrieval (RAG) relevance, reducing hallucinations, question-and-answer on retrieved data, toxicity, code generation, summarization, and classification. The library is optimized to run evaluations quickly, with support for notebooks, Python pipelines, and app frameworks such as LangChain and LlamaIndex.
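The general idea behind an LLM-assisted evaluation can be sketched as follows. This is not the Phoenix evals API: the template text, the `classify` helper, and `stub_model` (a keyword-matching stand-in for a real evaluation LLM) are all assumptions made for illustration. The sketch fills a prompt template for each record, asks the model for a label, and accepts only answers from a fixed label set so that free-form output cannot leak into the results.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical relevance template; a real library would ship its own tested wording.
TEMPLATE = (
    "Question: {query}\n"
    "Document: {document}\n"
    "Is the document relevant to the question? Answer 'relevant' or 'irrelevant'."
)
ALLOWED_LABELS = ("relevant", "irrelevant")


def classify(
    rows: List[Dict[str, str]],
    model: Callable[[str], str],
    template: str = TEMPLATE,
    allowed: Sequence[str] = ALLOWED_LABELS,
) -> List[str]:
    """Label each row with the model's answer, or 'unparseable' if it is off-list."""
    labels = []
    for row in rows:
        raw = model(template.format(**row)).strip().lower()
        labels.append(raw if raw in allowed else "unparseable")
    return labels


def stub_model(prompt: str) -> str:
    """Stand-in for a real evaluation LLM: keyword match on the document line only."""
    document_line = prompt.split("\n")[1].lower()
    if "span" in document_line:
        return "Relevant"
    if "kubernetes" in document_line:
        return "irrelevant"
    return "I am not sure"


rows = [
    {"query": "What is a span?", "document": "A span records one step of a trace."},
    {"query": "What is a span?", "document": "Kubernetes schedules containers."},
    {"query": "What is a span?", "document": "The weather is mild today."},
]

labels = classify(rows, stub_model)
print(labels)
```

Comparing such labels against a hand-labeled set is one way to test a given model and template combination before trusting it on new data.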

"Large language models are poised to transform industries and society, but when it comes to robust performance going from toy to production remains a challenge," said Jason Lopatecki, CEO and Co-Founder of Arize AI. "These industry-first updates from Phoenix promise to provide better LLM evals and deeper troubleshooting to make complex LLM-powered systems ready and reliable in the real world."

