Arize Introduces Open Source LLM Evals Library and Support for Traces and Spans
October 02, 2023

Arize Phoenix rolled out several capabilities in its latest release.

Phoenix's new support for LLM traces and spans gives AI engineers and developers visibility at the span level, so they can see exactly where an app breaks and analyze each step rather than just the end result.

This capability is particularly useful for developers in the early stages of building an app because it doesn't require them to send data to a SaaS platform to perform LLM evaluation and troubleshooting. Instead, the open source solution provides pre-deployment LLM observability directly from their local machine. Phoenix supports all common spans and has native integrations with LlamaIndex and LangChain.
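To make the idea of span-level visibility concrete, here is a minimal sketch in plain Python. It does not use Phoenix or its schema; the `Span` class, its fields, and the `failure_origins` helper are hypothetical names chosen only for illustration. A trace is modeled as a tree of spans, and the helper finds the step where an error started rather than the top-level step that merely reports it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One step of an LLM app run (hypothetical structure, not Phoenix's schema)."""
    name: str
    kind: str                      # e.g. "chain", "retriever", "llm"
    latency_ms: float
    error: Optional[str] = None    # None means the step succeeded
    children: List["Span"] = field(default_factory=list)


def walk(span: Span) -> Iterator[Span]:
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def failure_origins(root: Span) -> List[Span]:
    """Return failing spans that have no failing child, i.e. where the error began."""
    return [
        s for s in walk(root)
        if s.error and not any(c.error for c in s.children)
    ]


# A made-up trace: the overall query failed because the LLM step failed.
trace = Span(
    name="query", kind="chain", latency_ms=160.0, error="LLM call failed",
    children=[
        Span(name="retrieve", kind="retriever", latency_ms=120.0),
        Span(name="llm", kind="llm", latency_ms=35.0,
             error="context length exceeded"),
    ],
)

for span in failure_origins(trace):
    print(f"{span.name} ({span.kind}): {span.error}")
```

Looking only at the end result would show that the query failed; walking the spans shows that the retriever succeeded and the failure began in the LLM step.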

The new Phoenix LLM evals library is also designed for fast and accurate LLM-assisted evaluations, making an evaluation LLM easy to implement. Applying data science rigor to the testing of model and template combinations, Phoenix offers proven LLM evals for common use cases and needs around retrieval (RAG) relevance, reducing hallucinations, question-and-answer on retrieved data, toxicity, code generation, summarization, and classification. The Phoenix LLM evals library is optimized to run evaluations quickly, with support for notebooks, Python pipelines, and app frameworks such as LangChain and LlamaIndex.
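The general pattern behind testing a model and template combination can be sketched as follows. This is an illustrative assumption, not the Phoenix evals API: `TEMPLATE`, `stub_judge`, and `evaluate` are hypothetical names, and the judge is a keyword-overlap stand-in for a real evaluation LLM so that the example runs offline. The point is the scoring step: the judge's labels are compared with human-provided golden labels to get precision and recall.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical relevance template; a real one would be tuned per model.
TEMPLATE = (
    "Question: {query}\n"
    "Document: {document}\n"
    "Is the document relevant to the question? Answer 'relevant' or 'irrelevant'."
)


def stub_judge(prompt: str) -> str:
    """Stand-in for an evaluation LLM: labels by keyword overlap (assumption)."""
    question_line, document_line = prompt.split("\n")[:2]
    question = question_line.split(": ", 1)[1]
    document = document_line.split(": ", 1)[1]
    q_words = {w for w in re.findall(r"[a-z]+", question.lower()) if len(w) > 3}
    d_words = {w for w in re.findall(r"[a-z]+", document.lower()) if len(w) > 3}
    return "relevant" if q_words & d_words else "irrelevant"


def evaluate(
    judge: Callable[[str], str],
    examples: List[Tuple[str, str, str]],
) -> Tuple[float, float]:
    """Score a judge against golden labels; returns (precision, recall)."""
    tp = fp = fn = 0
    for query, document, golden in examples:
        predicted = judge(TEMPLATE.format(query=query, document=document))
        if predicted == "relevant" and golden == "relevant":
            tp += 1
        elif predicted == "relevant":
            fp += 1
        elif golden == "relevant":
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up golden dataset: (query, document, human label).
golden_set = [
    ("What is a span?", "A span records one step of a trace.", "relevant"),
    ("How do I bake bread?", "Tracing shows each step of an LLM app.", "irrelevant"),
    ("What is toxicity?", "What a lovely day for a walk.", "irrelevant"),
    ("How are hallucinations reduced?",
     "Grounding answers in retrieved text limits made-up facts.", "relevant"),
]

precision, recall = evaluate(stub_judge, golden_set)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Swapping `stub_judge` for a function that calls a real model, or changing `TEMPLATE`, and rerunning `evaluate` on the same golden set gives a like-for-like comparison between combinations.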

"Large language models are poised to transform industries and society, but when it comes to robust performance, going from toy to production remains a challenge," said Jason Lopatecki, CEO and Co-Founder of Arize AI. "These industry-first updates from Phoenix promise to provide better LLM evals and deeper troubleshooting to make complex LLM-powered systems ready and reliable in the real world."
