Arize AI debuted new capabilities for fine-tuning and monitoring large language models (LLMs). The offering brings greater control and insight to teams looking to build with LLMs.
Now available as part of the free product, Arize's LLM observability tool evaluates LLM responses, pinpoints where to improve with prompt engineering, and identifies fine-tuning opportunities using vector similarity search.
The new offering is built to work in tandem with Phoenix, an open source library for LLM evaluation.
Leveraging Arize, teams can:
- Detect Problematic Prompts and Responses: By monitoring the performance of a model's prompt/response embeddings using LLM evaluation scores and cluster analysis, teams can narrow in on the areas where their LLM needs improvement.
- Analyze Clusters Using LLM Evaluation Metrics and GPT-4: Automatically generate clusters of semantically similar data points and sort by performance. Arize supports LLM-assisted evaluation metrics and task-specific metrics, along with user feedback. An integration with ChatGPT also enables teams to analyze clusters for deeper insights.
- Improve LLM Responses with Prompt Engineering: Pinpoint prompt/response clusters with low evaluation scores. Workflows suggest ways to augment prompts to help your LLMs generate better responses and improve acceptance rates.
- Fine-Tune Your LLM Using Vector Similarity Search: Find problematic clusters, such as inaccurate or unhelpful responses, to fine-tune with better data. Vector similarity search clues you in to other examples of emerging issues, so you can begin data augmentation before they become systemic.
- Leverage Pre-Built Clusters for Prescriptive Analysis: Use pre-built global clusters identified by Arize algorithms, or define custom clusters of your own to simplify root cause analysis (RCA) and make prescriptive improvements to your generative models.
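For readers who want a feel for the workflow the list above describes, the following is a minimal Python sketch of two of its ideas: ranking clusters by their average evaluation score, and using vector similarity search to find records that resemble a known bad response. It is not Arize's or Phoenix's API. The record layout (`id`, `embedding`, `cluster`, `eval_score`), the function names, and the toy two-dimensional embeddings are all assumptions made for illustration, and cluster assignments are assumed to have been computed beforehand.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clusters_by_score(records):
    """Return (cluster, mean eval_score) pairs, lowest-scoring cluster first."""
    scores = defaultdict(list)
    for record in records:
        scores[record["cluster"]].append(record["eval_score"])
    means = {cluster: sum(vals) / len(vals) for cluster, vals in scores.items()}
    return sorted(means.items(), key=lambda item: item[1])


def most_similar(query_embedding, records, k=2):
    """Return the k records whose embeddings are closest to the query by cosine similarity."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query_embedding, record["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data: embeddings and cluster labels are made up for illustration.
records = [
    {"id": "a", "embedding": [1.0, 0.0], "cluster": 0, "eval_score": 0.9},
    {"id": "b", "embedding": [0.9, 0.1], "cluster": 0, "eval_score": 0.8},
    {"id": "c", "embedding": [0.0, 1.0], "cluster": 1, "eval_score": 0.2},
    {"id": "d", "embedding": [0.1, 0.9], "cluster": 1, "eval_score": 0.3},
]

worst_cluster, worst_mean = rank_clusters_by_score(records)[0]
neighbors = most_similar([0.0, 1.0], records, k=2)
print(worst_cluster, [record["id"] for record in neighbors])
```

In practice the brute-force scan in `most_similar` would likely be replaced by an approximate nearest-neighbor index, and the neighbors of a poorly scored response would become candidates for prompt changes or fine-tuning data.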
"Despite the power of these models, the risk of deploying LLMs in high risk environments can be immense," notes Jason Lopatecki, CEO and Co-Founder of Arize. "As new applications get built, Arize LLM observability is here to provide the right guardrails to innovate with this new technology safely."