Arize AI debuted new capabilities for fine-tuning and monitoring large language models (LLMs). The offering gives teams building with LLMs greater control and insight.
Now available as part of Arize's free product, the LLM observability tool evaluates LLM responses, pinpoints where prompt engineering can help, and identifies fine-tuning opportunities using vector similarity search.
The new offering is built to work in tandem with Phoenix, an open source library for LLM evaluation.
Leveraging Arize, teams can:
- Detect Problematic Prompts and Responses: By monitoring the performance of a model's prompt/response embeddings with LLM evaluation scores and cluster analysis, teams can narrow in on areas where their LLM needs improvement.
- Analyze Clusters Using LLM Evaluation Metrics and GPT-4: Automatically generate clusters of semantically similar data points and sort them by performance. Arize supports LLM-assisted evaluation metrics and task-specific metrics, along with user feedback. An integration with ChatGPT also lets teams analyze clusters for deeper insights.
- Improve LLM Responses with Prompt Engineering: Pinpoint prompt/response clusters with low evaluation scores. Workflows suggest ways to augment prompts so your LLMs generate better responses and achieve higher acceptance rates.
- Fine-Tune Your LLM Using Vector Similarity Search: Find problematic clusters, such as inaccurate or unhelpful responses, to fine-tune with better data. Vector similarity search surfaces other emerging examples of the same issues, so you can begin data augmentation before they become systemic.
- Leverage Pre-Built Clusters for Prescriptive Analysis: Use pre-built global clusters identified by Arize algorithms, or define custom clusters of your own, to simplify root cause analysis (RCA) and make prescriptive improvements to your generative models.
"Despite the power of these models, the risk of deploying LLMs in high risk environments can be immense," notes Jason Lopatecki, CEO and Co-Founder of Arize. "As new applications get built, Arize LLM observability is here to provide the right guardrails to innovate with this new technology safely."