Testing AI with AI: Navigating the Challenges of QA

Robert Salesas
Leapwork

AI sure grew fast in popularity, but are AI apps any good?

Well, there are some snags. We ran some research recently that showed 85% of companies have integrated AI apps into their tech stack in the last year. Pretty impressive number, but we also learned that many of those companies are running head-first into some issues: 68% have already experienced some significant problems related to the performance, accuracy, and reliability of those AI apps.

If companies are going to keep integrating AI applications into their tech stack at the rate they are, then they need to be aware of AI's limitations. More importantly, they need to evolve their testing regimen.

The Wild Wild West of AI Applications

That AI apps are buggy isn't necessarily an indictment of AI as a concept. It simply draws attention to the reality that AI apps are being managed within complex, interconnected systems. Many of these AI apps are integrated into sprawling tech stack ecosystems, and most AI tools in their current form don't work perfectly out of the box. AI applications require continuous evaluation, validation, and fine-tuning to deliver on expectations.

Without that validation process, bugs and security vulnerabilities can undermine the effectiveness of AI apps (security risks were one of the most commonly flagged issues for AI applications). Ultimately, that leaves the company doing the integration exposed to system failures, decreased customer satisfaction, and reputational damage. And considering how reliant the world will likely soon be on AI, that's something every business should aim to avoid.

Fixing AI … with AI?

Ironically, the answer many companies seem to have settled on for fixing their testing inefficiencies is AI-augmented testing. We found that 79% of companies have already adopted AI-augmented testing tools, and 64% of C-Suite executives trust their results (technical teams trust them even more, at 72%).

Is that not a bit paradoxical? Why fix AI with more AI?

In the right context, AI-augmented testing tools can be that second set of eyes (long live the four-eyes principle) to vet the shortcomings of AI systems with rigorous, unbiased reviews of performance. The reason you would use AI-augmented testing is to gauge how well generative AI deals with specific tasks or responds to user-defined prompts. These tools can compare AI-generated answers against predefined, human-crafted expectations. That matters when AI models so often hallucinate nonsensical information.

You can imagine the many linguistic permutations for asking an AI chatbot, "Do you offer international shipping?" A response needs to be factually right regardless of how the question was asked, and that's where AI-augmented testing tools shine: they automate the validation of answers across all of those phrasings.
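To make the idea concrete, here is a minimal sketch in Python of checking one fact across several phrasings of the same question. Everything in it is an assumption for illustration: fake_chatbot is a hypothetical stand-in for a real chatbot call, the Expectation type and the paraphrase list are invented, and the simple keyword matching is only a placeholder for whatever comparison a real AI-augmented testing tool performs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Expectation:
    """A human-written expectation (hypothetical structure for this sketch)."""
    must_include: list[str]   # phrases a correct answer should contain
    must_exclude: list[str]   # phrases that signal a wrong or evasive answer


def check_answer(answer: str, expected: Expectation) -> bool:
    """Return True if the answer meets the expectation (case-insensitive)."""
    text = answer.lower()
    has_required = all(term.lower() in text for term in expected.must_include)
    has_forbidden = any(term.lower() in text for term in expected.must_exclude)
    return has_required and not has_forbidden


def run_paraphrase_suite(
    ask: Callable[[str], str],
    paraphrases: list[str],
    expected: Expectation,
) -> dict[str, bool]:
    """Ask every paraphrase and record whether each answer passes."""
    return {question: check_answer(ask(question), expected) for question in paraphrases}


def fake_chatbot(question: str) -> str:
    """Hypothetical stand-in for the chatbot under test (not a real API)."""
    if "overseas" in question.lower():
        return "Sorry, I'm not sure about that."
    return "Yes, we offer international shipping to over 40 countries."


PARAPHRASES = [
    "Do you offer international shipping?",
    "Can you ship outside the country?",
    "Will you deliver overseas?",
]

EXPECTED = Expectation(
    must_include=["international shipping"],
    must_exclude=["not sure"],
)

results = run_paraphrase_suite(fake_chatbot, PARAPHRASES, EXPECTED)
failures = [question for question, passed in results.items() if not passed]
print(failures)
```

In this toy run, the stand-in chatbot stumbles on the "overseas" phrasing, so that question lands in failures while the other two pass. Deciding which facts belong in the expectation, and which phrasings are worth covering, is still a human call.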

Do We Need Human QA Testers?

There's just one outstanding question: What happens to the human QA testers if everyone starts using AI-augmented testing?

The short answer to this question? They'll still be around, don't you worry, because over two-thirds (68%) of C-Suite executives we've spoken to have said they believe human validation will remain essential for ensuring quality across complex systems. In fact, 53% of C-Suite executives told us they saw an increase in new positions requiring AI expertise. Fancy that ...

There's a good reason why humans won't disappear from QA teams. AI isn't perfect, and that extends to testing. Some testing tools can do things like self-healing scripts where the AI adjusts a test in line with minor app changes, but they can't handle the complexity of most real-world applications without any human supervision. We have AI agents, but they don't have agency. Autonomous testing agents can't just suddenly decide independently to test your delivery app to check whether your pizza orders are going through.

All of which is to say that some degree of human validation will be needed for the foreseeable future to ensure accuracy and relevance. Humans need to be there to decide what to automate, what not to automate, and how to create good testing procedures. The future of QA isn't about replacing humans but evolving their roles. Human testers will increasingly focus on overseeing and fine-tuning AI tools, interpreting complex data, and bringing critical thinking to the testing process.

AI holds huge promise, but as adoption grows, that promise must be paired with a vigilant approach to quality assurance. By combining the efficiency of AI tools with human creativity and critical thinking, businesses can ensure higher-quality outcomes and maintain trust in their increasingly complex systems.

Robert Salesas is CTO of Leapwork

