Testing AI with AI: Navigating the Challenges of QA

Robert Salesas
Leapwork

AI has surged in popularity, but are AI apps any good?

Well, there are some snags. Our recent research found that 85% of companies have integrated AI apps into their tech stack in the last year. That's an impressive number, but we also learned that many of those companies are running head-first into trouble: 68% have already experienced significant problems with the performance, accuracy, and reliability of those AI apps.

If companies are going to keep integrating AI applications into their tech stack at the rate they are, then they need to be aware of AI's limitations. More importantly, they need to evolve their testing regimen.

The Wild Wild West of AI Applications

That AI apps are buggy isn't necessarily an indictment of AI as a concept. It simply draws attention to the reality that AI apps are being managed within complex, interconnected systems. Many of these AI apps are integrated into sprawling tech stack ecosystems, and most AI tools in their current form don't exactly work perfectly out of the box. AI applications require continuous evaluation, validation, and fine-tuning to deliver on expectations.

Without that validation process, you risk stifling the effectiveness of AI apps with bugs and security vulnerabilities (security risks were one of the most commonly flagged issues for AI applications). Ultimately, that means the company doing the integration just becomes exposed to system failures, decreased customer satisfaction, and reputational damage. And considering how reliant the world will likely soon be on AI, that's something every business should aim to avoid.

Fixing AI … with AI?

Ironically, the answer many companies seem to have settled on for fixing their testing inefficiencies is AI-augmented testing. We found that 79% of companies have already adopted AI-augmented testing tools, and 64% of C-Suite executives trust their results (technical teams trust them even more, at 72%).

Is that not a bit paradoxical? Why fix AI with more AI?

In the right context, AI-augmented testing tools can be that second set of eyes (long live the four-eyes principle) to vet the shortcomings of AI systems with rigorous, unbiased reviews of performance. These tools gauge how well generative AI handles specific tasks or responds to user-defined prompts, comparing AI-generated answers against predefined, human-crafted expectations. That matters when AI models so often hallucinate nonsensical information.

You can imagine the many linguistic permutations for asking an AI chatbot, "Do you offer international shipping?" A response needs to be factually right regardless of how the question was asked, and that's where AI-augmented testing tools shine: automating validation across all those variations.
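The idea behind permutation testing can be sketched in a few lines. This is a minimal illustration, not any particular vendor's tool: `ask_chatbot` is a hypothetical stand-in for the AI application under test, and the keyword checks stand in for the richer human-crafted expectations a real tool would use.

```python
def ask_chatbot(prompt: str) -> str:
    # Placeholder for a call to the AI application under test.
    return "Yes, we ship to over 40 countries worldwide."

# Many phrasings of the same underlying question.
PERMUTATIONS = [
    "Do you offer international shipping?",
    "Can you ship my order abroad?",
    "Is overseas delivery available?",
]

# Human-crafted expectations: facts every answer must contain,
# and phrases that signal a failed or evasive answer.
REQUIRED_FACTS = ["ship", "countries"]
FORBIDDEN = ["i don't know", "cannot help"]

def validate(answer: str) -> bool:
    text = answer.lower()
    has_facts = all(fact in text for fact in REQUIRED_FACTS)
    is_clean = not any(bad in text for bad in FORBIDDEN)
    return has_facts and is_clean

# Run every permutation and flag any phrasing that breaks the answer.
results = {p: validate(ask_chatbot(p)) for p in PERMUTATIONS}
assert all(results.values()), f"Failed permutations: {results}"
```

A production tool would generate the permutations itself and use semantic comparison rather than keyword matching, but the shape of the check is the same: one expectation, many phrasings.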

Do We Need Human QA Testers?

There's just one outstanding question: What happens to the human QA testers if everyone starts using AI-augmented testing?

The short answer? They'll still be around, don't you worry, because over two-thirds (68%) of the C-Suite executives we've spoken to say they believe human validation will remain essential for ensuring quality across complex systems. In fact, 53% of C-Suite executives told us they've seen an increase in new positions requiring AI expertise. Fancy that ...

There's a good reason why humans won't disappear from QA teams. AI isn't perfect, and that extends to testing. Some testing tools offer features like self-healing scripts, where the AI adjusts a test in line with minor app changes, but they can't handle the complexity of most real-world applications without any human supervision. We have AI agents, but they don't have agency. Autonomous testing agents can't just suddenly decide independently to test your delivery app to check whether your pizza orders are going through.
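The self-healing idea mentioned above can be shown in miniature. This is a toy sketch, not a real test runner: the page is a plain dict standing in for a DOM, and the element names are made up. The point is the mechanism: a locator stores several attributes, and when the primary one breaks after a minor UI change, the runner falls back to the others and records that a heal happened so a human can review it.

```python
# Page state after a UI refactor renamed the button's id.
PAGE = {
    "order-submit-v2": {"text": "Place order", "role": "button"},
}

def find_element(page: dict, locator: dict):
    """Try the stored id first; heal via secondary attributes if it broke."""
    if locator["id"] in page:
        return locator["id"], False  # primary locator still works
    for elem_id, attrs in page.items():
        if attrs.get("text") == locator["text"] and attrs.get("role") == locator["role"]:
            return elem_id, True  # healed: matched on text + role instead
    raise LookupError("element not found; human review required")

# The test was written against the old id but carries backup attributes.
locator = {"id": "order-submit", "text": "Place order", "role": "button"}
elem, healed = find_element(PAGE, locator)
```

Note that the fallback only works because a human decided which secondary attributes were trustworthy; when nothing matches, the tool has to hand the problem back to a person.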

All of which is to say that some degree of human validation will be needed for the foreseeable future to ensure accuracy and relevance. Humans need to be there to decide what to automate, what not to automate, and how to create good testing procedures. The future of QA isn't about replacing humans but evolving their roles. Human testers will increasingly focus on overseeing and fine-tuning AI tools, interpreting complex data, and bringing critical thinking to the testing process.

AI offers enormous promise, but adoption must be paired with a vigilant approach to quality assurance. By combining the efficiency of AI tools with human creativity and critical thinking, businesses can ensure higher-quality outcomes and maintain trust in their increasingly complex systems.

Robert Salesas is CTO of Leapwork

