Testing AI with AI: Navigating the Challenges of QA
November 13, 2024

Robert Salesas
Leapwork

Share this

AI sure grew fast in popularity, but are AI apps any good?

Well, there are some snags. We ran some research recently that showed 85% of companies have integrated AI apps into their tech stack in the last year. Pretty impressive number, but we also learned that many of those companies are running head-first into some issues: 68% have already experienced some significant problems related to the performance, accuracy, and reliability of those AI apps.

If companies are going to keep integrating AI applications into their tech stack at the rate they are, then they need to be aware of AI's limitations. More importantly, they need to evolve their testing regiment.

The Wild Wild West of AI Applications

That AI apps are buggy isn't necessarily a damnation of AI as a concept. It simply draws attention to the reality that AI apps are being managed within complex, interconnected systems. Many of these AI apps are integrated into sprawling tech stack ecosystems, and most AI tools in their current form don't exactly work perfectly out of the box. AI applications require continuous evaluation, validation, and fine-tuning to deliver on expectations.

Without that validation process, you risk stifling the effectiveness of AI apps with bugs and security vulnerabilities (security risks were one of the most commonly flagged issues for AI applications). Ultimately, that means the company doing the integration just becomes exposed to system failures, decreased customer satisfaction, and reputational damage. And considering how reliant the world will likely soon be on AI, that's something every business should aim to avoid.

Fixing AI … with AI?

Ironically, the answer many companies seem to have settled on for fixing their testing inefficiencies is AI-augmented testing. We found that 79% of companies have already adopted AI-augmented testing tools, and 64% of C-Suites trust their results (technical teams trust even more at 72%).

Is that not a bit paradoxical? Why fix AI with more AI?

In the right context, AI-augmented testing tools can be that second set of eyes (long live the four-eyes principle) to vet the shortcomings of AI systems with rigorous, unbiased reviews of performance. The reason you would use AI-augmented testing is to gauge how well generative AI deals with specific tasks or responds to user-defined prompts. They can compare AI-generated answers versus predefined, human-crafted expectations. That matters when AI models so often hallucinate nonsensical information.

You can imagine the many linguistic permutations for asking an AI chatbot, "Do you offer international shipping?" A response needs to be factually right regardless of how the question was asked, and that's where AI-augmented testing tools shine in automating the validation process for variables.

Do We Need Human QA Testers?

There's just one outstanding question: What happens to the human QA testers if everyone starts using AI-augmented testing?

The short answer to this question? They'll still be around, don't you worry, because over two-thirds (68%) of C-Suite executives we've spoken to have said they believe human validation will remain essential for ensuring quality across complex systems.  Actually, 53% of C-Suite executives told us they saw an increase in new positions requiring AI expertise. Fancy that ...

There's a good reason why humans won't disappear from QA teams. AI isn't perfect, and that extends to testing. Some testing tools can do things like self-healing scripts where the AI adjusts a test in line with minor app changes, but they can't handle the complexity of most real-world applications without any human supervision. We have AI agents, but they don't have agency. Autonomous testing agents can't just suddenly decide independently to test your delivery app to check whether your pizza orders are going through.

All of which is to say that some degree of human validation will be needed for the foreseeable future to ensure accuracy and relevance. Humans need to be there to decide what to automate, what not to automate, and how to create good testing procedures. The future of QA isn't about replacing humans but evolving their roles. Human testers will increasingly focus on overseeing and fine-tuning AI tools, interpreting complex data, and bringing critical thinking to the testing process.

AI offers huge amounts of promise, but this promise created by adoption must be paired with a vigilant approach to quality assurance. By combining the efficiency of AI tools with human creativity and critical thinking, businesses can ensure higher-quality outcomes and maintain trust in their increasingly complex systems.

Robert Salesas is CTO of Leapwork
Share this

The Latest

December 02, 2024

From the accelerating adoption of artificial intelligence (AI) and generative AI (GenAI) to the ongoing challenges of cost optimization and security, these IT leaders are navigating a complex and rapidly evolving landscape. Here's what you should know about the top priorities shaping the year ahead ...

November 26, 2024

In the heat of the holiday online shopping rush, retailers face persistent challenges such as increased web traffic or cyber threats that can lead to high-impact outages. With profit margins under high pressure, retailers are prioritizing strategic investments to help drive business value while improving the customer experience ...

November 25, 2024

In a fast-paced industry where customer service is a priority, the opportunity to use AI to personalize products and services, revolutionize delivery channels, and effectively manage peaks in demand such as Black Friday and Cyber Monday are vast. By leveraging AI to streamline demand forecasting, optimize inventory, personalize customer interactions, and adjust pricing, retailers can have a better handle on these stress points, and deliver a seamless digital experience ...

November 21, 2024

Broad proliferation of cloud infrastructure combined with continued support for remote workers is driving increased complexity and visibility challenges for network operations teams, according to new research conducted by Dimensional Research and sponsored by Broadcom ...

November 20, 2024

New research from ServiceNow and ThoughtLab reveals that less than 30% of banks feel their transformation efforts are meeting evolving customer digital needs. Additionally, 52% say they must revamp their strategy to counter competition from outside the sector. Adapting to these challenges isn't just about staying competitive — it's about staying in business ...

November 19, 2024

Leaders in the financial services sector are bullish on AI, with 95% of business and IT decision makers saying that AI is a top C-Suite priority, and 96% of respondents believing it provides their business a competitive advantage, according to Riverbed's Global AI and Digital Experience Survey ...

November 18, 2024

SLOs have long been a staple for DevOps teams to monitor the health of their applications and infrastructure ... Now, as digital trends have shifted, more and more teams are looking to adapt this model for the mobile environment. This, however, is not without its challenges ...

November 14, 2024

Modernizing IT infrastructure has become essential for organizations striving to remain competitive. This modernization extends beyond merely upgrading hardware or software; it involves strategically leveraging new technologies like AI and cloud computing to enhance operational efficiency, increase data accessibility, and improve the end-user experience ...

November 13, 2024

AI sure grew fast in popularity, but are AI apps any good? ... If companies are going to keep integrating AI applications into their tech stack at the rate they are, then they need to be aware of AI's limitations. More importantly, they need to evolve their testing regiment ...

November 12, 2024

If you were lucky, you found out about the massive CrowdStrike/Microsoft outage last July by reading about it over coffee. Those less fortunate were awoken hours earlier by frantic calls from work ... Whether you were directly affected or not, there's an important lesson: all organizations should be conducting in-depth reviews of testing and change management ...