Mobile QA teams are constantly looking for smart ways to Shift Left responsibly, though long test cycles and flaky tests prevent them from doing so efficiently. Learn how AI can help you achieve your Shift-Left ambitions by auto-executing only the tests impacted by developer changes and stopping flaky tests from breaking builds, for faster and cleaner feedback loops.
Leveraging Risk-based Testing in the Age of AI
Don’t miss the chance to gain expert insights on securing executive support, managing change, and driving automation maturity in your organization.
0:00 | Johnston Harris
Looks like we're kicking things off. Hi everyone, my name is Johnston Harris. I am the CEO and co-founder of Appsurify. I'm going to be talking about how we're leveraging AI to optimize automation testing. I'll walk you all through some slides. Feel free to drop questions in the chat and we'll take it from there. So, let me go ahead and share my screen. Let's see if I can get this working on the first try.
0:31 | Johnston Harris
There it is.
0:40 | Johnston Harris
Hang on. Yep.
0:48 | Johnston Harris
All right. Hopefully you all can see that. So we'll be talking about how we're leveraging AI to optimize test automation through risk-based testing. Now, the problem that I'm sure many of you have encountered with automation testing is that when a developer makes a small change to the application, you have to run 100 percent of your tests, right? There's no way to determine where that small change was made unless you're a really good tester, especially when it comes to UI automation and mobile testing, because those tests are so far removed from the code change itself. They may be run as part of the pipeline, or they may be run asynchronously, outside the pipeline. Regardless, you're pushing 100 percent of your automation tests, and that can take a considerable amount of time depending on your application. That could be 30 minutes, it could be three hours, it could be 13 hours; it really just depends on how mature and far along you are in your automation test practice. But regardless, statistics show that with even five minutes of wait time for those automation results to come back, developers have likely already moved on to something else. So getting them fast feedback early is very important in helping them shift left.

Because when test results take a long time, and I'm sure a lot of you have encountered this along your automation journey, if you're actively testing but your testers aren't getting feedback quickly, you're potentially releasing buggy software into the next stage of the pipeline, which could eventually make it to the end user and cause an actual functional regression, which is not good. Missed deadlines: if your functional mobile regression suite takes a long time to complete, say 10 hours, and you find a regression on Friday, your team is either staying over the weekend or pushing that deadline back. Happens all the time. Lost dev output: flakiness, or really any minute of developer wait time, is lost productivity. Also flakiness itself: mobile testing has a high degree of flakiness associated with it, so developers start to not trust the results as they once did, because a test was passing, then failing, throwing false positives, and they don't really know if a failure is real or not. They start to lose trust in the test results, which degrades their ability to learn from them, and any wait time on top of that just prolongs the hit to productivity. Then high testing costs: if you're continuously running these tests, either locally or pushing them into the cloud such as BrowserStack or Sauce Labs, that's a lot of money, especially if you're pushing them up in high parallelization, 50 parallel threads, 100 parallel threads. There's only so much you can push before you start hitting the law of diminishing returns.

So what we're talking about today is: what are the options we've seen in the market to try to get automation time under an acceptable threshold? You may have an SLA; it may be 15 minutes, it may be one hour, and any time it goes over, you've got to throw more hardware at it. That's kind of where we are in the marketplace right now. Heavy parallelization, concurrent threads, just throwing more hardware at the solution. The problem with that, as I just mentioned, is that it runs into the law of diminishing returns. It doesn't always have the return that you might expect, and the CPU and memory usage just keep going up, along with the AWS or Azure bill.
As of right now, that's becoming more of a sensitive subject, so finding a smarter way to do it is really relevant in today's conversation around AI. The second option is using test coverage to select which tests to run. This is great, but it requires instrumentation in the build itself, and anyone who has ever worked with instrumentation knows it's very taxing on the CI system, on your Jenkins or Buildkite or Azure DevOps pipeline. It also only really applies to unit tests. Unit tests are very easy from a risk-based perspective because you're very close to the code itself. But when you get to mobile UI or end-to-end workflows, those tests are so far removed from the code change itself that test coverage really starts to deteriorate and degrade and not produce the type of results you'd expect from it. Thirdly, you might entertain the option of reducing the test suite size, breaking up your monolithic test suite into, say, 10 functional test suites, one for the payment section, one for the account section. The thing about that is heavy maintenance. These test suites have a life of their own, they're brittle, they change, so this unfortunately takes a lot of time out of people's days to maintain. And then lastly, just living with the pain: the test suite goes from one hour to an hour and a half to two hours, it continues to grow as your automation matures, and you just kind of live with that. Suddenly you go from one hour to 10 hours and you feel like there's really no better way to do it.

So what we've essentially leveraged AI for is this: when a change comes in, we're able to see where that change was made and auto-select and execute just the few tests, mobile UI or end-to-end workflow tests, associated with that change area. So rather than running 100 percent of your tests, you're maybe running five or 10 percent while still catching the functional regressions. Pretty powerful. What this looks like if you were to roll it out: here's the dashboard, where essentially the AI sees where developers are changing. There are two integration points. You connect to the Git repo and push only metadata, the change metadata, and then you connect to the tests, your mobile UI tests. And then essentially, when you make a change to the account section, the model knows these are the tests associated with the account section, so let's just run these, because these are the only ones that have actually been impacted. So you see this nice delta against the high watermark saying, okay, instead of running six hours of tests, which is a full regression, I'm just going to run the top one hour's worth of tests, because the AI is auto-selecting and executing just those few tests, assessing the change data on a per-commit, per-PR, per-merge basis. It's really configurable how you want to set it up for success once those two connections are made to the Git repo and to the tests themselves. You create what we like to call a dynamic smoke test or dynamic regression test, and you can insert that intelligence at any stage that you want. So if those tests take, you know, five hours and they run outside the pipeline, maybe say, hey, why don't we run the top 15 minutes of these regression tests or these mobile test suites on a per-commit, per-PR, per-merge basis, earlier in the pipeline.
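To make that "top 15 minutes" idea concrete, here is a minimal sketch of what a time-budgeted selection step could look like, assuming you already have a relevance ranking and an average duration for each test; the names and the scoring are hypothetical, not Appsurify's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RankedTest:
    name: str
    relevance: float        # higher = more related to the change area
    avg_duration_sec: float

def select_within_budget(ranked_tests, budget_sec):
    """Greedily take the most relevant tests until the time budget is used up."""
    selected, spent = [], 0.0
    for test in sorted(ranked_tests, key=lambda t: t.relevance, reverse=True):
        if spent + test.avg_duration_sec > budget_sec:
            continue  # skip tests that would blow the budget
        selected.append(test)
        spent += test.avg_duration_sec
    return selected, spent

# Example: a "dynamic smoke test" capped at 15 minutes of execution time.
suite = [
    RankedTest("test_account_update", 0.92, 180),
    RankedTest("test_payment_refund", 0.15, 240),
    RankedTest("test_account_login", 0.88, 120),
]
subset, total = select_within_budget(suite, budget_sec=15 * 60)
print([t.name for t in subset], f"~{total / 60:.1f} min")
```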
So what you're now doing is taking that regression test suite that's asynchronous to the developer workflow and plugging it into CI/CD, so that it's now part of that build. It's not slowing anything down; it's actually accelerating developer feedback, because rather than waiting five hours for test results, developers are now getting relevant regression feedback in under 15 minutes, which is extremely powerful and extremely value-added testing. Also, and I know with Kobiton mobile testing we are focused on the automation test journey, accelerating that and helping you get there faster, we do have a side of the house that helps manual testers too. Because we're connected to the Git repo, we can see the change data, and we can essentially generate a heat map for a slice of time: between the hours of 10 a.m. and 11 a.m. today, what areas of the application were impacted? That heat map shows you the files and folders that have been impacted, so you give your manual testers a focus for their exploratory testing. If you need to onboard a new manual tester, this is a great way to flatten that steep learning curve, because it takes manual testers a long time to learn the ins and outs of an application. This is a way to expedite their onboarding and say, hey, you don't need to test the whole application; especially during a hot fix, you just need to test this area that's actually been seeing change data in the last, you know, hour and a half. Which is very powerful. Here's a case study that I'm about to dive into. And I don't know if there are any questions coming up, because I can't see the screen, so if there is a moderator, please feel free to throw them out as I'm talking.
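As a rough illustration of the heat-map idea described above, the sketch below counts how often each top-level folder changed in a given time window using nothing but Git metadata; it assumes it runs inside the repository and that folder names roughly map to application areas, which is a simplification of the actual product.

```python
import subprocess
from collections import Counter

def change_heatmap(since="90 minutes ago", until="now"):
    """Count changed files per top-level folder in a time window, using git metadata only."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", f"--until={until}", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = Counter()
    for path in filter(None, out.splitlines()):
        top = path.split("/")[0]  # bucket by top-level folder, e.g. "account", "payments"
        counts[top] += 1
    return counts

# Focus exploratory testing on whatever changed in the last hour and a half.
for folder, hits in change_heatmap().most_common():
    print(f"{folder:20s} {'#' * hits}")
```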
9:10 | Cara Suarez
Yep. So far questions are not coming in yet in real time, but I will throw them out at you as they do. And as a reminder to everyone, for Q&A you can go ahead and type in the Q&A tab. Thanks.
9:26 | Johnston Harris
Yeah. So this is an interactive session, as Cara just mentioned, so I'm happy to steer the ship wherever you want to go. But here's a case study with a healthcare company that we're working with. This covers their mobile payments application and patient portals, about 100 applications under test in this environment, with about 55 developers making one to two commits a day. They have about 340 mobile UI tests, and they take about four and a half hours to run. So these are very dense tests; some of them take five or 10 minutes to run. Ultimately, they're only able to run the suite once, maybe twice a day, and each run across the board is racking up about 371 hours of downtime, wait time essentially, for the team. They're also spinning up big server farms; they're running these things in about 30 parallel threads, because that's the only way to keep it under four and a half hours. If they ran it sequentially, it would take over 20 hours to run these very large tests.

When we came in, we dropped those 339 tests down to only 67, an 80 percent drop in the tests being executed, while still catching 98.6 percent of functional regressions. So we took a 4.5-hour run and slashed it by 80 percent while still catching the vast majority of regressions. One of the questions I normally get at this juncture is: 98.6 percent is great, but it's not 100 percent, right? So what we've essentially done with this technology is, for lack of a better phrase, how Google and Meta test. Google and Meta wrote a white paper about this several years ago, about essentially building an in-house proprietary AI model that targets a smart subset of tests for a given developer change and just runs the tests associated with that change area. And then at an interval of their choosing, whether it be once a night, every other night, on the weekends, or before any release, they still run a full run. So what we're essentially saying is: when butts are in seats, when developers are on the task at hand, let's get them the fast automation and regression feedback they'd otherwise be waiting hours for, instantly, and catch the vast majority of functional regressions in that smart subset. And then, as a catch-all, schedule a full run at an increment of your choosing. So when you look at that summary page, the dashboard I showed you above, it starts to resemble a heartbeat monitor, for lack of a better phrase. During the day there are low, fast intervals running those tests on a per-PR, per-merge basis, rapidly firing, and then on the weekend you run that full regression. So this is a very smart way to cut down on automation tests, save a vast amount of resources, and accelerate developer productivity by getting that functional regression feedback earlier and more economically. For them, we took a four-and-a-half-hour run and knocked it down to 54 minutes; again, an 80 percent time improvement. And the great thing about the AI is that it continually learns. The model trains every night; it continually recalibrates. So you can actually crank up the dial as you go. This healthcare company always wants their pipeline running 20 percent of tests. We have other customers running only 10 percent of tests, or even five percent. So once that model and your risk profile come into view, you can turn up the dial.
So this 4.5-hour run could be knocked down to only 30 minutes, or even 20 or 10 minutes, running only the relevant tests for a given change area, depending on your risk profile and your appetite. Ultimately, at the end of the day, for this healthcare company that used to run four and a half hours of tests on every run, once or twice a day, we shaved off three and a half hours of test load. Three and a half hours per build: in Azure DevOps they were running three and a half hours' worth of tests in their build, and we reduced that to under an hour, 54 minutes. We're saving them over 300 hours per day across their runs for their developer team. What this means now is the developers can actually get more reps in. They can really start to increase velocity, finding functional regressions earlier and more economically while they're still on a commit or on that branch, before it gets merged in at a later stage of the pipeline. They're able to find a functional regression, correct it, and move through their day and tasks a lot faster than they otherwise would waiting for those test results in the afternoon or the next day, ultimately providing a very powerful ROI of over 20,000 dollars per day.

So at the end of the day, there are a lot of use cases this can be configured for, and I feel like some of these would be great for the Q&A section: hey, this is what we're doing, what would I do here? The way we've designed AI risk-based testing, it is extremely broadly applicable. We are code-language agnostic, meaning we support all code languages: Java, C#, Python, Ruby, Golang, you name it. We're also test-type agnostic, so it could be unit, integration, API, end-to-end, UI, and obviously mobile. That really allows us to play well in the ecosystem, to play well with the existing QA practice that you've developed. Potentially you have, you know, a big Kobiton automation test suite that you want to optimize; this is a great way to run just the Kobiton tests that actually matter given a change area. So ultimately, instead of running 100 percent of your tests, you're just running 10 percent, so you're accelerating test results by 10x. Instead of running 100 percent of your tests every single time, you're running maybe 10 or 20 percent. We're reducing your infrastructure demand, your Kubernetes cluster, your usage on AWS, reducing it significantly. We have a hard-cost ROI associated with our solution, because you're not running all your tests all the time anymore; you're just running maybe 10 or 20 percent. We actually recently took a 10,000-dollar AWS bill and knocked it down to around three thousand, a 7,000-dollar delta back in your pocket just by running the tests that matter. And the great side effect of this: developers get feedback earlier, and QA teams find regressions faster and more economically, in the earliest stages of the pipeline rather than the later stages, when those changes have potentially already been merged in. So ultimately it really hits three buckets well. For quality assurance teams, it finds regressions earlier and more economically. For DevOps teams, it rapidly accelerates CI/CD pipelines. For software development teams, it gets feedback to developers faster for increased output and accelerated go-to-market timing. So it all pulls together nicely for the whole team to hit those metrics, those KPIs, those goals you may have set for 2024 and 2025.
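For a back-of-the-envelope feel for those numbers, here is a tiny calculation using the figures from the case study; the per-thread machine cost is an assumed, illustrative rate, not the customer's actual bill.

```python
# Figures from the case study above; cost inputs are illustrative assumptions.
full_run_hours = 4.5
optimized_run_hours = 54 / 60          # 54 minutes
runs_per_day = 2                       # they ran the suite once or twice a day
parallel_threads = 30
machine_cost_per_thread_hour = 0.50    # hypothetical cloud rate

hours_saved_per_run = full_run_hours - optimized_run_hours
infra_savings_per_day = (hours_saved_per_run * runs_per_day
                         * parallel_threads * machine_cost_per_thread_hour)

print(f"Time cut per run: {hours_saved_per_run:.1f} h ({hours_saved_per_run / full_run_hours:.0%})")
print(f"Illustrative infra savings per day: ${infra_savings_per_day:.2f}")
```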
So I'd love to open it up for Q&A. I'm happy to talk about the architecture, the slides, you name it.
16:40 | Cara Suarez
Okay, great. Well, we definitely have some questions that have come in during your talk, so let me go ahead and start. The first one that came in was: how does Appsurify's AI test selection technology determine which tests are impacted by recent developer changes?
17:02 | Johnston Harris
Yeah. So it starts with our integration. There are two integration points. We connect to the Git-based repository; we also support others like Perforce, SVN, and Microsoft TFVC. We connect to the Git-based repository through a Git script, and it pushes us the Git change data: the metadata, code snippets, files, folders, Git blame logs, that sort of thing. That allows us to see what's changed since the previous build. And then we connect to the tests, whether they're running inside or outside the CI pipeline. Once we're connected, and we see the change data and the tests, we essentially go into learning mode in the background for about two or three days. The model trains in the background, just observing activity, and it's a unique model trained for your infrastructure, for your project or your test suite. So as we see the test runs and the change activity coming through, what the model is doing is essentially mapping, or auto-linking, tests to functional code areas. We see a change in the account section and it has an impact on the associated account-section tests, so we essentially create a linking, a tighter bond, between those tests and that area. There are about 50 to 60 variables that go into feeding and training the AI model, such as NLP (natural language processing) and embeddings, but also test passes, test failures, test logs, the way the tests interact with the code. A lot of variables and factors go into the model training itself to determine, when the developer makes a small change, a single-line code change in the account section, that these are the account-section tests, and even more granularly, that only these account-section tests actually need to be run. Maybe you have 1,000 tests and 100 account-section tests, but only 10 account-section tests are actually affected. Well then, just run those 10 impacted account-section tests associated with that single-line change of code in the account section. Once that model is fully trained up, it really is very granular. If you're running Cypress or Kobiton, you can run it at a test-case level; you can also run it at a test-file level. It's very powerful and super granular in how it determines which tests to run.
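A deliberately simplified sketch of that auto-linking idea is shown below: it only uses co-occurrence between changed areas and failing tests observed during a learning period, standing in for the 50-plus signals described above; all names and the weighting scheme are hypothetical.

```python
from collections import defaultdict

class ChangeTestLinker:
    """Learn which tests tend to react to changes in which code areas (toy version)."""

    def __init__(self):
        # link_strength[code_area][test_name] -> co-occurrence weight
        self.link_strength = defaultdict(lambda: defaultdict(float))

    def observe_build(self, changed_areas, failed_tests):
        """Called for each build observed during the learning period."""
        for area in changed_areas:
            for test in failed_tests:
                self.link_strength[area][test] += 1.0

    def impacted_tests(self, changed_areas, min_strength=1.0):
        """Score every known test against the current change and keep the linked ones."""
        scores = defaultdict(float)
        for area in changed_areas:
            for test, strength in self.link_strength[area].items():
                scores[test] += strength
        return sorted((t for t, s in scores.items() if s >= min_strength),
                      key=lambda t: scores[t], reverse=True)

linker = ChangeTestLinker()
linker.observe_build({"account"}, {"test_account_login", "test_account_update"})
linker.observe_build({"payments"}, {"test_payment_refund"})
print(linker.impacted_tests({"account"}))  # -> account-related tests only
```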
19:10 | Cara Suarez
Yeah. No. That’s really interesting. We have a three part question that came in from Irina. So she says, how did you manage to decrease test execution time? Is it just by removing automated tests? And the third part is what approach did you use to do so?
19:32 | Johnston Harris
So, yeah. We cannot speed up the time it takes for an individual test to execute. If it's a five-minute test, it's always going to be five minutes, regardless. What we're essentially doing instead is this: say, for instance, you have 1,000 tests and they take one hour to run. When we see a change coming in, the model ranks all 1,000 tests, one, two, three, four, by order of priority given the change. Test one is going to be the highest priority for that change area, then two, three, four, all the way down to 1,000. Then say, for instance, the parameter you've set is that you want to run the top 10 percent of tests. Given this change, we essentially rank all 1,000, cut it off at 100, and return a file back to you of the dynamic tests. Then, using your test runner, you execute the tests in that file, and the tests in that file are the high-ranking tests that are prioritized as relevant to that change area. So what you've now done is, instead of running 1,000 tests, you're running 100 tests, and we're dropping those 900 irrelevant tests. So we're not speeding up the tests themselves; what we're saying is run fewer tests, because the vast majority of tests are going to pass or are not going to be relevant to the change area. What we're allowing is a very micro focus on the tests that actually matter given the change area. Yeah. And did I, was there a third part of that question? Or did I answer it?
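Mechanically, that "rank everything, cut it off, hand back a file" flow might look something like this minimal sketch; the scoring function here is just a placeholder for the trained model, and the output file format is hypothetical.

```python
def select_top_percent(all_tests, score_for_change, top_percent=10, out_path="dynamic_tests.txt"):
    """Rank every test for the current change, keep the top N percent, and write them to a
    file that the existing test runner can consume."""
    ranked = sorted(all_tests, key=score_for_change, reverse=True)
    cutoff = max(1, len(ranked) * top_percent // 100)
    selected = ranked[:cutoff]
    with open(out_path, "w") as f:
        f.write("\n".join(selected))
    return selected

# 1,000 tests, keep the top 10 percent (100 tests) for this change.
tests = [f"test_case_{i}" for i in range(1000)]
fake_model_score = lambda name: hash(name) % 1000  # stand-in for the trained relevance model
picked = select_top_percent(tests, fake_model_score)
print(len(picked), "tests written to dynamic_tests.txt")
```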
21:07 | Cara Suarez
It was sort of, what approach did you use to do so? I almost feel like that relates back to how the AI works, but maybe expand on it a little bit.
21:16 | Johnston Harris
Yeah, that kind of goes back to us ranking the tests and returning the tests to execute. Say, for instance, you have 20 parallel threads. We'll take a 60-minute run and knock it down by 90 percent, so that run should now only take six minutes, given that you still have the same hardware and infrastructure in place. So it all kind of comes down to strategy and your happy place. If you have 20 parallel threads and you're running 1,000 tests through those 20 parallel threads, it takes an hour. Now, instead of running 1,000 tests that take an hour, we're running 10 percent. So if everything else stays equal, we'd expect that to reduce down to about six minutes. Or you could lower your concurrent threads, your parallel servers, and say, I don't need 20 parallel servers anymore, that's really expensive, maybe I only need 10. And so maybe that goes from six minutes to 12 minutes. All else being the same, instead of running 100 percent of tests you're now running 10 percent, and it just alleviates a lot of load on your infrastructure.
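The wall-clock math being described is easy to sanity-check; here is a quick sketch, assuming evenly sized tests and perfect parallelism, which real suites will not match exactly.

```python
def wall_clock_minutes(num_tests, minutes_per_test, threads):
    """Rough wall-clock estimate assuming tests spread evenly across threads."""
    return num_tests * minutes_per_test / threads

per_test = 60 * 20 / 1000  # 1,000 tests through 20 threads in ~60 min -> 1.2 min per test
print(wall_clock_minutes(1000, per_test, 20))  # ~60 min, the original full run
print(wall_clock_minutes(100,  per_test, 20))  # ~6 min after selecting the top 10 percent
print(wall_clock_minutes(100,  per_test, 10))  # ~12 min if you also halve the parallel servers
```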
22:18 | Cara Suarez
Yeah, no, that definitely makes sense. I have another question here. It says: how does Appsurify ensure compatibility with different test frameworks, such as end-to-end, unit, integration, and so on?
22:34 | Johnston Harris
Yeah, so that's a great question. The good thing about us is we're agnostic, and we play well in the sandbox. Universally, all these frameworks out there generate the same type of output: pass/fail criteria in a JUnit or NUnit style XML or JSON format. So all we need, all we do, is ingest those test results, those JUnit reports, and that's it. When we see the change data, we ingest that test file, that test report, and that's all we need to link tests to functional code areas. That's kind of the beauty of our solution; that's how we're able to have, you know, 50, 60, 70-plus test framework integrations, because they all generate the same type of pass/fail test file. And that's all we ingest to train the model: that outputted test file with the pass/fail criteria.
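Since nearly every runner can emit a JUnit-style XML report, ingesting pass/fail results can be as simple as the sketch below; this is a generic reading of the common report format, not Appsurify's actual ingestion code.

```python
import xml.etree.ElementTree as ET

def read_junit_results(report_path):
    """Extract (test name, outcome, duration) from a JUnit-style XML report."""
    results = []
    root = ET.parse(report_path).getroot()
    # Reports may have a <testsuites> wrapper or a single <testsuite> root; iter() handles both.
    for case in root.iter("testcase"):
        name = f"{case.get('classname', '')}.{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            outcome = "failed"
        elif case.find("skipped") is not None:
            outcome = "skipped"
        else:
            outcome = "passed"
        results.append((name, outcome, float(case.get("time", 0.0))))
    return results

for name, outcome, seconds in read_junit_results("junit-report.xml"):
    print(f"{outcome:7s} {seconds:6.2f}s  {name}")
```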
23:31 | Cara Suarez
Okay, that makes sense. Just as a reminder for those of you out there, please use the Q&A text question section to go ahead and type in some questions. But while we're waiting for more people to type, I just wanted to see if you could cover a little bit more about the benefits of executing only those tests impacted by developer changes in a CI/CD pipeline.
24:01 | Johnston Harris
Yeah. So there's a biotech company we're working with right now. They have 2,000 end-to-end tests, and they run them in the pipeline on a per-commit basis. They have 50 parallel threads, and their pipeline takes 35 minutes. They can't throw any more hardware at the problem. That 35 minutes, they wish it was one minute; they really wish it was as little as possible. They don't want their developers waiting more than five minutes. Now, when they inserted us on a per-commit basis, instead of running 100 percent of their 2,000 end-to-end tests, we're running just 10 percent. So instead of executing 2,000 tests, we're just running 200 tests per commit, and we've reduced their 35-minute build through Buildkite. If anyone's using Buildkite out there, it's a CI/CD pipeline that's very popular and really growing; we love Buildkite. Buildkite allows you to do spot instances, to basically bring your own hardware. That's a beautiful thing, but when it comes to test execution there's a caveat: in order to keep those 35-minute builds running fast, you realize you've got to throw a lot of hardware at it, and the spot instances really balloon in size, and suddenly your AWS bill goes through the roof. Although your Buildkite bill stays pretty stable, your AWS or Azure bill actually takes the hit, because now you're realizing you have to spin up 50 to 70 parallel threads to keep that SLA under wraps. So they came to us saying, hey, our AWS bill is through the roof because we're having to spin up 50 to 70 parallel threads to keep this under 35 minutes, and we can't throw any more hardware at it; we're hitting roadblocks and diminishing returns, and we can't throw any more money or hardware at this. We just want to run the tests that matter. So when we plugged in, instead of running 2,000 tests per change, we ran 200, and we took a 35-minute pipeline build and knocked it down to under five minutes. Before that, they were running about 100 builds a day; now they're running 300, 400, up to 500 builds a day. What that means is developers are now making and committing changes three, four, five times more frequently per day, so in the span of one week you've now three-, four-, five-x'd each developer, because you're getting that end-to-end workflow feedback into their hands in under five minutes rather than 35 or 45 minutes, by which point they may have hopped around to something else. The most optimal path of work is one; the optimal thread of work is one. If you hop around and multitask, well, there's a lot of philosophy and psychology around context switching, and we can understand the thinking that something is in motion, let it work itself out while I go work on something else until that other thing finishes. But ultimately, when you're zeroed in on the task at hand, getting that feedback really fast while you're still on the task is hugely important, because when you hop around, you take yourself out of the game. So we brought that to them, and what they're seeing is their AWS bill dropping in half, because they don't need as much hardware, and their developers getting faster feedback. So there are multiple value pathways, multiple channels, here for that environment.
27:14 | Cara Suarez
Wow, that's fantastic. Well, I think that's it for the questions. This is sort of a last call for questions coming in through the Q&A. And if you think of a question and you didn't have time to chat it in, you can actually head over to the Appsurify booth in the expo; that's another place you can get a hold of Johnston and also learn more about Appsurify and their solutions. Thank you so much, Johnston. We really appreciate you joining the Mobile Testing and Experience Summit today.
27:48 | Johnston Harris
Great. Okay, pleasure. Pleasure being here.