How Many Bugs Do Regression Tests Find?
By: Brian Marick
What percentage of bugs are found by rerunning tests? That is, what's the value of this equation:
number of bugs in a release found by re-executing tests
100 X ------------------------------------------------------------ ?
number of bugs found by running all tests (for 1st or Nth time)
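The ratio above can be expressed as a tiny function (a sketch; the function and parameter names are mine, not from any standard):

```python
def regression_bug_percentage(bugs_from_reruns, total_bugs_from_all_tests):
    """Percent of a release's bugs found by re-executing tests,
    out of all bugs found by running tests (for the 1st or Nth time)."""
    return 100 * bugs_from_reruns / total_bugs_from_all_tests
```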
Like most single measures, this one is suspect. To make good decisions, we need more information. Before repeating percentages I've heard and read, I'll give an example of how the context of a measure matters. I'll compare two projects, A and B. In each, 30% of the bugs were found by rerunning regression tests.
All of project B's tests are regression tests, designed to be repeatable. We don't know if they're automated or not - that's not relevant to this example.
Some of project A's tests are regression tests, but some are "one-shot tests". One-shot tests are intended to be run exactly once. Nothing is done that would enable anyone to run them again. (One-shot tests are usually manual, but they can be automated - think of a quick and dirty program that you delete after running once.)
Regression tests find bugs in two ways. They can find bugs the first time they're run, and they can find bugs when they're re-executed. One-shot tests have only one chance to find a bug.
Here are the contexts that led to the 30% results:

                                               Project A                  Project B
  Bugs found while creating regression tests   10 bugs from 100 tests     70 bugs from 100 tests
  Bugs found by repeating regression tests     90 bugs (same 100 tests)   30 bugs (same 100 tests)
  Bugs found with one-shot tests               200 bugs from 2000 tests   no one-shot tests
You'd have to say project A has a pretty impressive set of regression tests. What about project B?
In their case, rerunning tests isn't finding as many bugs as running tests for the first time. What if they had chosen to run 300 one-shot tests instead of crafting 100 repeatable tests? (Not at all an unrealistic possibility - regression tests can be quite expensive.) They might have found 210 bugs instead of the 100 they did find: 300 tests at their first-run rate of 0.7 bugs per test. We can't be sure without knowing even more of the context. But it's certain that this 30% doesn't mean the same thing as project A's 30%.
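The arithmetic behind the two matching 30% figures can be checked directly from the example's numbers (a quick sketch):

```python
# Project A: 10 bugs while creating regression tests, 90 on re-execution,
# 200 bugs from 2000 one-shot tests.
a_percent = 100 * 90 / (10 + 90 + 200)

# Project B: 70 bugs while creating regression tests, 30 on re-execution,
# no one-shot tests.
b_percent = 100 * 30 / (70 + 30)

print(a_percent, b_percent)  # 30.0 30.0
```

The same 30% falls out of very different totals: project A found 300 bugs overall, project B only 100.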
The numbers I give below do not contain information about the relative proportions of regression vs. one-shot tests, so they're less useful than they might be. But I still think they're worth repeating, for two reasons. First, the generally low percentage of bugs found by regression testing would have been a surprise to me around 1983, when I wrote "A test that can't be repeated is worthless." I think many people who've had a nasty experience with a buggy product become obsessed by automated regression tests as the solution to their problems. If they realized that an awful lot of projects seem to find 30% or fewer of their bugs through regression testing, they'd devote more thought and effort to the other 70%.
Second, by presenting these numbers, I may prompt people to share their numbers - and the context that produced them. If we can build a series of better-described case studies, more people will be able to make better decisions.
So here are the numbers I've gotten so far. Although regression tests can be manual as well as automated, it happens that all this data is from automated tests. The results might be different for repeated manual tests.
(Thanks to Danny Faught for prompting me to write this introductory material.)
I was quoted 6% by a tool vendor, who cited some Gartner-Group-like study. I no longer remember who did the original study, nor have I seen it. Note that this vendor had a vested interest in the number being low.
Kaner's "Improving the Maintainability of Automated Test Suites" (Quality Week '97) says, "The estimates that I've heard range from 6% to 30%."
His 6% may be the same as mine, since I told that vendor to also talk to him.
Fewster & Graham, Software Test Automation, p 23, says: "James Bach reported that in his (extensive) experience, automated tests found only 15% of the defects while manual testing found 85%. (James Bach, Test Automation Snake Oil)"
The web versions of James's paper don't contain that number, but he confirmed it to me and identified it as coming from Borland.
In the same book, a case report from Chip Groder has these numbers (p. 526):
29% (2 found during execution vs. 7 total)
15% (2 vs. 13)
0% (0 vs. 12)
10% (3 vs. 30)
"We believe that the number of bugs detected during execution will be much lower than the number found during the planning process." By planning process, he means that, during test design, "it is natural to use the application to verify the test steps required." Note that this manual testing is done early: "this is usually the first time anyone has systematically examined the user interface...".
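Groder's percentages can be recomputed from the raw counts he reports (a quick sketch):

```python
# Chip Groder's case report: (bugs found during execution, total bugs found)
cases = [(2, 7), (2, 13), (0, 12), (3, 30)]
percentages = [round(100 * during_execution / total)
               for during_execution, total in cases]
print(percentages)  # [29, 15, 0, 10]
```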
In an email, James Tierney of Microsoft said:
"In a poll of Test Managers at Microsoft in 1995, I found the percentage of bugs found by test automation to vary between 5% and 45% in different groups. In groups working for me, the percentage has varied from 15-40%, averaging around 30%. Bugs found by automation tend to be higher in os/server/development tool groups and lower in GUI apps/games.
"Bugs caught by automated checkin tests or automated build verification tests are usually not counted, since they are rarely entered in the bug database, but fixed directly by the developer involved. So the actual number of bugs is understated to some degree."
An important addendum:
"A clarification: one group had 92% of their tests automated, and felt their biggest lack was not getting to 100%. But their product was buggy as sin, and they were replaced by a test group more interested in good testing than in finding every bug by automation."
I think that James's note clarifies a point I repeat ad nauseam: that the important skill is choosing the right tests to automate. I hope that my Testing Craft projects increase my skill at that.
(James's email is quoted with permission)
The consultant Ross Collard responded to the above data with some he collected in a survey:
"These people ... reported that, of all defects found, 12% were found by automated test cases vs. 88% found by manual ones.
"The difference was even more skewed for test cases which [found] severe defects -- about 95% of these were found by manual testing.
"In the beginning (first release), this difference is caused by the fact that most automated test cases have not yet been written. Several releases later, after a steady state has been achieved, this difference presumably is caused more by the repetitive testing of features which are believed to have not been changed. (They are re-tested as an insurance policy -- classic regression testing.)
"This data of mine includes only bugs found in black-box feature testing, not by white-box tools like memory leak detectors."
(Ross's email is quoted with permission.)
Metrics for evaluating application system testing (Metric = Formula):

Test coverage = Number of units (KLOC/FP) tested / total size of the system
Number of tests per unit size = Number of test cases per KLOC/FP
Acceptance criteria tested = Acceptance criteria tested / total acceptance criteria
Defects per size = Defects detected / system size
Test cost (%) = Cost of testing / total cost * 100
Cost to locate defect = Cost of testing / number of defects located
Achieving budget = Actual cost of testing / budgeted cost of testing
Defects detected in testing = Defects detected in testing / total system defects
Defects detected in production = Defects detected in production / system size
Quality of testing = Number of defects found during testing / (number of defects found during testing + number of acceptance defects found after delivery) * 100
Effectiveness of testing to business = Loss due to problems / total resources processed by the system
System complaints = Number of third-party complaints / number of transactions processed
Scale of ten = Assessment of testing by giving a rating on a scale of 1 to 10
Source code analysis = Number of source code statements changed / total number of tests
Effort productivity:
  Test planning productivity = Number of test cases designed / actual effort for design and documentation
  Test execution productivity = Number of test cycles executed / actual effort for testing
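As an illustration, two of these formulas written as small functions (a sketch; the function and parameter names are mine, not from the list's source):

```python
def quality_of_testing(defects_in_testing, acceptance_defects_after_delivery):
    """Percent of all known defects that testing caught before delivery."""
    return 100 * defects_in_testing / (
        defects_in_testing + acceptance_defects_after_delivery)

def cost_to_locate_defect(cost_of_testing, defects_located):
    """Average testing cost per defect found."""
    return cost_of_testing / defects_located
```

For example, a team that found 90 defects in testing and saw 10 acceptance defects after delivery would score a quality of testing of 90.0.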