Selecting tests from the generated list

  • Question
  • Updated 10 months ago
  • Answered
Hi,

There is something that is not clear to me. When I click on "Analyze Tests", it shows me a nice chart where I can see, for example, that 30 of the generated tests cover 80% of the interactions. But what are those 30 tests? The first 30 tests in the generated list, or any set of 30 tests from the generated list?

Thanks!

Fernando

  • 15 Posts
  • 0 Likes

Posted 2 years ago


Sean Johnson, CTO

  • 240 Posts
  • 18 Likes
Fernando,

It is specifically 80% from the first 30 tests in the list. The graph tells you the coverage of the possible tuples (pairs, triples, what have you) achieved if you execute the first n Hexawise-generated tests. If you skipped some tests and still executed 30 of them, but not the first 30, the resulting graph would look different and your coverage of the possible tuples would be lower.
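To make that concrete, here is a rough Python sketch of the idea (this is not our actual implementation, and the parameter names in the example are invented): it computes the cumulative fraction of all possible value pairs covered by the first n tests of an ordered plan.

```python
# Rough sketch (NOT Hexawise's actual code): cumulative 2-way coverage
# for the first n tests of an ordered plan. Parameter names are invented.
from itertools import combinations

def pairs_in_test(test):
    """All value pairs exercised by one test (a dict: parameter -> value)."""
    return {frozenset([(p1, test[p1]), (p2, test[p2])])
            for p1, p2 in combinations(sorted(test), 2)}

def coverage_curve(tests, parameters):
    """Fraction of all possible value pairs covered after each test.
    parameters is a dict mapping each parameter name to its list of values."""
    all_pairs = {frozenset([(p1, v1), (p2, v2)])
                 for p1, p2 in combinations(sorted(parameters), 2)
                 for v1 in parameters[p1] for v2 in parameters[p2]}
    covered, curve = set(), []
    for t in tests:
        covered |= pairs_in_test(t)
        curve.append(len(covered) / len(all_pairs))
    return curve
```

Running something like this on a plan shows exactly why executing some other 30 tests instead of the first 30 gives you a different (lower) coverage number: the ordering is what front-loads the coverage.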

I hope this helps!

Cheers,
Sean

Fernando

  • 15 Posts
  • 0 Likes
Thanks for the quick reply, Sean! And your answer is very clear.

Regards!

Justin Hunter, Hexawise Founder and CEO

  • 211 Posts
  • 9 Likes
Fernando,

Great question.

For additional information on this topic...

The best explanation of what the analyze tests screen means (and why there is a "decreasing marginal return"** towards the end of the graph) is found in the "How To Progress Checklists" that you can find in the upper left of the screen.

Once you're in Hexawise, to make the checklist visible, click on the two downward arrows located here...



Then you'll want to open up the "Advanced" list. So you might need to click here....



Then the detailed explanation will begin when you click on "Analyze Tests."



- Justin

**"Decreasing marginal return" is another way of saying that the coverage achieved by the first few tests in the plan will be quite high (so the graph line will rise sharply), then the slope will decrease in the middle of the plan (because each new test will tend to cover fewer net new pairs of values for the first time), and then at the end of the plan the line will flatten out quite a lot (because by the end, relatively few pairs of values remain to be tested together for the first time).
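A quick way to see this effect is to count, for an ordered plan, how many value pairs each test covers for the first time. The sketch below is illustrative only (the plan in the example is invented, and this is not Hexawise's code):

```python
# Count how many value pairs each test in an ordered plan covers for the
# first time; for a well-ordered plan the counts shrink toward the end.
from itertools import combinations

def new_pairs_per_test(tests):
    """tests: ordered list of dicts (parameter -> value).
    Returns, per test, the number of pairs it covers for the first time."""
    seen, gains = set(), []
    for t in tests:
        pairs = {frozenset([(a, t[a]), (b, t[b])])
                 for a, b in combinations(sorted(t), 2)}
        gains.append(len(pairs - seen))
        seen |= pairs
    return gains
```

For a small 2x2x2 plan that covers all 12 possible pairs in six tests, the per-test gains come out as 3, 3, 2, 2, 1, 1: a curve that climbs steeply at first and flattens at the end, which is the decreasing marginal return in miniature.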

Scott

  • 3 Posts
  • 0 Likes
Maybe I'm missing something - like y'all are being extra clever - but isn't the _reason_ for decreasing marginal return that you are increasing repetition (of coverage) with each test?

In other words, each incremental test covers a "fixed" amount of the space, without regard to what has already been covered - so each incremental test has a higher proportion of what it covers being previously covered, diminishing the rate of return of "not-yet-covered code, covered by this test."

That would make sense if "tests have no memory" (like dice). If, however, tests "have memory" (like a deck of cards), then you could sort the "deck" in advance, to minimize the rate at which marginal return diminishes.

But I don't see how you could sort them in advance of executing them - the tests are a black-box, right? You discover, during execution, which code paths are executed. (dice)

Or is the "coverage" actually the percentage coverage of the exhaustive set of permuted input combinations? (deck of cards)

Two interesting experiments:
1) Would the graph in Justin's example above look any different if you did the tests in descending order (6-1) than ascending order (1-6)?
2) What does the graph of "permutation % covered" vs. "code base % coverage" look like (either in kloc or function points or another metric)?

Really fun stuff here - I miss doing the math of QA, hasn't been part of my role for years now.

Sean Johnson, CTO

  • 240 Posts
  • 18 Likes
Scott,

A few points:

1) The "coverage" Hexawise talks about has nothing to do with code coverage (code paths); it refers to coverage of the possible pairs (or triples, quadruples, etc.) of the identified parameters' values. I've always thought this was a bit confusing and wish we had a better term to use than "coverage".

2) You are absolutely right that the reason for the decreasing returns is the increased repetition in the later tests (presuming you ran the earlier tests). The increased repetition is a by-product of trying to get 100% coverage of the possible tuples.

3) We sort the "deck" in advance to front-load the coverage of the value pairs, NOT the coverage of the code paths. You are correct that Hexawise can't do the latter; it has no knowledge of the code paths and makes no claim at all about their coverage. Coverage of code paths ultimately depends on how good a job the test designer did at extracting the relevant parameters and values of the system under test. You would expect some loose correlation between coverage of the identified value pairs and coverage of code paths in most typical systems.

To your experimental questions:

1) Yes, the graph would look quite different.

2) Would depend greatly on the nature of the system under test and the quality of the test design (the inputs into Hexawise).

Cheers,
Sean

Scott

  • 3 Posts
  • 0 Likes
Thanks for the great replies, Sean and Justin (and thanks for the props too!).

Splitting my replies:

Sean:
I don't worry about the term "coverage" too much, except that it gets overloaded to mean "problem space coverage" (the % of permutations, as Hexawise measures) and "code (lines or paths or classes/functions) coverage."

One of the things I like about your approach is that it is the _right_ outside-in approach. From a "product quality" point of view, you want coverage of contexts or scenarios (how the product is used) - that's the true goal that many QA folks lose sight of. Most tools allow you to measure coverage of the code, which only measures how the tool is implemented to work - an inside-out view that only measures how well you've tested what your developers intended (ignoring that they could have made design mistakes that fail to cover all the scenarios, or that include code that is never executed). Measuring code-coverage can help the developers, but measuring scenario-coverage helps the product.

As you point out "test design quality" is also a factor that determines how much of the code is "covered" by the scenarios identified.

Justin Hunter, Hexawise Founder and CEO

  • 211 Posts
  • 9 Likes
Scott,

To expand upon what Sean said: (1) we don't measure code coverage, and (2) the coverage we're talking about is focused instead on the coverage of targeted input combinations. While study after study has shown there is a correlation between them, Hexawise has no way of calculating precise white box metrics like code coverage, branch coverage, statement coverage, or block coverage.

For any set of test inputs, there is a finite number of pairs** of values that could be tested together. The coverage chart answers, after each test: what percentage of the total number of pairs (or triples, etc.) that could be tested together have been tested together so far?

Based on this coverage goal, decreasing marginal returns become inevitable.

Hexawise's algorithms are designed to help testers find as many defects as possible in as few tests as possible. In each and every step of each and every test case, the algorithm chooses the test condition that will maximize the number of pairs covered for the first time in that test case (or the maximum number of triplets or quadruplets, etc., based on the thoroughness setting defined by the user).

Beneficial side effects of this approach are: (a) the generated tests have the maximum degree of variation from test to test, (b) every single possible pair (or triple, etc.) is tested at least once, and (c) the minimum amount of repetition occurs from test to test. Even so, if you count how many pairs of values are covered for the first time in the first test of a generated set, it will be a very large number. In contrast, the maximum number of pairs that can be covered for the first time in the final test will almost always be significantly lower. Why? Because every combination tested in test case number 1 will, by definition, be tested for the first time. By the time you get to the final test in the set, there will be far fewer combinations that can still be tested for the first time; many times, only one combination remains that has not been tested in any of the prior tests. These dynamics, considered together, result in the decreasing marginal returns.
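The step-by-step selection described above can be sketched as a greedy algorithm. The Python below is a simplified illustration of that general technique, not Hexawise's actual algorithm: each new test is seeded with one not-yet-covered pair, then filled in parameter by parameter with the value that forms the most new pairs.

```python
# Illustrative greedy pairwise generator (a sketch of the general technique,
# not Hexawise's algorithm). Produces an ordered plan covering every pair.
from itertools import combinations

def greedy_pairwise(parameters):
    """parameters: dict mapping each parameter name to its list of values."""
    names = sorted(parameters)
    all_pairs = {frozenset([(a, va), (b, vb)])
                 for a, b in combinations(names, 2)
                 for va in parameters[a] for vb in parameters[b]}
    covered, plan = set(), []
    while covered != all_pairs:
        # Seed the test with one uncovered pair so every iteration progresses.
        test = dict(next(iter(all_pairs - covered)))
        for name in names:
            if name in test:
                continue
            # Pick the value that forms the most not-yet-covered pairs
            # with the values already chosen for this test.
            def gain(v):
                return len({frozenset([(name, v), (o, test[o])])
                            for o in test} - covered)
            test[name] = max(parameters[name], key=gain)
        plan.append(test)
        covered |= {frozenset([(a, test[a]), (b, test[b])])
                    for a, b in combinations(names, 2)}
    return plan
```

Because early tests are built almost entirely from uncovered pairs while late tests mostly repeat pairs already seen, a plan generated this way shows exactly the front-loaded coverage curve discussed in this thread.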

Incidentally, for any readers who have made it this far and want to learn more, I would recommend Scott's articles on pairwise and combinatorial test design. They are some of the clearest introductory articles about pairwise and combinatorial testing that have ever been written. They also contain some interesting data points related to the correlation between 2-way / allpairs / pairwise / n-way coverage (in Hexawise) and the white box metrics of branch coverage, block coverage and code coverage (not measurable by Hexawise).

In http://tynerblain.com/blog/2006/03/18... for example, Scott includes these data points:

"We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc. [...]
The pairwise tests gave over 90 percent block coverage."

"Our initial trial of this was on a subset of Nortel’s internal e-mail system where we were able to cover 97% of branches with less than 100 valid and invalid testcases, as opposed to 27 trillion exhaustive testcases."

"[...] a set of 29 pair-wise... tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage."

---------

**This reply mentions pairs and 2-way coverage because allpairs (AKA pairwise) is a well-known and easy-to-understand test design strategy. Even so, it is worth emphasizing that Hexawise lets users generate not only pairwise sets of tests that cover every pair, but also far more thorough sets of tests (3-way to 6-way). This lets users "turn up the coverage dial" and generate tests that cover, for example, every possible triplet of test inputs together at least once (or every 4-way, 5-way, or 6-way combination).
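As a rough illustration of what "turning up the coverage dial" does to the size of the problem, the sketch below counts how many distinct t-way value combinations exist for a made-up system (the parameter counts are invented; this is not Hexawise code):

```python
# Number of distinct t-way value combinations that exist across a set of
# parameters, given the count of values for each parameter.
from itertools import combinations
from math import prod

def total_tuples(value_counts, t):
    """Sum, over every group of t parameters, of the product of their
    value counts: the number of t-way combinations a plan must cover."""
    return sum(prod(group) for group in combinations(value_counts, t))
```

For six parameters with four values each, there are 240 pairs to cover at 2-way strength, 1,280 triples at 3-way, and 4,096 combinations at full 6-way (exhaustive) strength, which is why each notch on the dial costs noticeably more tests.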

Scott

  • 3 Posts
  • 0 Likes

tl;dr: You could enable customers to correlate code coverage with scenario coverage, and there are benefits to your customers in doing that. The longer version follows:

Justin:
It makes sense (as does your explanation) that Hexawise cannot directly measure code coverage, since it has no access to the code. However, other tools do exist for measuring code coverage during execution.

It can be pretty expensive to add instrumentation to an existing code base, but I've worked on projects where we programmatically introduced it (parsed the code base and added logging wherever it was needed, both for "classes & functions covered" and for "every place the code paths can vary").

By running a test suite against an instrumented version of the code, the code-coverage can be measured by the developers (independent of Hexawise).

If you provided a way to (a) parse those code-coverage results, or (b) let developers "tell" Hexawise what the coverage results were after each scenario (perhaps an API that allows them to define the "things to be covered" before the run and then report "thing X was covered"), you could build that data set.
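Purely as a thought experiment, the reporting interface in (b) might look something like the sketch below. Every name here is hypothetical; this is not a real Hexawise API:

```python
# Hypothetical sketch of a coverage-reporting interface: declare the
# "things to be covered" up front, report hits per scenario, then ask
# which targets were never exercised. All names are invented.
class CoverageReport:
    def __init__(self, targets):
        self.targets = set(targets)  # "things to be covered", declared up front
        self.hits = {}               # scenario name -> set of targets it hit

    def record(self, scenario, target):
        """Report that running `scenario` exercised `target`."""
        if target not in self.targets:
            raise ValueError("unknown target: " + target)
        self.hits.setdefault(scenario, set()).add(target)

    def never_hit(self):
        """Targets no scenario exercised: dead-code candidates."""
        hit = set().union(*self.hits.values()) if self.hits else set()
        return self.targets - hit
```

Data collected through something like this is what would let the correlation between scenario coverage and code coverage be measured per project.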

It would give you the ability to gain correlation data, providing insights into your algorithm development (maybe discovering more efficient ways to achieve n-tuple coverage).

You could also leverage this data to help your customers, particularly by highlighting which parts of the code (things X and Y and Z) are the ones most exercised across the scenarios and the parts of the code that are dead or likely dead (thing X was _never_ logged, even with 99% scenario coverage).

This feedback could help them improve the effectiveness (oops - need to add parameter Q) and the efficiency (we can remove parameter M, it doesn't actually change our coverage results, allowing us to reduce the cost of running tests by an additional X%) of their testing.

At the next level, you could provide some language or domain specific benefits to customers. As an example - loading less javascript in an eCommerce webpage (to get faster page loads -> higher conversion rates (there's data that shows this benefit)) by culling the dead code.

Once someone is in this world (having instrumented code), they could also use that instrumentation to find the most-common real-world scenarios and combine it with the grammar that Hexawise outputs to define scenarios, to figure out quantitatively which combinations are "more important." Importance matters when you think of quality (and coverage) as a risk-mitigation exercise.

Then you could enable that input to guide the scenario-sequencing that Hexawise does. Instead of "percentage of scenarios covered" - with, presumably, every combination of variables having equivalent weight (value) - you could provide "percentage of risk" covered.

That could really help a mature team with prioritization of investments (for example between additional testing vs. new capability development) in an agile environment.

Sorry - got a little long-winded.

Justin Hunter, Hexawise Founder and CEO

  • 211 Posts
  • 9 Likes
Scott,

Very cool ideas. I like the way you think. I love working with teams that have these kinds of "we could improve things even more if we incorporated this idea" views. It would be exciting to see the data coming back about what parts of the code were missed and then collaborate with testers on the project to think about what new Parameters and/or new Values would need to be added to the Hexawise test inputs to make the next set of Hexawise-generated tests even more thorough. Thanks for sharing.

Sean Johnson, CTO

  • 240 Posts
  • 18 Likes
Michelle,

I'm glad you found this thread interesting. It's one of my favorites of all time here on the forum.

I can say that orthogonal array (pairwise) testing works fine for agile; we use an agile methodology at Hexawise ourselves, and roughly 50% of the customer projects we get involved with are agile. But before I say more and risk going off on a half-baked answer to a question I don't really understand, I'll ask you to clarify it a little.

What in particular about this thread's discussion of the distinction between input-pair coverage (Hexawise's "coverage" analysis) and code/branch/etc. coverage (the more common use of the term "coverage" in QA circles) made you wonder about agile? I'd love to hear more about where you think the particular challenges with agile methods occur.

Cheers,
Sean