To expand upon what Sean said: (1) we don't measure code coverage, and (2) the coverage we're talking about is instead the coverage of targeted input combinations. While study after study has shown a correlation between the two, Hexawise has no way of calculating precise white box metrics like code coverage, branch coverage, statement coverage, or block coverage.
For any set of test inputs, there is a finite number of pairs** of values that could be tested together. The coverage chart answers, after each test, what percentage of the total number of pairs (or triples, etc.) that could be tested together have been tested together so far.
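As an illustration, that finite total can be computed directly from the sizes of each parameter's value set: sum the product of sizes over every pair of parameters. A minimal sketch (the parameter names and values below are hypothetical, not from any real Hexawise model):

```python
from itertools import combinations

def total_pairs(parameters):
    """Count every pair of values that could be tested together:
    for each pair of parameters (A, B), there are |A| * |B| value pairs."""
    sizes = [len(values) for values in parameters.values()]
    return sum(a * b for a, b in combinations(sizes, 2))

# Hypothetical model: three parameters with 2, 3, and 4 values each.
params = {
    "browser": ["Chrome", "Firefox"],
    "os": ["Windows", "macOS", "Linux"],
    "language": ["en", "fr", "de", "es"],
}
print(total_pairs(params))  # 2*3 + 2*4 + 3*4 = 26
```

The coverage chart's percentage is simply (pairs covered so far) divided by this total.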
Given this coverage goal, decreasing marginal returns are inevitable.
The Hexawise algorithm is designed to help testers find as many defects as possible in as few tests as possible. At each step of each test case, it chooses a test condition that maximizes the number of pairs covered for the first time in that test case (or the maximum number of triplets, quadruplets, etc., based on the thoroughness setting defined by the user).
Beneficial side effects of this approach are: (a) the generated tests have the maximum degree of variation from test to test, (b) every possible pair (or triple, etc.) is tested at least once, and (c) the minimum amount of repetition occurs from test to test. Even so, if you count how many pairs of values are covered for the first time in the first test of a generated set, there will be a very large number. In contrast, the maximum number of pairs that can be covered for the first time in the final test will almost always be significantly lower. Why? Because every combination tested in test case number 1 will, by definition, be tested for the first time. By the time you get to the final test in the set, far fewer combinations remain that can be tested for the first time; often only one combination has not appeared in any of the prior tests. These dynamics, taken together, produce the decreasing marginal returns.
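The greedy selection and the resulting diminishing returns can be sketched with a toy generator. To be clear, this is a brute-force illustration on a tiny hypothetical model, not Hexawise's actual algorithm: it repeatedly picks whichever whole test case covers the most not-yet-covered pairs.

```python
from itertools import combinations, product

def greedy_pairwise(parameters):
    """Toy greedy pairwise generator: pick the candidate test case that
    covers the most uncovered pairs, until every pair is covered.
    Brute-forces all candidate rows, so only suitable for tiny models."""
    names = list(parameters)

    def pairs_of(row):
        # Canonical set of (parameter, value) pairs appearing in this row.
        return set(combinations(sorted(zip(names, row)), 2))

    all_rows = list(product(*(parameters[n] for n in names)))
    uncovered = set().union(*(pairs_of(r) for r in all_rows))
    tests = []
    while uncovered:
        best = max(all_rows, key=lambda r: len(pairs_of(r) & uncovered))
        newly = pairs_of(best) & uncovered
        uncovered -= newly
        tests.append((best, len(newly)))
    return tests

# Hypothetical model; watch the "new pairs" count shrink test by test.
params = {"browser": ["Chrome", "Firefox"],
          "os": ["Windows", "macOS", "Linux"],
          "payment": ["card", "paypal"]}
for row, new_pairs in greedy_pairwise(params):
    print(row, "covers", new_pairs, "new pairs")
```

Because the pool of uncovered pairs only shrinks, the number of first-time pairs per test is non-increasing: early tests cover the maximum possible, while the last test often adds just one or two stragglers.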
Incidentally, for any readers who have made it this far and want to learn more, I would recommend Scott's articles on pairwise and combinatorial test design. They are some of the clearest introductory articles about pairwise and combinatorial testing that have ever been written. They also contain some interesting data points related to the correlation between 2-way / allpairs / pairwise / n-way coverage (in Hexawise) and the white box metrics of branch coverage, block coverage and code coverage (not measurable by Hexawise).
For example, Scott includes these data points:
"We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc. [...]
The pairwise tests gave over 90 percent block coverage."
"Our initial trial of this was on a subset Nortel’s internal e-mail system where we able cover 97% of branches with less than 100 valid and invalid testcases, as opposed to 27 trillion exhaustive testcases."
"[...] a set of 29 pair-wise... tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage."
**This reply mentions pairs and 2-way coverage because allpairs (AKA pairwise) is a well-known and easy-to-understand test design strategy. Even so, it is worth emphasizing that Hexawise lets users generate not only pairwise sets of tests but also far more thorough sets (3-way to 6-way). This allows users to "turn up the coverage dial" and generate tests that cover, for example, every possible triplet of test inputs together at least once (or every 4-way, 5-way, or 6-way combination).
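To give a feel for how quickly the bar rises as the coverage dial is turned up, the count of t-way value combinations that must each appear at least once generalizes the pair-counting idea: sum the product of sizes over every group of t parameters. A small sketch with hypothetical parameter sizes:

```python
from itertools import combinations
from math import prod

def total_t_way(sizes, t):
    """Number of t-way value combinations across parameters whose
    value-set sizes are given, each of which must be covered once."""
    return sum(prod(group) for group in combinations(sizes, t))

# Hypothetical model: five parameters with 2, 3, 4, 2, and 3 values.
sizes = [2, 3, 4, 2, 3]
for t in range(2, 5):
    print(f"{t}-way combinations to cover: {total_t_way(sizes, t)}")
```

Even for this tiny model the target grows from 77 pairs to 208 triplets to 276 quadruplets, which is why higher thoroughness settings require noticeably more tests.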