Software Quality

January 5, 2010

Do you trust your tests to uncover defects in your product code?

Filed under: Testing — David Allen @ 11:02 pm

How do you know whether your tests are effective? Do you trust them to detect and report defects in your product code? Product code can have defects, but test code can also have defects. This article gives a detailed explanation of how defects can be hidden and how this can lead us to a false sense of security. We can minimize this risk by being skeptical and attentive at all times to ensure that our tests are correct. I hope this article helps you better understand the problem in depth so you can recognize the risk and prevent problems.

Scope of Discussion

We develop software for use in a production environment.  Regardless of whether this code is temporarily in a pre-production environment on its way to being deployed, or has already been deployed, we will call this code the product code or product to differentiate it from other kinds of code. Other kinds of code include test code and deployment code. These types of code are similar in some ways and different in others. This article will compare and contrast product code and the test code that supports it.  We presume that the developers are using some method that involves automated testing at some point.  This comparison alone is a fun exercise. But later, we will build on that, and use the insights gained to improve our software methods to achieve higher quality than we might otherwise achieve. Our goal is to improve software quality by improving correctness. Our journey is to better understand and engineer the automated test process by which we judge correctness.

Correctness is relative to something

The correctness of code can only be described relative to some standard. The standard is often called “requirements”. In Test-Driven Development, user requirements are translated into automated tests (code) which are intended to accurately express the requirements in scenarios so that the product code can be evaluated for correctness. So we have:

  • Requirements
    • These are expressions of the needs, wants, and priorities of the customer. They come from humans, living in a changing world. As such, they are imperfect and ever-changing. This makes it challenging to capture them long enough to use as a basis for constructing software, which is a slow and tedious process. Requirements are often expressed as user stories, use cases, scenarios, feedback on working software, defect reports, requirements documents, etc.
  • Product code
    • built to meet the requirements
  • Test code
    • built to help express the requirements in a form that can contribute to the validation or rejection of product code.
  • Deployment code
    • Its ultimate purpose is to help us reliably place product code into the intended production environment. After all, the product code is not useful until it is successfully deployed into a production environment. Deployment code is often tested as part of an overall development process in which copies of the product are deployed to pre-production environments where the product is then tested. Deployment code has similarities with test code. They are both intermediate tools, intended to produce a useful and reliable product. Deployment code is also similar to product code: they must both be tested (typically by a combination of manual and automated methods) to verify that they are performing correctly.

Have you ever thought about the variety of possible states of the items above?

  • Requirements can be
    • correct or incorrect (as judged by the system specifiers, users, or owners)
    • clear or ambiguous as understood by the customers who express them and the builders who must design from them
    • complete or incomplete relative to a scope of work
    • consistent or inconsistent (an inconsistent requirement is in conflict with one or more other requirements)
    • robust or fragile (abstract and fundamental allows for adaptive behavior but leads to ambiguity, precision reduces ambiguity but reduces relevance over time and in the face of change)
  • Product code can be
    • correct or incorrect as measured against requirements
    • complete or incomplete as measured against a scope of work
    • robust or fragile
  • Test code can
    • correctly or incorrectly express the requirements
    • be complete or incomplete as measured against a scope of usage defined by scenarios and quality standards
    • be robust or fragile

Combinations of states

This is not a complete or perfect list. There are other possible states, but these are some of interest for our purpose. Our purpose is to understand our reality in a way that gives us insight to reduce defects and produce a high quality product. How many possible combinations of the states above might actually exist in our reality? Requirements have 5 binary (two-valued) factors. Product code and test code have 3 binary factors each. That’s a total of 5+3+3 = 11 binary factors, which gives us 2^11 = 2048 combinations. Shall we examine each? I would rather not. Many of these are independent of one another and can be considered one at a time. But there is one combination where the interplay is interesting to examine. Let’s examine a simplified model of software, and focus on these 3 factors:

  • Product code can be
    • correct or incorrect as measured against requirements
  • Test code can
    • correctly or incorrectly express the requirements
  • Test Results can be
    • pass or fail

Now we only have 3 binary factors. 2^3 = 8 combinations. Much nicer. And you will see some synergy from examining these together. In the context of a testing process, the product code is the input, the test code is the process, the test result is the output. So we have an Input, Process, Output (IPO) model.
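The counting above is easy to check by enumeration. Here is a minimal Python sketch; the variable names are my own illustration and are not part of any tool.

```python
import itertools

# Full model: 5 binary factors for requirements, 3 each for product and test code.
full_model_factors = 5 + 3 + 3
full_model_combinations = 2 ** full_model_factors

# Simplified model: (product correct?, test correct?, test result passes?)
simplified_cases = list(itertools.product([0, 1], repeat=3))

print(full_model_combinations)  # 2048
print(len(simplified_cases))    # 8
for case_number, (product_ok, test_ok, passed) in enumerate(simplified_cases, start=1):
    print(case_number, product_ok, test_ok, passed)
```

The enumeration runs from 0 0 0 up to 1 1 1, which is the same order in which the cases are numbered in the table.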

In the table below 0=false and 1=true.

Case# | Product Code Correct | Test Code Correct | Test Result Pass | Predictive Quality | Test Effectiveness | Risk | Notes
--- | --- | --- | --- | --- | --- | --- | ---
1 | 0 | 0 | 0 | Lucky Fail | Ineffective | Medium | This is a mess. At least you have a failing test to catch your attention. Unfortunately, there is a lot of work to do. Everything is incorrect.
2 | 0 | 0 | 1 | False Pass | Ineffective | High | This is a mess, and there is no failure to catch your attention.
3 | 0 | 1 | 0 | True Fail | Effective | Low | True Pass and True Fail means the test is effective and correctly reporting the correctness of the target product code. A good place to be. The test is good, and shows the code is bad. This might be our state at the beginning of a TDD process. Fix the code, the test will pass, and you will be happy.
4 | 0 | 1 | 1 | Impossible | X | X | Not possible. By definition, a correct test does not create a false result.
5 | 1 | 0 | 0 | False Fail | Ineffective | Medium | Unfortunate. But no big deal. We see the failure, investigate, and realize it is the test, not the product code, that is defective. This requires good customers, analysts, or requirements documents to ensure that the programmer does not mistakenly think the test is correct, and alter the product code.
6 | 1 | 0 | 1 | Lucky Pass | Ineffective | High | Product is fine. However, the test is ineffective. It is not testing correctly. But it is accidentally passing anyway. If the code were to change, the test might not catch the defect, producing a “False Pass” like case 2, which is very bad.
7 | 1 | 1 | 0 | Impossible | X | X | Similar to case 4.
8 | 1 | 1 | 1 | True Pass | Effective | Low | True Pass and True Fail means the test is effective and correctly reporting the correctness of the target product code. In this case, the product is correct and our test reports that.
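The table can also be written as a small decision function. This is only a sketch of the model in this article; the function name `classify` and its return shape are my own assumptions.

```python
def classify(product_correct: bool, test_correct: bool, test_passes: bool):
    """Return (predictive quality, effectiveness, risk) for one case of the table."""
    if test_correct:
        # By definition, a correct test passes exactly when the product is correct.
        if test_passes != product_correct:
            return ("Impossible", None, None)
        quality = "True Pass" if test_passes else "True Fail"
        return (quality, "Effective", "Low")

    # Defective test: any agreement with the product's real state is a coincidence.
    kind = "Lucky" if test_passes == product_correct else "False"
    outcome = "Pass" if test_passes else "Fail"
    # A passing defective test hides itself, so it carries the higher risk.
    risk = "High" if test_passes else "Medium"
    return (f"{kind} {outcome}", "Ineffective", risk)


print(classify(False, False, True))   # case 2
print(classify(True, False, True))    # case 6
```

Notice that the risk depends only on whether the defective test passes, not on whether the product happens to be correct.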

Analysis and Application

Let’s summarize the test scenarios by predictive qualities.

  • Effective tests
    • True Pass
    • True Fail
  • Ineffective tests
    • False Pass
    • False Fail
    • Lucky Pass
    • Lucky Fail

Effective tests are what we desire. When they produce failing test results, at least we know to find the product defect and repair it. “True” tests may pass or fail. But they are truly reporting the correctness of the target product code.

Ineffective tests are dangerous. “False” tests report a state of correctness about the product that is opposite from the truth. “Lucky” tests have a test result that correctly matches the correctness of the product, but that is only a coincidence. In these cases, the test is not really effective at measuring product correctness. We just got “lucky” if you can call it that.

The ineffective tests that pass are rated a high risk. The risk is high because the passing status gives us no reason to examine the test or code. We are busy fixing failing tests, and we are unlikely to look closely at these. At some point in the future, we may change the product code and break its behavior as measured against requirements. But since we trust the test to express those requirements, and the test is not effective, our trust is misplaced, and the change may introduce a defect that is not detected.
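To make this concrete, here is a hedged Python sketch of a Lucky Pass turning into a False Pass. The requirement (10% off orders of 100 or more) and every function name are hypothetical, invented only for illustration.

```python
def discounted_total(order_total: float) -> float:
    """Hypothetical product code: 10% off orders of 100 or more."""
    if order_total >= 100:
        return order_total * 0.9
    return order_total


def broken_discounted_total(order_total: float) -> float:
    """A later change that breaks the behavior: the discount is never applied."""
    return order_total


def weak_test(total_fn) -> bool:
    """Defective test: it only checks that a number comes back, not its value."""
    result = total_fn(200)
    return isinstance(result, (int, float))


def strict_test(total_fn) -> bool:
    """Test that expresses the requirement: 200 with 10% off is 180."""
    return abs(total_fn(200) - 180) < 1e-9


print(weak_test(discounted_total))           # True  (Lucky Pass)
print(weak_test(broken_discounted_total))    # True  (False Pass)
print(strict_test(discounted_total))         # True  (True Pass)
print(strict_test(broken_discounted_total))  # False (True Fail)
```

The weak test stays green through the breaking change, which is exactly why nobody looks at it.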

The ineffective tests that fail are rated medium risk, which is less of a risk than the ineffective passing tests. The reason for the lower relative risk is that a failing test calls attention to itself and begs for scrutiny. As we examine the failing test, if we are attentive, we may notice the defects in the test and improve the quality during our review and repair process.

How can we use this information to our advantage to improve the product? The false and lucky results motivate us to be careful and skeptical. Just because a test fails does not mean the code is defective. It could be the test. Just because a test passes does not guarantee the code is working well. It could be broken code, masked by an equally broken test.
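One way to practice that skepticism is to run a test against an implementation we believe is right and another we know is wrong, and see whether the test tells them apart. The sketch below is my own illustration, not a prescribed method; the leap-year example and all names in it are hypothetical.

```python
def is_leap_year(year: int) -> bool:
    """Implementation we believe is correct (Gregorian rule)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def naive_is_leap_year(year: int) -> bool:
    """Deliberately defective implementation: ignores the century rule."""
    return year % 4 == 0


def check_common_years(leap_fn) -> bool:
    """A test that never exercises the century rule."""
    return leap_fn(2024) and not leap_fn(2023)


def check_century_years(leap_fn) -> bool:
    """A test that does exercise the century rule."""
    return leap_fn(2000) and not leap_fn(1900)


def distinguishes(test, good_impl, bad_impl) -> bool:
    """True only if the test passes the good code and fails the bad code."""
    return test(good_impl) and not test(bad_impl)


print(distinguishes(check_common_years, is_leap_year, naive_is_leap_year))   # False
print(distinguishes(check_century_years, is_leap_year, naive_is_leap_year))  # True
```

A test that passes against both versions gives us no evidence that it can detect that particular defect, so its green result deserves less trust.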

And also remember that other internal qualities, such as the ones listed in the earlier, more comprehensive list, are still important risk factors that require attention. These are things like robustness and completeness. We should be concerned about those qualities as well, and interested in whether our tests and products have the desired qualities.
