Who watches the watchmen? Are your tests tested?

As the saying goes in Latin, “Quis custodiet ipsos custodes?” It roughly translates to “Who watches the watchmen?” This paradox applies just as well to your testing strategy, and it is potentially one of the biggest reasons testing approaches fail in companies.


In this blog, I will explore the paradox a little more, dig into why it happens, and brainstorm remedial steps.

The paradox

What the developers develop is tested by the testers. But what do the testers do? Who checks that?

In most organizations, no one does that. In some organizations, there are peer reviews or manager reviews done for sample test cases. If a test case wrongly identifies a defect, the developers raise a protest, and the tester corrects the test case or test script. However, if the tester fails to catch a defect, the defect passes through the system.

A multi-layer testing strategy is usually followed, for example, unit, integration, system, smoke, and sanity testing. But if the same or a similar team writes the test cases for every layer, what is missed in one layer will likely be missed in the others too. So while multi-layer testing is needed, it does not go far in resolving the paradox we are discussing.

Mistakes of the testers are often not captured.

This allows the defect to pass through to the production environment. And all of us know that fixing a defect in production is a lot more expensive than fixing it in earlier stages.

The problem can be more significant than that, because how do you detect a defect in production? Usually, your users will not report a defect. They will take the easier path of avoiding the product itself, and the defect sits in the production environment for a long time, silently adding to the damage.

You might wonder why we do not fix this if it is such a big problem. The reason is hidden in its recursive nature: if we have reviewers for the testers, someone might ask why there is no reviewer for the reviewers, and so on.

Of course, one has to stop somewhere.

And hence the paradox.

How to address the problem

The goal is to ensure that the quality control exercises themselves meet the expected quality levels. The following steps help:

Institutionalize a review process with proof

Treat each test case as a work product that cannot be approved without proof and testing. This means the author of the test case must be different from the person who executes it.

The executor has the right and duty to log defects in the scripts under the testing project. These defects are tracked to closure. The defects in test scripts must be treated as formal defects in the project, but at the same time, they should not be mixed with the defects logged against the product as they are not product defects and will corrupt the defect database of the product.
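As a minimal sketch of this rule, the snippet below models a test case that can be approved only when someone other than its author has executed it and attached proof, and keeps test script defects in a list separate from product defects. The class and field names (`ReviewedTestCase`, `DefectTrackers`, `evidence`) are hypothetical and only illustrate the idea; a real project would map them onto its own test management tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewedTestCase:
    """A test case treated as a reviewable work product (hypothetical model)."""
    case_id: str
    author: str
    executor: Optional[str] = None
    evidence: List[str] = field(default_factory=list)  # e.g. run logs, screenshots

    def can_approve(self) -> bool:
        # Approval needs an independent executor and at least one piece of proof.
        return (
            self.executor is not None
            and self.executor != self.author
            and len(self.evidence) > 0
        )


@dataclass
class DefectTrackers:
    """Two separate lists so test script defects never mix with product defects."""
    product_defects: List[str] = field(default_factory=list)
    test_script_defects: List[str] = field(default_factory=list)

    def log(self, description: str, against_test_script: bool) -> None:
        # Route the defect to the tracker it belongs to.
        if against_test_script:
            self.test_script_defects.append(description)
        else:
            self.product_defects.append(description)
```

Keeping the two lists apart is what lets the product defect data stay clean while script defects are still tracked to closure.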

Do a reverse review of randomly selected scripts

To strengthen the process further, there should be a reverse review: for a sample set of test cases, the developer reviews the test cases to certify coverage of the features. This step helps in a few ways:

  1. It breaks the paradox and closes the loop with the developers rather than finding a watchman for the watchman.
  2. Developers understand their code best and are the best people to certify the completeness of the test cases.

At the same time, we should not make this a practice across the board, as it would take away too much of the developers’ time and, to some extent, the value that independent QA brings to the table. Keeping it to a random sample still helps us statistically establish the overall quality of coverage.
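One way to keep the reverse review to a random sample is to draw it programmatically. The sketch below is only an illustration: the function name `pick_for_reverse_review` and the default 10% sample fraction are assumptions, not figures from any standard.

```python
import random
from typing import List, Optional, Sequence


def pick_for_reverse_review(
    test_case_ids: Sequence[str],
    sample_fraction: float = 0.1,  # assumed default; tune to developer capacity
    seed: Optional[int] = None,
) -> List[str]:
    """Return a random subset of test cases for developers to review."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if not test_case_ids:
        return []
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # Review at least one case; sampling is without replacement.
    sample_size = max(1, round(len(test_case_ids) * sample_fraction))
    return rng.sample(list(test_case_ids), sample_size)
```

Recording the seed alongside the review lets anyone reproduce which test cases were selected.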

There are different types of coverage like functional coverage, test coverage, code coverage, requirement coverage, etc. The metrics should be defined and tracked closely. This helps find gaps in defining the test cases and establishes whether the test scripts address all the conditions and areas of testing.

Another critical step in checking for coverage is to build and maintain a detailed requirement traceability matrix (RTM). This helps track whether all aspects of the requirements are being tested and ensures that no untested requirements are released.
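A very small RTM check might look like the sketch below. It assumes the matrix is held as a mapping from requirement ID to the test cases that trace to it; the function name and the ID formats are hypothetical.

```python
from typing import Dict, List, Set


def untested_requirements(
    requirements: Set[str],
    rtm: Dict[str, List[str]],
) -> List[str]:
    """Return requirement IDs that no test case traces to."""
    # A requirement is untested if it is absent from the RTM
    # or is mapped to an empty list of test cases.
    return sorted(req for req in requirements if not rtm.get(req))
```

For example, with requirements `{"REQ-1", "REQ-2", "REQ-3"}` and an RTM that maps only `REQ-1` to a test case, the function reports `REQ-2` and `REQ-3` as gaps to close before release.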

Let statistics come to the rescue

Data is your best friend when it comes to monitoring the success of your testing strategy. As the saying goes, “Data does not lie,” so it can be the most authentic monitoring tool for the effectiveness of your tests.

Some of the key metrics you can track, and how they help you know whether your testing is working, are as follows:

  1. Defect detection rate (DDR): This is the rate at which defects are being detected in the module in a given stage of development (like design review, coding review, unit testing, integration testing, etc.).

    To measure this metric, you first need to agree on the metric that measures the size of your software. There are many options, like story points, function points, lines of code, a Simple/Medium/Complex (SMC) rating of each individual module, etc. Once the sizing metric is finalized, you can measure the defect detection rate as:

    Defect Detection Rate = (Number of defects identified in this stage) / (Size of the software module)

  2. Defect injection rate (DIR): This slightly trickier metric tells you at which stage the detected defect was injected. It is trickier because it is difficult to know this deterministically. It also takes some time to determine and needs collaboration between the QA and development teams. But if a discipline is established to do this, it helps correlate the detected defects to injected defects.

    If the stage difference between injection and detection is large, it means that the testing stages in between could not catch the defect, which sheds direct light on the testing performance in those stages. If this number is higher than the threshold established for your project, it is a “check engine” light for your testing strategy and requires investigation and corrective action.

  3. Other size-normalized metrics: Like the defect metrics above, which are scaled to the module size, you can use many other metrics to draw insights on how your testing is performing.

    Deviation from the established norms (even a positive deviation) is a reason to investigate and isolate any causes that are affecting the efficacy of your testing.

    These metrics are:
    Total number of test cases per software size
    Total number of test data per software size
    Total number of first-time success test cases per software size
    Total number of failed test cases per software size by testing cycle count
    Total number of accepted/rejected defects per software size
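As a rough illustration of the two defect metrics above, the sketch below computes a defect detection rate for one stage and the stage gap between where a defect was injected and where it was detected. The stage names and their order in `STAGES` are assumptions; substitute the stages of your own lifecycle and whichever sizing metric your project agreed on.

```python
from typing import Sequence

# Assumed ordering of stages, earliest first (adapt to your own lifecycle).
STAGES: Sequence[str] = (
    "design",
    "coding",
    "unit testing",
    "integration testing",
    "system testing",
    "production",
)


def defect_detection_rate(defects_found: int, module_size: float) -> float:
    """Defects identified in a stage divided by the size of the module."""
    if module_size <= 0:
        raise ValueError("module_size must be positive")
    return defects_found / module_size


def stage_gap(injected_in: str, detected_in: str) -> int:
    """Number of stages a defect survived between injection and detection."""
    return STAGES.index(detected_in) - STAGES.index(injected_in)
```

For instance, 6 defects found in a module sized at 40 story points gives a rate of 0.15 defects per story point, and a defect injected in `"coding"` but detected in `"system testing"` has a stage gap of 3 under this ordering.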

The point to note here is that negative deviations are often acted upon, e.g., when the manager finds that the defects injected for a module are higher than the norm for defects injected.

However, please note that a deviation of this kind will most likely lead you to coding inefficiencies (which are very important, of course). A deviation on the positive side, such as the defects injected for a module being lower than the established norm, deserves the same attention. While it may simply indicate that the development team did a great job, it may also indicate that the testing team underperformed and found fewer defects than they should have.
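The check on both sides of the norm can be sketched as below. The function name and the 25% tolerance are assumptions chosen for illustration; each project would set its own threshold from historical data.

```python
def check_against_norm(observed: float, norm: float, tolerance: float = 0.25) -> str:
    """Classify a metric as 'above', 'below', or 'within' its established norm."""
    if norm <= 0:
        raise ValueError("norm must be positive")
    # Relative deviation, e.g. 0.5 means 50% above the norm.
    deviation = (observed - norm) / norm
    if deviation > tolerance:
        return "above"  # usually acted upon: look at the code and how it was built
    if deviation < -tolerance:
        return "below"  # easy to ignore: check whether testing missed defects
    return "within"
```

Treating `"below"` as a trigger for investigation, and not as good news by default, is the point of the check.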


Introduce easter egg defects to check the quality of testing

Just as the testers test whether the code works as per specifications, the developers can test whether the testing works as per expectations. This is done by injecting “easter egg defects”: defects intentionally introduced into the software to see whether the testing team can identify them.

Of course, the easter egg defects are written so that they can be disabled easily and cleanly. This is one of the most expensive and riskiest approaches to assessing the quality of testing, but it is also the most effective.
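A minimal sketch of such a defect is shown below, assuming it is switched on by an environment variable so that it can be disabled cleanly. The variable name `EASTER_EGG_DEFECTS`, the defect ID `SEED-001`, and the `apply_discount` function are all hypothetical. The second function measures what share of the planted defects the testing team reported.

```python
import os
from typing import Iterable

SEED_FLAG = "EASTER_EGG_DEFECTS"  # hypothetical environment variable


def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount."""
    if os.environ.get(SEED_FLAG) == "1":
        # SEED-001 (intentional defect): the discount is added, not subtracted.
        return price * (1 + percent / 100)
    return price * (1 - percent / 100)


def seeded_detection_ratio(seeded: Iterable[str], reported: Iterable[str]) -> float:
    """Share of planted defects that the testing team reported."""
    seeded_set = set(seeded)
    if not seeded_set:
        raise ValueError("no seeded defects were planted")
    # Only reports that match a planted defect ID count as detections.
    return len(seeded_set & set(reported)) / len(seeded_set)
```

The flag must default to off and must never be set in production builds, which is a large part of why this approach is risky.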


As we discussed before, there are two types of problems in the quality of tests:

  1. Wrong test definition or inappropriate test data
  2. Inadequate scenario coverage or ineffective test data

The first type of error can lead to missed defects or extraneous defects. Extraneous defects are easily detected because the development team protests about them quite vehemently. As a metric, defects rejected per software size per stage of software development aggregates this data and can be monitored effectively.

Missed defects, including those missed because of the second problem, are difficult to detect. Strategies like data monitoring of defect injection rates and defect detection rates, review of test cases and data, and easter egg defects can help reveal them.

As you mature the process using these means, you gain control over missed defects and can effectively reduce them, enhancing the overall effectiveness of your test strategy.
