Increasing Code Coverage May Be Harmful

I should start by clarifying: I'm referring to increasing code coverage by running a code coverage tool, looking for untested code, and writing tests for it. Here's an example.

Let's say you run your test coverage report and you find the following method is not tested.

class BankAccount
  def masked_number
    # number may be nil, so call to_s both before and after the slice
    ("*" * 8) + number.to_s[-4, 4].to_s
  end
end

It's easy to see what this method does. I can get 100% coverage in a few seconds.

describe "masked_number" do
  it "returns 8 asterisks followed by the
        last 4 characters of the account number" do
    account = BankAccount.new :number => 123456781234
    account.masked_number.should == "********1234"
  end
end

Is the method tested? Yes. Is it tested well? No. What is it supposed to return if there is no account number? What if the account number is longer than 12 digits? Should it still return 8 asterisks, or should it return more? Because masked_number didn't have any tests, you have no idea whether the author of the method considered these scenarios.

So let's assume that the original author of this method considered these possibilities and did the right thing, and add a few more tests.

describe "masked_number" do
  it "returns 8 asterisks if the account number is blank" do
    account = BankAccount.new :number => nil
    account.masked_number.should == "********"
  end

  it "returns 8 asterisks followed by the
        last 4 characters of the number for a
        14 digit account number" do
    account = BankAccount.new :number => 12345678901234
    account.masked_number.should == "********1234"
  end
end

We now have 3 tests for this method, covering a few possible input values. Awesome. Maybe...

Now let's say a developer picks up a bug ticket from QA: "For an account number of length X, the account mask should show X - 4 asterisks and the last 4 digits of the account number. Right now it always shows 8 asterisks. For example, a 14 digit account number should have 10 asterisks."

So the developer decides to do TDD and write a test first, but notices the test that's already there. The quick thing to do would be to assume the QA ticket is right and update the existing test. But that's not the responsible thing to do. If a test asserts that a 14 digit account number should have 8 asterisks, somebody presumably wanted that behavior. Before changing the test and the code, the developer should find out who wrote the test and why. Does somebody else think it should be 8 asterisks? Does one part of the application always want 8, while another wants 10?
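
For illustration only, here's a sketch of what the method might look like if we took the ticket at face value. This is my own guess at the fix, not anything from the codebase:

class BankAccount
  def masked_number
    digits = number.to_s
    # one asterisk per hidden digit instead of a fixed 8
    ("*" * [digits.length - 4, 0].max) + digits[-4, 4].to_s
  end
end

Notice that this version also changes the blank-number behavior the earlier test pins down: a blank account number now yields an empty string instead of 8 asterisks. That's exactly the kind of conflict the developer needs to surface before changing anything.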

When faced with untested code, there are 3 approaches you can take, and all of them have drawbacks.

(1) Thoroughly test the method.
Advantage: the method is now well tested.
Disadvantage: it may lead future developers astray by making something coincidental seem intentional (see the sketch after this list).

(2) Do the minimal amount of testing to cover the method.
Advantage: the method has some basic tests for it.
Disadvantage: you're focused on the wrong goal. You should strive for well tested code, not 100% code coverage. There's a big difference.

(3) Don't test it until you can verify the requirements. Then thoroughly test it.
Advantage: you avoid the disadvantages of options 1 and 2.
Disadvantage: verifying the requirements can be time consuming, and until you do, you have untested code.
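
To make the trade-off in option 1 concrete, here's a sketch of what thoroughly testing masked_number might look like. Note the short-number case: its behavior falls out of how String#[] handles out-of-range slices, and a test makes it look like a decision:

describe "masked_number" do
  it "returns 8 asterisks for a blank account number" do
    BankAccount.new(:number => nil).masked_number.should == "********"
  end

  it "returns only 8 asterisks for a number shorter than 4 digits" do
    # "123"[-4, 4] is nil, so the suffix is empty - intentional or accidental?
    BankAccount.new(:number => 123).masked_number.should == "********"
  end

  it "returns 8 asterisks plus the last 4 digits of a long number" do
    BankAccount.new(:number => 12345678901234).masked_number.should == "********1234"
  end
end

Once this suite exists, the next reader will assume every one of these cases was deliberate, whether or not anyone ever decided them.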

If the requirements aren't clear, in my opinion, it's a tough choice. The best option is to avoid all 3 by being disciplined with testing in the first place. But honestly, I'd lean towards not testing the method. If a developer comes across a bug in the untested method, they should write a test for that bug; working on the bug is also a good opportunity to ask questions about the requirements. If a developer needs to change the method due to a new requirement, they should write a test for the change. And if a developer wants to refactor the method and be sure the refactoring doesn't break anything, they might want to reconsider. Yes, poor test coverage inhibits refactoring.
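
For example, if the QA ticket above turns out to be right, fixing the bug would start with a regression test like this one (assuming the confirmed requirement of one asterisk per hidden digit):

describe "masked_number" do
  it "shows one asterisk per hidden digit of a 14 digit account number" do
    account = BankAccount.new :number => 12345678901234
    account.masked_number.should == "**********1234"
  end
end

The test documents the confirmed requirement, fails against the current implementation, and passes once the fix is in, which is all TDD asks for.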

Essentially, what I'm saying is that trying to increase test coverage by looking for untested code and writing tests for it may be harmful due to the disadvantages listed under approaches #1 and #2.

Keep in mind that tests can never guarantee that the correct requirements were implemented. But if you see a test that says 1 + 1 should be 3, you shouldn't change it until you find out why it's that way. Similarly, 100% coverage never guarantees that the code is well tested. Worse, doing the minimum amount of work to cover code makes the team less confident in the tests. I'd rather have 90% coverage and know that it's a solid, well tested 90% than have 100% and suspect that most of the code isn't well tested.