
July 21, 2010

"Testing takes time"

When explaining why it is not possible to meet a particular vulnerability response deadline, most software vendors inevitably fall back to a very simple and compelling argument: testing takes time.

For what it's worth, I have dealt with a fair number of vulnerabilities on both sides of the fence - and I tend to be skeptical of such claims: while exceptions do happen, many of the disappointing response times appeared to stem from trouble allocating resources to identify and fix the problem, and had very little to do with testing the final patch. My personal experiences are necessarily limited, however - so for the sake of this argument, let's take the claim at face value.

To get to the root of the problem, it is important to understand that software quality assurance is an imperfect tool. Faulty code is not written to intentionally cripple the product; it's a completely unintended and unanticipated consequence of one's work. The same human failings that prevent developers from immediately noticing all the potential side effects of their code also put limits on what's possible in QA: there is no way to reliably predict what will go wrong with modern, incredibly complex software. You have to guess in the dark.

Because of this, most corporations simply learn to err on the side of caution: settle on a maximum realistically acceptable delay between code freeze and a release (one that still keeps you competitive!) - and then structure the QA work to be compatible with this plan. There is nothing special about this equilibrium: given the resources, there is always much more to be tested; and conversely, many of the current steps could probably be abandoned without affecting the quality of the product. It's just that going in the first direction is not commercially viable - and going in the other just intuitively feels wrong.

Once a particular organization has such a QA process in place, it is tempting to treat critical security problems much like feature enhancements: there is a clear downside to angering customers with a broken fix; on the other hand, as long as vulnerability researchers can be persuaded to engage in long-term bug secrecy, there is seemingly no benefit in trying to get this class of patches out the door more quickly than the rest.

This argument overlooks a crucial point, however: vulnerabilities are obviously not created by the researchers who spot them; they are already in the code, and tend to be rediscovered by unrelated parties, often at roughly the same time. Hard numbers are impossible to arrive at, but based on my experience, I expect a sizable fraction of current privately reported vulnerabilities (some of them known to vendors for more than a year!) to be independently available to multiple actors - and the longer these bugs are allowed to persist, the more pronounced this problem is bound to become.

If this is true, then secret vulnerabilities pose a definite and extremely significant threat to the IT ecosystem. In many cases, this risk is far greater than the speculative (and never fully eliminated) risk of occasional patch-induced breakage; particularly when one happens to be a high-profile target.

Vendors often frame the dilemma the following way:

"Let's say there might be an unspecified vulnerability in one of our products.

Would you rather allow us to release a reliable fix for this flaw at some point in the future; or rush out something potentially broken?"

Very few large customers will vote in favor of dealing with a disruptive patch - IT departments hate uncertainty and fire drills; but I am willing to argue that a more honest way to frame the problem would be:

"A vulnerability in our code allows your machine to be compromised by others; there is no widespread exploitation, but targeted attacks are a tangible risk to some of you. Since the details are secret, your ability to detect or work around the flaw is practically zero.

Do you prefer to live with this vulnerability for half a year, or would you rather install a patch that stands an (individually low) chance of breaking something you depend on? In the latter case, the burden of testing rests with you.

Or, if you are uncomfortable with the choice, would you be inclined to pay a bit more for our products, so that we can double our QA headcount instead?"

The answer to that second set of questions is much less obvious - and more relevant to the problem at hand; depriving the majority of your customers of this choice, and then effectively working to conceal this fact, just does not feel right.

Yes, quality assurance is hard. It can also be expensive to better parallelize or improve automation in day-to-day QA work; and it is certainly disruptive to revise the way one releases and supports products (heck, some vendors still prefer to target security fixes for the next major version of their application, simply because that's what their customers are used to). It is also likely that if you make any such profound changes, something will eventually go wrong. None of these facts makes the problem go away, though.

Indefinite bug secrecy hurts us all by removing real incentives for improvement, while giving very little real security in return.


  1. Are we just arguing about likelihoods here?

    1. Likelihood any given bug is known by more than one person.

    2. Likelihood that the bug will get exploited before a patch is pushed.

    3. Your individual likelihood of being the one #2 happens to.

    I don't know how this doesn't depend a lot on the actual likelihood of #1 for any given bug... and I suppose how likely it is to get exploited.

    Essentially you are trading some likelihood of getting owned for likelihood of breaking things. Perhaps something less subjective would help us all make sense of this?

  2. Possibly - but most of the attempts to quantify security problems fall short.

    My primary concern is mostly that bug secrecy takes the ability to make decisions about these probabilities away from customers - and leaves it squarely with the vendor, who is driven by a complex set of incentives not necessarily shared by the public. Should a single critical customer be able to dictate when others get a fix? Should the vendor be trusted to make the right call on how much money to throw at QA, and how many risks to take, in the absence of any external incentive to try harder? Most vendors really are good people - but these are still interesting questions, and I think there is far more to the debate than most users realize (and certainly more than PR depts make out of it).

  3. The bug I have in mind that I think falls into an interesting grey zone is MS07-017. I think this one is especially interesting because Microsoft developed the fix fairly quickly, but broke things for people running certain Realtek gear, without having any idea it was going to happen. Neither Microsoft nor the customers could have had any idea this would occur, so meaningful risk assessment was quite hard.

    I agree that every bug is different, but I still stand by my previous statement. There needs to be some reasonable estimate of the likelihood that the patch will cause harm itself vs. the likelihood of an attack. Absent meaningful data, we're still just guessing for any given bug.

    I'm still looking for good data (not just anecdotes) on bug rediscovery rates.
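
    To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python; every probability and cost below is an invented placeholder, which is exactly the data we are missing:

      # Illustrative only: expected cost of waiting for a polished fix vs.
      # installing a rushed patch now. All numbers are made up for the example.

      def expected_cost(probability, cost):
          # Expected cost of a single independent risk.
          return probability * cost

      # Hypothetical inputs for one bug at one site:
      p_exploited_per_month = 0.02    # chance this host gets hit via the bug in a month
      cost_of_compromise = 500000     # cleanup, downtime, data loss (arbitrary units)

      p_patch_breaks = 0.05           # chance the rushed fix breaks something we rely on
      cost_of_breakage = 20000        # rollback, outage, re-testing

      months_until_polished_fix = 6

      # Chance of at least one compromise while waiting for the polished fix:
      p_hit_while_waiting = 1 - (1 - p_exploited_per_month) ** months_until_polished_fix

      print("expected cost of waiting:     ", expected_cost(p_hit_while_waiting, cost_of_compromise))
      print("expected cost of patching now:", expected_cost(p_patch_breaks, cost_of_breakage))

    Plug in your own guesses and the comparison flips easily in either direction - which is really the point: without better numbers for #1 and #2, it remains a judgment call.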

  4. I don't think you can expect good data here, because it does not seem to be possible to collect it.

    You could survey researchers, but their own statements are probably not reliable, and will at least partly reflect their personal beliefs.

    You could get the data from vendors, examining the number of duped security bugs; but that would almost certainly underestimate the problem, especially when it comes to fully disclosed bugs.

    Lastly, you could do some vulnerability research and see how often this happens to you; but then you become a part of the anecdote ;-)

  5. I think you're loading your argument almost as much as the vendors do. :-)

    I admit I can't quantify it, but I suspect that the risk of a quickly-released security update breaking something isn't nearly as remote as you suggest. Similarly, I think you're significantly overstating the risk of any particular independently discovered vulnerability being used in targeted attacks.

    Instead of "there is no widespread exploitation, but targeted attacks are an appreciable risk" I would say "there is no evidence that the vulnerability is being used maliciously, but we cannot rule out the possibility".

  6. Sadly, my understanding is that targeted attacks using non-public vulnerabilities are a reality every month.

    Naturally, getting objective data on this is rather unlikely; but there seems to be no compelling argument to dismiss this as improbable.

    In the end, as noted earlier, it comes down to giving users choice, and the ability to test patches and make the call on their own; and giving the security community a chance to respond.

  7. Michal: considering the large number of vulnerabilities any given piece of software contains, it seems to me that the odds of any two individuals discovering the *same* one within any reasonably short time-frame (say, four months) are probably quite low. It has been known to happen, of course, but I don't believe it's common.

    Whether non-public vulnerabilities are being used in targeted attacks is not at issue. The question is whether the vulnerabilities being used are ones that have already been confidentially reported to the vendor.

    As for objective data: won't statistical inference do the job? We know how many vulnerabilities have been discovered so far in any given piece of software, and, in most cases, approximately when they were discovered. We can also probably make a reasonable estimate of the number of active researchers over any given period. Using Bayesian statistics, it should be possible to estimate from this data the odds of multiple independent discoveries of a single vulnerability within any specified time-frame.
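
    To illustrate, here is one way such an estimate could be set up - a deliberately crude collision model rather than a proper Bayesian treatment, and every parameter below is a made-up placeholder rather than measured data:

      # Crude rediscovery model: assume N researchers independently find bugs at
      # some yearly rate, and each find lands uniformly at random on the pool of
      # still-latent vulnerabilities (a Poisson approximation).
      import math

      def p_rediscovery(latent_bugs, researchers, finds_per_researcher_per_year, years):
          # Chance that a specific just-found bug is independently found again
          # by someone else within the given window.
          other_finds = (researchers - 1) * finds_per_researcher_per_year * years
          return 1 - math.exp(-other_finds / latent_bugs)

      # Hypothetical numbers for one large product and a four-month window:
      print(p_rediscovery(latent_bugs=2000, researchers=200,
                          finds_per_researcher_per_year=5, years=4 / 12))

    Even this toy version shows that the answer hinges almost entirely on the size of the latent bug pool and on how evenly attention is spread across it - and neither is easy to measure.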

  8. Incidentally, let's not forget another good reason for a (relatively small) additional delay in releasing a patch: regular update schedules. People make fun of Patch Tuesday, but it's a heck of an improvement over the way things used to be done.

  9. My experience with vulnerability rediscovery is exactly the opposite (and I covered this in more detail in the earlier, linked post). Several other researchers (Alex Sotirov, etc.) also agree.

    Also, I honestly don't see any benefit of regular patch schedules, other than enabling complacency - by minimizing the apparent disruption of security updates while prolonging exposure.

  10. FYI, I revised the post a bit based on the conversations with several people. It probably makes the point more clearly now, although I suspect we just have to agree to disagree on this one :-)

  11. I hadn't read the earlier post - missed the link, sorry - and it is interesting. I'm not entirely convinced, because it still seems to me that we'd see far more 0-days if the black hats really did already know all the vulnerabilities the white hats are just now discovering.

    As to regular patch schedules: speaking from a practical point of view, having to reboot really does disrupt the typical IT worker's productivity, and I know from personal experience that the process of updating servers (including the hassle of scheduling outages) takes up time and energy I could be using more productively. These are real costs, which regular patch schedules really do help to mitigate.

    Also, at least in our environment, a predictable schedule for updates makes it possible to get them installed (at least on the servers) more quickly after release than would otherwise be possible. Since the risk ramps up enormously after an update is released I think this is a significant factor. (Regardless of whether or not it is likely that some of the bad guys already knew about the vulnerability, what is certain is that *after* release *all* of them do, and they no longer have any motivation to avoid wide scale attacks, either!)

    It also seems to me - and I have to admit that this one is a strictly subjective impression - that now that staff and grads here have gotten used to rebooting their machines once a month the typical delay before they do so is much shorter than it used to be. If they think they'll just be asked to do it again tomorrow or the next day, they're far less likely to cooperate. The same probably holds true for home users.

  12. Fair enough - and for what it's worth the revised version does seem more balanced to me!

    High-profile targets are certainly at more risk from undisclosed vulnerabilities than most of the rest of us, and (in most cases) are better able to take action to mitigate the effects if a patch isn't available. I'd guess that most high-profile targets are probably also better able to bear the costs of dealing with frequent updates, and to cope with the risks of unexpected side-effects.

    The problem from my point of view is that most of us aren't high-profile targets - and I'm not sure it's socially responsible to promote policies which benefit the few at the expense of the many. :-)

  13. As to patch cycles: looks like the root of the problem is the need to reboot, then. There is no really good reason to require reboots every time you update Java, Flash, or the browser. We should try to change this instead...

  14. It would certainly be nice to eliminate the need to reboot, but this isn't easy. Rather than go into it in depth here, I've discussed it on my own blog here.