
June 18, 2010

HTTPS is not a very good privacy tool

Today, EFF announced HTTPS Everywhere - a browser plugin that automatically "upgrades" all requests to a set of predefined websites, such as Wikipedia, to HTTPS. This is done in a manner similar to Strict Transport Security.

Widespread adoption of encryption should be praised - but the privacy benefits of tools like this are often misunderstood. The protocol is engineered to maintain the confidentiality and integrity of a priori private data exchanged over the wire - and does very little to keep your actions private when accessing public content.

Even with HTTPS, even a passive, unsophisticated attacker can usually tell exactly which Wikipedia page you happen to be interested in: by looking at packet sizes, directions, and timing patterns of the encrypted HTTP exchange, he can identify the resource with a high degree of confidence. With that particular site, you do not even need to crawl the content on your own: database dumps are provided by the foundation, and take only a couple of hours to download over DSL.
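To illustrate the idea, here is a minimal sketch of such a size-based classifier (my own illustration in Python; the page names, sizes, and tolerance value are made up). The attacker crawls the site once to record per-page transfer sizes, then matches the size of an observed encrypted flow against that table:

    # Hypothetical sketch: identifying pages behind TLS by transfer size.
    # Assumes the attacker has crawled the site and recorded, for each
    # page, the approximate number of bytes the server sends for it.

    def build_fingerprints(crawl_results):
        """Map recorded transfer size -> candidate page names."""
        fingerprints = {}
        for page, size in crawl_results.items():
            fingerprints.setdefault(size, []).append(page)
        return fingerprints

    def guess_page(fingerprints, observed_size, tolerance=64):
        """Return pages whose recorded size is within `tolerance` bytes
        of the size seen on the wire; TLS adds only small, fairly
        predictable framing overhead."""
        candidates = []
        for size, pages in fingerprints.items():
            if abs(size - observed_size) <= tolerance:
                candidates.extend(pages)
        return candidates

    # Toy run: three pages with distinct sizes are trivially told apart.
    crawl = {"/wiki/Base_rate_fallacy": 48213,
             "/wiki/Onion_routing": 61877,
             "/wiki/HTTPS": 53950}
    prints = build_fingerprints(crawl)
    print(guess_page(prints, 61902))   # -> ['/wiki/Onion_routing']

A real attacker would also use request counts, packet directions, and inter-packet timing, which only narrows the candidate set further.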

Adding some random padding and jitter to the communications will help, but this can only be taken so far without introducing a very significant performance penalty. Because of this, large-scale behavioral analysis is still likely to be very effective even with such countermeasures in place.
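For a feel of the tradeoff, consider one naive countermeasure (again, just my sketch): padding every response up to the next power-of-two bucket collapses many distinct pages into the same observable size, but at a non-trivial bandwidth cost:

    # Hypothetical sketch: pad responses to power-of-two buckets so that
    # pages of similar size become indistinguishable on the wire.

    def padded_size(n):
        """Smallest power of two >= n - the byte count actually sent."""
        size = 1
        while size < n:
            size *= 2
        return size

    for n in [48213, 53950, 61877]:
        p = padded_size(n)
        print("%d -> %d (overhead: %.0f%%)" % (n, p, 100.0 * (p - n) / n))

    # All three pages now emit 65536 bytes and can no longer be told
    # apart by size alone - but at a cost of up to ~36% extra traffic
    # in this toy example, and timing patterns still leak.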

Naturally, there are situations where HTTPS actually helps with privacy - but fewer than one might expect. In some fascinating cases, even the contents of encrypted text typed in by the user can be reconstructed, as explored in this research paper from Microsoft.

4 comments:

  1. Hooray for traffic analysis!

    I unwittingly reinvented Chaumian mix-nets in a paper[1] I presented at PET 2002. Amusingly, David Chaum was the keynote speaker and I learned a lot about mix-nets. A week later a friend and I reinvented onion routing.

    [1] http://guh.nu/projects/ta/safeweb/

  2. We live in a non-ideal world where incremental improvements are the only reliable way we know of to improve our situation. HTTPS Everywhere is not meant to be a panacea. It is meant to provide opportunistic improvements where possible.

    Furthermore, transforming a database dump of Wikipedia into reliable fingerprints is not quite as straightforward as you indicate. There's a lot of variability that comes from how the actual server supports caching, pipelining, and other HTTP properties that alter communication patterns. The cross product of how particular browsers and browser versions behave also needs to be taken into account.

    Just because academia publishes a paper saying a technique is possible on a reduced sample size does not mean it trivially scales to hundreds of thousands of pages in a general and practically deployable fashion.

    Furthermore, fully obscuring the endpoint is not the job of the application layer. This is why people have built alternate network/transport layers like Tor.

  3. Mike: my point isn't that HTTPS needs to be perfect before it can be used for this purpose; it's simply that HTTPS is not meant to offer *any* significant privacy when accessing certain types of public, indexable content - Wikipedia being a prime example.

    This is exactly the point you seem to be making in the end: to ensure privacy of access to public data, a wholly different tool, such as Tor, would be far more appropriate.

    As to the technical bits - yes, on subsequent visits, there are some variations related to content caching; but I seriously doubt this poses a significant problem. It's your gut feeling versus my gut feeling, though, so no point in arguing :-)

  4. Yeah, I guess we're splitting hairs here, and are really both basically saying the same thing.

    However, I do feel the need to provide an explanation as to why my gut says that fingerprinting is not feasible on a wide scale. It basically comes down to two reasons:

    1. https://secure.wikimedia.org/wikipedia/en/wiki/Base_rate_fallacy

    This is a very common problem with classification and recognition work. Much of the literature focuses only on recognizing very small target sets within very low event rates, and claims misleadingly high accuracy as a result.

    If you consider the considerably higher event rate of a large encrypted corporate network, or the HTTPS activity of all browser+platform combinations to all web pages, it's easy to see that even very high levels of fingerprinting accuracy can still lead to significant amounts of incorrect data being generated in cases of dragnet surveillance.
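    To make this concrete, a back-of-the-envelope calculation (my numbers, purely illustrative):

        # Illustrative base-rate arithmetic, with made-up numbers: a
        # fingerprint that detects a target page with a 99% true-positive
        # rate and only a 1% false-positive rate, applied to traffic in
        # which just 1 in 10,000 fetched pages is actually the target.

        total_fetches = 1000000
        base_rate     = 1.0 / 10000
        tpr, fpr      = 0.99, 0.01

        targets   = total_fetches * base_rate            # 100 real hits
        true_pos  = targets * tpr                        # ~99 caught
        false_pos = (total_fetches - targets) * fpr      # ~10,000 false alarms

        precision = true_pos / (true_pos + false_pos)
        print("precision: %.1f%%" % (100 * precision))   # roughly 1%

    In other words, roughly 99 out of every 100 matches would be wrong, despite the "99% accurate" classifier.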

    This leads me to my second point:

    2. Computer Science academia is pretty dysfunctional

    The financial and scholarly pressure in CS academia has created tremendous bias against proper application of the scientific method and reproducibility of results. Key implementation details are often omitted, and source code to properly verify and reproduce experiments is almost never provided. Hence, we often can't easily determine if basic oversights like the base rate fallacy were properly accounted for, nor can we easily test for simple things like changes in event rate or sample size.

    In other sciences, it is traditional to provide extremely detailed supplemental materials recording every step of an experiment down to the smallest detail for purposes of accurate reproducibility. Not so with computer science...


    Now, normally I wouldn't even think this would be worth discussing: it would be great if this work were simply used to encourage better design decisions for encrypted protocols (like SPDY, whose request packaging could possibly mitigate a lot of the chatter that provides so much of the ambiguity-set reduction in that and other papers).

    However, more often than improvements, I see attacks like these held up as excuses for laziness, with reasoning like "We shouldn't even bother to encrypt anything; the adversary can figure out what's going on anyway." This reasoning is not only common and dangerous - I believe it's also flawed, for the reasons I detailed above.

    But, perhaps I'm just a cynic :)
