
October 28, 2010

HTTP cookies, or how not to design protocols

For as long as I can remember, HTTP cookies have been vilified as a grave threat to the privacy of online browsing; wrongly so. That said, the mechanism itself is a very interesting cautionary tale for security engineers - and that will be the theme of today's feature.

Cookies were devised by Lou Montulli, a Netscape engineer, sometime in 1994. Lou outlined his original design in a minimalistic, four-page proposal posted on netscape.com; based on that specification, the implementation shipped in their browser several months later - and other vendors were quick to follow.

It wasn't until 1997 that the first reasonably detailed specification of the mechanism was attempted: RFC 2109. The document captured some of the status quo - but confusingly, also tried to tweak the design, an effort that proved to be completely unsuccessful; for example, contrary to what is implied by this RFC, most browsers do not support multiple comma-delimited NAME=VALUE pairs in a single Set-Cookie header; do not recognize quoted-string cookie values; and do not use max-age to determine cookie lifetime.

Three years later, another, somewhat better structured effort to redesign cookies - RFC 2965 - proved to be equally futile. Meanwhile, browser vendors tweaked or extended the scheme in their own ways: for example, around 2002, Microsoft unilaterally proposed httponly cookies as a security mechanism to slightly mitigate the impact of cross-site scripting flaws - a concept quickly, if prematurely, embraced by the security community.

All these moves led to a very interesting situation: there is simply no accurate, official account of cookie behavior in modern browsers; the two relevant RFCs, often cited by people arguing on the Internet, are completely out of touch with reality. This forces developers to discover compatible behaviors by trial and error - and makes it an exciting gamble to build security systems around cookies in the first place.

In any case - well-documented or not, cookies emerged as the canonical solution to the increasingly pressing problem of session management; and as web applications grew more complex and more sensitive, the humble cookie took the world by storm. With it came a flurry of fascinating security flaws.

They have Internet over there, too?

Perhaps the most striking issue - and an early sign of trouble - is the problem of domain scoping.

Unlike the more pragmatic approach employed for JavaScript DOM access, cookies can be set for any domain of which the setter is a member - say, foo.example.com is meant to be able to set a cookie for *.example.com. On the other hand, allowing example1.com to set cookies for example2.com is clearly undesirable, as it allows a variety of sneaky attacks: denial of service at best, and altering site preferences, modifying carts, or stealing personal data at worst.
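The intended scoping rule boils down to a suffix check - something along these lines (a hypothetical sketch for illustration, not any browser's actual code):

```python
def may_set_cookie_for(setter_host, cookie_domain):
    """Naive check: the setter must itself live within the domain
    it is trying to scope a cookie to."""
    # Normalize the leading dot conventionally used in domain= attributes.
    cookie_domain = cookie_domain.lstrip(".")
    return (setter_host == cookie_domain
            or setter_host.endswith("." + cookie_domain))

# foo.example.com may scope a cookie to *.example.com...
assert may_set_cookie_for("foo.example.com", ".example.com")
# ...but example1.com must not be able to reach example2.com.
assert not may_set_cookie_for("example1.com", "example2.com")
```

The catch, as the rest of this section shows, is deciding where the suffix check must stop - which is exactly what the original specification got wrong.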

To that effect, the original specification provided this elegant but blissfully naive advice:

"Only hosts within the specified domain can set a cookie for a domain and domains must have at least two (2) or three (3) periods in them to prevent domains of the form: ".com", ".edu", and "va.us". Any domain that fails within one of the seven special top level domains listed below only require two periods. Any other domain requires at least three. The seven special top level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT"."

Regrettably, there are at least three glaring problems with this scheme - two of which should have been obvious right away:

  1. Some country-level registrars indeed mirror the top-level hierarchy (e.g. example.co.uk), in which case the three-period rule makes sense; but many others allow direct registrations (e.g., example.fr), or permit both approaches to coexist (say, example.jp and example.co.jp). In the end, the three-period rule managed to break cookies in a significant number of ccTLDs - and consequently, most implementations (Netscape included) largely disregarded the advice. Yup, that's right - as a result, you could set cookies for *.com.pl.

  2. The RFC missed the fact that websites are reachable by means other than their canonical DNS names; in particular, the rule permitted a website at http://1.2.3.4/ to set cookies for *.3.4, or a website at http://example.com.pl./ to set a cookie for *.com.pl.

  3. To add insult to injury, the Internet Assigned Numbers Authority eventually decided to roll out a wide range of new top-level domains, such as .biz, .info, or .jobs - and is now attempting to allow arbitrary gTLD registrations. This last step promises to be yet another nail in the coffin of sane cookie management implementations.
Net effect? All mainstream browsers had a history of embarrassing bugs in this area - and now ship with a giant, hairy, and frequently updated list of real-world "public suffix" domains for which cookies should not be set - as well as an array of checks to exclude non-FQDN names, IP addresses, and pathological DNS notations of all sorts.
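To see why the period-counting heuristic was doomed, consider a sketch of it next to a public-suffix check (the suffix list excerpt below is hypothetical and heavily abbreviated; real lists run to thousands of entries):

```python
def period_rule_allows(domain):
    """Netscape-style heuristic: two periods suffice for the seven
    special TLDs; three are required for everything else."""
    special = {"com", "edu", "net", "org", "gov", "mil", "int"}
    tld = domain.rstrip(".").rsplit(".", 1)[-1]
    required = 2 if tld in special else 3
    return domain.count(".") >= required

# A tiny, illustrative excerpt of a modern public suffix list.
PUBLIC_SUFFIXES = {"com", "pl", "com.pl", "fr", "uk", "co.uk"}

def suffix_list_allows(domain):
    return domain.lstrip(".") not in PUBLIC_SUFFIXES

# The heuristic correctly blocks overly broad suffixes...
assert not period_rule_allows(".com")         # 1 period < 2
assert not period_rule_allows(".co.uk")       # 2 periods < 3
# ...but it also rejects a perfectly legitimate registration,
# which is why implementations ended up ignoring the rule:
assert not period_rule_allows(".example.fr")  # 2 periods < 3
# A public-suffix lookup gets both cases right:
assert suffix_list_allows(".example.fr")
assert not suffix_list_allows(".com.pl")
```

Once implementations discarded the heuristic, nothing stood between a page on example.com.pl and a cookie scoped to *.com.pl - hence the modern reliance on the curated list.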

8K ought to be enough for anybody

It is well understood that, to make denial-of-service attacks a bit harder, most web servers limit the size of requests they are willing to process; these limits are fairly modest - for example, Apache rejects request headers over 8 kB, while IIS draws the line at 16 kB. This is perfectly fine under normal operating conditions - but the limits can be easily exceeded when the browser attempts to construct a request with a large number of previously set cookies attached.

The specification neglected this possibility, offered no warning to implementers, and proposed no discovery and resolution algorithm. In fact, it mandated minimum jar sizes well in excess of the limits enforced by HTTP servers:

"In general, user agents' cookie support should have no fixed limits. They should strive to store as many frequently-used cookies as possible. Furthermore, general-use user agents should provide each of the following minimum capabilities [...]:

* at least 300 cookies
* at least 4096 bytes per cookie (as measured by the size of the characters that comprise the cookie non-terminal in the syntax description of the Set-Cookie header)
* at least 20 cookies per unique host or domain name"

As should be apparent, the suggested minimum - 20 cookies of 4096 bytes each - allows HTTP request headers to balloon up to the 80 kB boundary.
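The arithmetic is easy to verify: filling the per-domain minimum produces a Cookie header roughly ten times over Apache's default ceiling (a rough sketch; the server limits cited are the defaults discussed above):

```python
# Build a Cookie header from 20 cookies at the 4096-byte minimum
# (name + "=" + value together measured at 4096 bytes each).
cookies = ["C%02d=%s" % (i, "x" * (4096 - len("C00="))) for i in range(20)]
header = "Cookie: " + "; ".join(cookies)

assert len(header) > 80 * 1000   # about 80 kB in total...
assert len(header) > 8 * 1024    # ...far beyond Apache's 8 kB limit
assert len(header) > 16 * 1024   # ...and beyond IIS's 16 kB, too
```

A server on the receiving end will simply reject the request - and the browser, having no mandated recovery procedure, will keep re-sending the same doomed header.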

Does this matter from the security perspective? At first sight, no - but this is only until you realize that there are quite a few popular sites that rely on user-name.example.com content compartmentalization; and that any malicious user can set top-level cookies to prevent the visitor from ever being able to access any *.example.com site again.

The only recourse domain owners have in this case is to request their site to be added to the aforementioned public suffix list; there are quite a few entries along these lines there already, including operaunite.com or appspot.com - but this approach obviously does not scale particularly well. The list is also not supported by all existing browsers, and not mandated in any way for new implementations.

"Oh, please. Nobody is actually going to depend on them."

In the RFC 2109 paragraph cited earlier, the specification pragmatically acknowledged that implementations will be forced to limit cookie jar sizes - and then confusingly demanded that no fixed limits be put in place, yet specified minimums that implementers should obey.

What proved to be missing is any advice on a robust jar pruning algorithm, or even a brief discussion of the security considerations associated with this process; any implementation that enforces the recommended minimums - 300 cookies globally, 20 cookies per unique host name - is clearly vulnerable to a trivial denial-of-service attack: the attacker may use wildcard DNS entries (a.example.com, b.example.com, ...), or even just a couple of throw-away domains, to exhaust the global limit, and have all sensitive cookies purged - kicking the user out of any web applications he is currently logged into. Whoops.
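The attack is easy to simulate with a toy jar that enforces the recommended global minimum and prunes oldest-first (a deliberately simplified sketch, not any real browser's logic):

```python
from collections import OrderedDict

class FifoCookieJar:
    """Toy jar: 300 cookies globally, oldest evicted first."""
    GLOBAL_LIMIT = 300

    def __init__(self):
        self.jar = OrderedDict()  # (host, name) -> value

    def set(self, host, name, value):
        self.jar[(host, name)] = value
        while len(self.jar) > self.GLOBAL_LIMIT:
            self.jar.popitem(last=False)  # FIFO pruning

jar = FifoCookieJar()
jar.set("bank.example.com", "SESSION", "secret")

# The attacker uses wildcard DNS to mint 300 throw-away cookies...
for i in range(300):
    jar.set("h%d.evil.example" % i, "junk", "x")

# ...and the victim's session cookie is silently purged.
assert ("bank.example.com", "SESSION") not in jar.jar
```

Any use-frequency-aware eviction strategy would survive this; plain FIFO, as recommended-by-omission, does not.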

It is worth noting that given proper warning, browser vendors would not find it significantly more complicated to structure the limits differently, enforce them on functional domain level, or implement pruning strategies other than FIFO (e.g., taking cookie use counters into account). Convincing them to make these changes now is more difficult.

While the ability to trash your cookie jar is perhaps not a big deal - or rather, the ability for sites to behave disruptively is also poorly mitigated at the HTML or JavaScript level, making this a boring topic - the weakness has special consequences in certain contexts; see the next section for more.

Be my very special cookie

Two special types of HTTP cookies are supported by all contemporary web browsers: secure, sent only on HTTPS navigation (protecting the cookie from being leaked to, or interfered with by, rogue proxies); and httponly, exposed only to HTTP servers, but not visible to JavaScript (protecting the cookie against cross-site scripting flaws).
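On the wire, both are just extra attributes on the Set-Cookie header; in Python, for instance, the standard http.cookies module can emit them (shown here purely to illustrate the header format):

```python
from http.cookies import SimpleCookie

c = SimpleCookie()
c["SID"] = "opaque-session-token"
c["SID"]["secure"] = True    # only sent back over HTTPS
c["SID"]["httponly"] = True  # hidden from document.cookie

header = c.output()
assert header.startswith("Set-Cookie: SID=")
assert "Secure" in header and "HttpOnly" in header
```

Note that the attributes are mere requests directed at the browser; nothing about them is enforced, or even fully defined - as the questions below demonstrate.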

Although these ideas appear to be straightforward, the way they were specified implicitly allowed a number of unintended possibilities - all of which, predictably, plagued web browsers through the years. Consider the following questions:

  • Should JavaScript be able to set httponly cookies via document.cookie?

  • Should non-encrypted pages be able to set secure cookies?

  • Should browsers hide jar-stored httponly cookies from APIs offered to plugins such as Flash or Java?

  • Should browsers hide httponly Set-Cookie headers in server responses shared with XMLHttpRequest, Flash, or Java?

  • Should it be possible to drop httponly or secure cookies by overflowing the "plain" cookie jar in the same domain, then replace them with vanilla lookalikes?

  • Should it be possible to drop httponly or secure cookies by setting tons of httponly or secure cookies in other domains?
All of this is formally permitted - and some of the aforementioned problems are prevalent to this day, and likely will not be fixed any time soon.

At first sight, the list may appear inconsequential - but these weaknesses have profound consequences for web application design in certain environments. One striking example is rolling out HTTPS-only services that are intended to withstand rogue, active attackers on open wireless networks: if secure cookies can be injected on easy-to-intercept HTTP pages, it suddenly gets a whole lot harder.

If it tastes good, who cares where it comes from?

Cookies diverge from the JavaScript same-origin model in two fairly important and inexplicable ways:
  • domain= scoping is significantly more relaxed than SOP, paying no attention to protocol, port number, or exact host name. This undermines the SOP-derived security model in many compartmentalized applications that also use cookie authentication. The approach also makes it unclear how to handle document.cookie access from non-HTTP URLs - historically leading to quite a few fascinating browser bugs (set location.host while on a data: page and profit!).

  • path= scoping is considerably stricter than what's offered by SOP - and therefore, it is completely useless from the security standpoint. Web developers misled by this mechanism often mistakenly rely on it for security compartmentalization; heck, even reputable security consultants get it completely wrong.
On top of this somewhat odd scoping scheme, conflict resolution is essentially ignored in the specification; every cookie is identified by a name-domain-path tuple, allowing identically named but differently scoped cookies to coexist and apply to the same request - but the standard fails to provide servers with any metadata to assist in resolving such conflicts, and does not even mandate any particular ordering of such cookies.

This omission adds another interesting twist to the httponly and secure cookie cases; consider these two cookies:

Set on https://www.example.com/:
  FOO=legitimate_value; secure; domain=www.example.com; path=/

Set on http://www.example.com/:
  FOO=injected_over_http; domain=.example.com; path=/

The two cookies are considered distinct, so any browser-level mechanism that limits the attacker's ability to clobber secure cookies will not kick in. Instead, the server will at best receive both FOO values in a single Cookie header, their ordering dependent on the browser and essentially unpredictable (and at worst, the cookies will get clobbered - a problem in Internet Explorer). What next?
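The server-side ambiguity is easy to demonstrate; a typical parser - Python's http.cookies, for instance - silently keeps just one of the identically named values, with no metadata to tell which cookie it actually came from:

```python
from http.cookies import SimpleCookie

# Both FOO variants arrive in one Cookie header; their ordering is
# browser-dependent and unspecified.
received = "FOO=legitimate_value; FOO=injected_over_http"

c = SimpleCookie()
c.load(received)

# Only one value survives - and the domain, path, and secure flag
# that would disambiguate the two are simply not transmitted.
assert len(c) == 1
assert c["FOO"].value == "injected_over_http"  # last one wins here
```

With a different browser, or a different parser, the "winner" may just as easily be the other value - which is precisely the problem.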

Character set murder mystery

The HTTP/1.0 RFC technically allowed high-bit characters in HTTP headers without further qualification; the HTTP/1.1 RFC later disallowed them. Neither document provided any guidance on how such characters should be handled when encountered: rejected, transcoded to 7-bit, treated as ISO-8859-1, as UTF-8, or perhaps in some other way.

The specification for cookies further aggravated this problem, cryptically stating:

"If there is a need to place such data in the name or value, some encoding method such as URL style %XX encoding is recommended, though no encoding is defined or required."

There is an obvious problem with saying that you can use certain characters, but that their meaning is undefined; the systemic neglect of this topic has profound consequences in two common cases where user-controlled values frequently appear in HTTP headers: Content-Disposition is one (eventually "solved" with browser-specific escaping schemes); another is, of course, the Cookie header.

As can be expected given such poor advice, implementers ended up with the least sensible approach; for example, I have a two-year-old bug open with Mozilla (418394): Firefox has a tendency to mangle high-bit values in HTTP cookies, permitting cookie separators (";") to suddenly materialize in place of UTF-8 in the middle of an otherwise sanitized cookie value - a quirk that has led to more than one web application vulnerability to date.
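Given this mess, the only safe bet for applications is the %XX encoding the spec half-heartedly suggests: encode before setting, decode after reading, and never let a raw high-bit byte near the header. A sketch of the approach:

```python
from urllib.parse import quote, unquote

value = "zażółć gęślą jaźń"  # arbitrary UTF-8 user input

# Percent-encode before storing: the resulting cookie value contains
# only 7-bit characters that no browser will transcode or mangle.
encoded = quote(value, safe="")
assert ";" not in encoded
assert all(ord(ch) < 128 for ch in encoded)

# Decode on the way back in.
assert unquote(encoded) == value
```

The cost is a modest size inflation - a price worth paying, since a stray ";" conjured out of a mangled multibyte sequence splits the value and injects attacker-chosen cookie pairs.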

A session is forever

The last problem I want to mention in this post is far less pressing - but is also an interesting testament to the shortcomings of the original design.

For some reason, presumably due to privacy concerns, the specification decided to distinguish between session cookies, meant to be non-persistent; and cookies with a specified expiration date, which may persist across browser sessions, be stored on disk, and be subject to additional client-enforced restrictions. On the topic of the longevity of the former class, the RFC conveniently says:

"Each session is relatively short-lived."

Today, however, this is obviously not true, and the distinction feels misguided: with the emergence of portable computers with suspend functionality, and the increased shift toward web-oriented computing, users tend to keep browsers open for weeks or months at a time; session cookies may also be stored and then recovered across auto-updates or software crashes, allowing them to live almost indefinitely.

When session cookies routinely persist longer than many definite-expiry ones, and yet are used as a more secure and less privacy-invasive alternative, we obviously have a problem. We probably need to rethink the concept - and either ditch them altogether, or impose reasonable no-use time limits at which such cookies are evicted from the cookie jar.

Closing words

I find it quite captivating to see the number of subtle problems caused by such a simple and seemingly harmless scheme. It is also depressing how poorly documented and fragile the design remains some 15 years later; and that the introduction of well-intentioned security mechanisms, such as httponly, only contributed to the misery. An IETF effort to document and clarify some of the security-critical aspects of the mechanism is underway only now - but it won't be able to fix them all.

Some of the telltale design patterns - rapid deployment of poorly specified features, or leaving essential security considerations as "out of scope" - are still prevalent today in the browser world, and can be seen in the ongoing HTML5 work. Hopefully, that's where the similarities will end.

October 14, 2010

Attack of the monster frames (a mini-retrospective)

Some 15 years ago, the introduction of HTML frames caused a significant uproar in the (still young) web development community. The outraged purists asserted that frames were bound to ruin everything: incompatible with many of the browsers and search engines of the old; bringing a significant potential to break navigation or printing; unfamiliar and confusing to users; and simply against the original vision supposedly laid out by the founding fathers of the web.

Today, these criticisms seem rather arbitrary: although framed navigation had its share of amusing missteps (not any worse than most other HTML features, I'd argue), the frames have become an important and unobtrusive part of the modern web, and a valuable content compartmentalization tool. But shockingly, even if for all the wrong reasons, the original detractors had one thing right: in a sense, they turned out to be our doom.

How so? Recall that framed browsing dates back to the days of the web being a simple tool for distributing static content - and in that context, the technology warranted no special consideration from the security community; but as our browsers morphed into de facto operating systems for increasingly complex, dynamic applications - well, we quickly discovered that the ability to selectively embed fully functional, third-party content on unrelated and potentially malicious websites is pretty bad news.

One of the earliest problems - with early reports dating back to at least 2004, and variants still being discovered several years later - is the realization that frames are implemented using essentially the same model as standalone windows; this model allows any website in possession of a window's name (or its DOM handle) to navigate it at will. This property is mostly harmless when dealing with proper windows equipped with an address bar - but is a disaster for seamlessly framed regions on trusted websites: if malicious-site.com can open trusted-application.com in a new window, and then navigate that application's frames to any other location - it can, essentially, silently hijack the UI.

Following this discovery, Adam Barth and others spent a fair amount of time proposing a better approach, and convincing several browser vendors to implement it; but even today, certain unavoidable weaknesses in this model prevail.

The next notable milestone: clickjacking - a seemingly obvious threat essentially ignored by the security community (perhaps in hope it disappears), until extravagantly publicized by Jeremiah Grossman and Robert 'RSnake' Hansen in 2008. The idea behind the attack is simple: if a frame containing trusted-application.com is placed on malicious-site.com, and then partly obscured or made transparent - the user can be easily tricked into thinking he is interacting with the UI of malicious-site.com - but end up sending the UI event to trusted-application.com, instead.

As the name implies, their analysis focused on mouse clicks - which in a sense, did the attack some disservice: the reporting led the community to assume that only certain exceedingly simple UI actions (such as the "like" buttons on social networking sites) could be realistically targeted - and that the attacker would still be facing difficulties computing the right alignment of visual elements for all targeted systems, browsers, and screen resolutions. But that's simply not true.

To demonstrate other perils of cross-domain frames, I posted a proof-of-concept exploit for an attack I jokingly dubbed strokejacking - showing that with the use of onkeydown events, selective keystroke redirection across domains can be used to perform very complex UI actions in the targeted application, far beyond what is possible with clickjacking alone. I also discussed reverse strokejacking - an even more depressing variant where evil embeddable gadgets on a targeted site are able to silently intercept user input by playing with the focus() method. These reports received very little attention - but given the ridiculous name, that's perhaps for the best.

Since then, the situation with framed content has gotten even worse: not long ago, we witnessed this presentation from Paul Stone. Paul discussed drag-and-drop attacks on third-party frames: text selected in one obscured frame pointing to trusted-application.com could be unintentionally dragged and dropped into the area controlled by malicious-site.com - thus revealing the content across domains. Many researchers and browser vendors summarily dismissed this threat, on the grounds that the necessary interactions must be complex and unusual - for example, triple-clicking or pressing Ctrl-A to select text - and are therefore difficult to solicit; but this is incorrect.

What have we missed, then? Paul casually mentioned one common UI interaction we all frequently engage in on even the least interesting sites: using the scroll bar. Note that the act of grabbing the slider, dragging it down, and releasing it... is eerily similar to the act of selecting text, or dragging and dropping a selection across the page. The attack can be modified thus:

  1. Create a page with an article that spans more than a single screen - or has a TEXTAREA with an EULA that needs to be scrolled to the end before the "I agree" button is enabled, instead.

  2. Have a transparent IFRAME pointing to trusted-application.com that follows the mouse pointer.

  3. As soon as the user clicks the slider and holds the mouse button, reposition the frame up in relation to the cursor. This ensures that the entire framed text is selected, regardless of mouse movement (yes, this works!).

  4. Wait for mouse button to be released.

  5. Reposition the frame so that the next click will begin to drag the selection.

  6. While the user is interacting with the slider, move the frame away, and place a receiving TEXTAREA or contentEditable / designMode container under the mouse pointer.

  7. Steal documents across domains!
There are some technical challenges that make this a bit more complicated than advertised - but these can be worked around in a majority of the browsers on the market.

In the end, cross-domain frames proved to be a giant and completely unexpected attack surface; and very depressingly, we still have no idea how to properly address the problem once and for all. There simply are no simple and elegant solutions compatible with the modern web; and rest assured, browser vendors are extremely hesitant to experiment with complex heuristics instead. The only thing we decided to do to tackle the general threat is plastering the holes over with X-Frame-Options - a naive opt-in mechanism that allows websites to refuse being framed across domains. Alas, this mechanism will never be used by all the sites that actually need it - and it offers no protection in more complex cases, such as the increasingly prevalent embeddable gadgets.
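Deploying the opt-in is at least mechanically trivial; in a Python WSGI application, for instance, it boils down to appending one response header (a minimal sketch - SAMEORIGIN may be used in place of DENY where same-site framing is still needed):

```python
def deny_framing(app):
    """WSGI middleware adding X-Frame-Options to every response."""
    def wrapped(environ, start_response):
        def sr(status, headers, exc_info=None):
            headers.append(("X-Frame-Options", "DENY"))
            return start_response(status, headers, exc_info)
        return app(environ, sr)
    return wrapped
```

The hard part, of course, is not emitting the header - it is getting every site that needs it to do so, which is precisely where opt-in mechanisms tend to fall short.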

The history of information security is littered with disturbingly similar cases of browser features colliding with each other, or being incompatible with the natural evolution of the web. If you need another example, just look at the profound problems caused by differences between same-origin policies for JavaScript, cookies, plugins (Java in particular) - and for peripheral browser features, such as password managers.

Because of this, I often fear that we are bound to repeat the painful security lessons of framed browsing very soon; for example, I am simply intimidated by the rush to deploy some of the more complex and at times exotic features as a part of HTML5 - web sockets, workers, sandboxing, storage, application caches, notifications, CORS, UMP, and countless other new HTML, CSS, and JS extensions added there every other week.

Yes, it's called "job security". But at times, it tends to suck.

PS. Yes, yes, I know. Interesting bugs coming soon. I have a very cool and major fuzzer waiting to be released, but I am still waiting for all vendors to fix the outstanding issues. Neat Firefox SOP bug is coming soon, too.

October 05, 2010

Off-topic: concise electronics for geeks

I always find it disappointing that so many of my extremely bright peers have only a vague idea about the inner workings of electronic circuits - and therefore, of modern computers. I think that all too often, this is a significant if underappreciated handicap.

The web today is chock-full of hobbyist guides - but most of them resort to gross oversimplification (such as the initially useful, but ultimately misleading hydraulic analogies), or are simply very inaccurate and incomplete. The remaining few websites I know of tend to fall back on dry, academic rigor - complete with differential equations and complex-number algebra for transient analysis. These concepts are highly unlikely to be accessible, or even that useful, in hobbyist work.

I'm hoping to bridge this gap with my "Concise electronics for geeks", a reasonably short but anatomically correct primer on the physical phenomena, and practical considerations, in analog and digital circuits.

Feedback welcome!