April 09, 2011

Using View > Encoding can kill you (in a manner of speaking)

Here's an interesting tidbit: you should never use the View > Encoding menu in any browser unless you fully trust the visited website.

Picking an alternative encoding through that menu overrides the character set not only for the top-level document, but also for all the nested frames - even if they happen to be cross-domain or hidden from view. And that may very well enable the owner of the visited page to carry out an XSS attack against a random third-party application without your knowledge.

Most security researchers associate encoding-related XSS problems with UTF-7, a somewhat preposterous and unnecessary encoding scheme that, by design, allows overlong encoding of 7-bit ASCII values (with disastrous consequences for HTML parsing). Not all browsers support UTF-7, and users are not likely to make that choice in the aforementioned menu. So, we're fine, right?

Well, not exactly. Many other, still popular multi-byte encodings, including Shift JIS or EUC-*, are also fairly problematic: their parsers often suffer from character consumption bugs, and in contrast to UTF-8, relatively little attention has been given to cleaning this up.

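For example, with forced Shift JIS, this input is likely to be exploitable. 0xE0 is a Shift JIS lead byte, so a parser with a character consumption bug may swallow the closing quote that follows it, leaving the src attribute open until the next quote character: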

<img src="[0xE0]">
  ...this is still a part of the markup...
  " onerror="alert('Hi mom!')" x="
Simple demo here.
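
To make the character consumption problem concrete, here is a toy model in Python. It is only a sketch, not how any real browser decodes Shift JIS: the function names (`decode`, `first_quoted_value`) are made up for illustration, there is no real mapping table, and the `strict` flag simply switches between a decoder that swallows an invalid trail byte and one that leaves it in place.

```python
# Toy model of a Shift JIS "character consumption" bug.
# Assumption: no real mapping table; double-byte characters become a
# placeholder, and all names here are hypothetical.

def is_sjis_lead(b: int) -> bool:
    """True if b is a Shift JIS lead byte (0x81-0x9F or 0xE0-0xFC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_sjis_trail(b: int) -> bool:
    """True if b is a valid Shift JIS trail byte (0x40-0x7E or 0x80-0xFC)."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def decode(data: bytes, strict: bool) -> str:
    """Decode bytes; if strict is False, swallow invalid trail bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_sjis_lead(b) and i + 1 < len(data):
            if is_sjis_trail(data[i + 1]):
                out.append("\u3013")  # placeholder for a valid pair
                i += 2
            elif strict:
                out.append("\ufffd")  # replace the lead byte only
                i += 1
            else:
                out.append("\ufffd")  # buggy: the next byte vanishes too
                i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            out.append("\ufffd")  # other single bytes are not modeled
            i += 1
    return "".join(out)


def first_quoted_value(markup: str) -> str:
    """Return the text between the first pair of double quotes."""
    start = markup.index('"') + 1
    end = markup.index('"', start)
    return markup[start:end]


payload = (
    b'<img src="\xe0">\n'
    b"  ...this is still a part of the markup...\n"
    b"  \" onerror=\"alert('Hi mom!')\" x=\""
)

safe = decode(payload, strict=True)
buggy = decode(payload, strict=False)
print(repr(first_quoted_value(safe)))
print(repr(first_quoted_value(buggy)))
```

With the strict variant, the src value is a single replacement character and the attacker-controlled text stays outside the tag. With the lenient variant, the closing quote disappears, the src value runs on to the attacker's own quote, and onerror= then parses as a genuine attribute.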


  1. You also need to trust all the user-generated content on the page.

    Do browsers ever auto-detect pages as being Shift JIS?

  2. Probably, but if you're not explicitly specifying charset=, you're already hosed.

    [ Interestingly, Opera permits cross-domain charset inheritance, something that others patched several years ago. The reason why they decided not to fix it probably has to do with not supporting UTF-7, but that's short-sighted :-) ]

  3. The demo doesn't seem to work for me in either Chromium 5.0.375.99 or Konqueror 4.4.4 (KHTML 4.4.5).

  4. Yes, Shift JIS is cleaned up in WebKit.

  5. It's possible that this is an old idea, but it occurs to me that adding support for an extra layer of encoding ("context encoding") could be a moderately effective mitigation against XSS attacks.

    The idea would be to encode the byte stream with markers that say which parts are markup and which parts are data (i.e., untrusted). Obviously this could only be done if the HTTP request indicated that the client supported context encoding, and the HTTP response headers would have to indicate whether the content was indeed context encoded. (This might depend on the particular page.)

    As an example, the byte stream might look something like this:

    [HTTP response headers, including a header indicating that the content is context encoded]
    0x01 (indicating a markup block)
    0x00001536 (block length)
    [5430 bytes of markup]
    0x02 (indicating a data block)
    0x00000032 (length)
    [50 bytes of data]
    0x01 (indicating a markup block)
    0x00002C53 (length)
    [11347 bytes of markup]
    0x00 (indicating the end of the response)

    The client would then have to keep track of whether each decoded character was data or markup, and reject any markup whose characters were supposed to be data.

    Obviously there are lots of other ways you could do the encoding, but this should give you the basic idea. Thoughts?
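
A minimal Python sketch of the block framing described above, under some loud assumptions: lengths are 4-byte big-endian integers, payloads are UTF-8, and the names `encode_blocks`, `decode_blocks`, and `render` are invented for this example. Instead of rejecting markup found in data blocks, this version simply escapes everything in them, which is one possible reading of the proposal rather than the only one.

```python
import html
import struct

# Block type tags taken from the example byte stream in the comment.
MARKUP, DATA, END = 0x01, 0x02, 0x00


def encode_blocks(blocks):
    """Serialize (kind, payload) pairs as tag + 4-byte length + payload."""
    out = bytearray()
    for kind, payload in blocks:
        if kind not in (MARKUP, DATA):
            raise ValueError("unknown block type")
        # Assumption: big-endian unsigned 32-bit length.
        out += struct.pack(">BI", kind, len(payload))
        out += payload
    out.append(END)  # end-of-response marker
    return bytes(out)


def decode_blocks(stream):
    """Parse the framing back into a list of (kind, payload) pairs."""
    blocks = []
    pos = 0
    while True:
        if pos >= len(stream):
            raise ValueError("missing end marker")
        kind = stream[pos]
        pos += 1
        if kind == END:
            return blocks
        if kind not in (MARKUP, DATA):
            raise ValueError("unknown block type")
        if pos + 4 > len(stream):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        payload = stream[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated block")
        pos += length
        blocks.append((kind, payload))


def render(blocks):
    """Pass markup through untouched; escape everything in data blocks."""
    parts = []
    for kind, payload in blocks:
        text = payload.decode("utf-8")  # assumption: UTF-8 payloads
        parts.append(text if kind == MARKUP else html.escape(text, quote=True))
    return "".join(parts)


wire = encode_blocks([
    (MARKUP, b"<p>"),
    (DATA, b"<script>alert(1)</script>"),
    (MARKUP, b"</p>"),
])
print(render(decode_blocks(wire)))
```

The point of the sketch is that the client never has to guess where untrusted text begins or ends: the boundary is carried by the framing, not by the characters inside the block.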

  6. You can do that, or you can just separate document structure information from data altogether (i.e., send a binary representation of the DOM tree instead of serialized HTML). The problem is that nobody seems to be interested in using it.

  7. This is a better test. You can select an encoding from a list, and it will show which 8-bit values are vulnerable in your browser: