
August 16, 2010

On designing UIs for non-robots

In a typical, attentive human subject, the usual latency between a visual stimulus and a voluntary motor response is between 100 and 300 milliseconds. As should be evident, we do not pause for that long to assess the situation after each and every muscle movement; instead, we routinely schedule series of motor actions well in advance - and process sensory feedback only after the fact, in an asynchronous manner. Within that sub-second window of opportunity, we are simply unable to abort a premeditated action - even if things go wrong.

And here lies an interesting problem: on today's blazing fast personal computers, a lot can happen in as little as one-tenth of that timeframe. Within a browser, windows can be opened, moved around, and then closed; system prompts triggered or destroyed; programs launched and terminated. In such an environment, designing security UIs that take human cognitive limitations into account is a tricky game: any security-relevant prompt that does not enforce a certain amount of uninterrupted, distraction-free, in-focus screen time before accepting user input, is likely completely broken.
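As a rough illustration of what "enforcing uninterrupted, in-focus screen time" could look like, here is a minimal sketch. The `PromptGuard` class, its method names, and the 500 ms threshold are all assumptions made for this example; none of them comes from an actual browser. The idea is simply that any disturbance (blur, move, resize, occlusion) resets the clock, and clicks are ignored until the prompt has been left alone for long enough.

```typescript
// Hypothetical helper: tracks how long a prompt has had uninterrupted focus.
class PromptGuard {
  private focusedSince: number | null = null;
  private readonly minFocusMs: number;

  // minFocusMs is an assumed threshold, not a value taken from any browser.
  constructor(minFocusMs: number) {
    this.minFocusMs = minFocusMs;
  }

  // Call when the prompt becomes visible and focused.
  onFocus(nowMs: number): void {
    if (this.focusedSince === null) {
      this.focusedSince = nowMs;
    }
  }

  // Call on blur, move, resize, or occlusion: any disturbance resets the clock.
  onDisturbance(): void {
    this.focusedSince = null;
  }

  // Input is accepted only after enough uninterrupted focus time has passed.
  acceptsInput(nowMs: number): boolean {
    return (
      this.focusedSince !== null &&
      nowMs - this.focusedSince >= this.minFocusMs
    );
  }
}

const guard = new PromptGuard(500);
guard.onFocus(0);
const earlyClick = guard.acceptsInput(120); // click lands 120 ms after the prompt appears
guard.onDisturbance(); // the window was moved or lost focus
guard.onFocus(1000);
const lateClick = guard.acceptsInput(1600); // 600 ms of undisturbed focus
```

In this sketch the first click is rejected and the second is accepted. The timestamps are passed in explicitly only to keep the example deterministic; a real prompt would presumably read a monotonic clock.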

Intuitively, this just feels wrong - surely, humans can't be that bad, so the issue can't be that serious - but this is exactly the sort of fallacy we should be trying to avoid. There is nothing, absolutely nothing, that would make attacks impractical; increasingly fast JavaScript has the ability to programmatically open, position, resize, focus, blur, and close windows, and measure mouse pointer velocity and click timings with extreme accuracy; with a bit of basic ingenuity, any opportunity for a voluntary user reaction can be taken out of the equation. That's it: we suck, and there is nothing you can do to change it.

To back this claim, let's have a look at the recently-introduced HTML5 geolocation API; the initial call to navigator.geolocation.getCurrentPosition() spawns a security prompt in Firefox, Opera, Chrome, Safari, and a couple of other browsers. This UI does not implement a meaningful delay before accepting user input - and so, this crude and harmless Firefox proof-of-concept can be used to predict the timing of mouse clicks, and steal your location data with an annoyingly high success rate. This particular vector is tracked as Mozilla bug 583175, but similar problems are endemic to most of the new security UIs in place; the reason is not always simple oversight, but often, just explicit opposition to the idea of introducing usability roadblocks: after all, to a perfect human being, they are just a nuisance.
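For readers who have not used the API, the call in question looks roughly like the sketch below. `navigator.geolocation.getCurrentPosition()` is the real entry point; the `GeoLike` interface, the `requestLocation` wrapper, and the `fakeGeo` stand-in are hypothetical names introduced here so that the snippet can run outside a browser, and the coordinates are made up.

```typescript
// Minimal structural type covering only the part of the API used here.
interface GeoLike {
  getCurrentPosition(
    onSuccess: (pos: { coords: { latitude: number; longitude: number } }) => void,
    onError?: (err: { code: number; message: string }) => void
  ): void;
}

// requestLocation is a hypothetical wrapper name, not a browser API.
// In a browser, the first such call is what makes the permission prompt appear.
function requestLocation(geo: GeoLike, report: (line: string) => void): void {
  geo.getCurrentPosition(
    (pos) => report(`granted: ${pos.coords.latitude}, ${pos.coords.longitude}`),
    (err) => report(`denied or failed: code ${err.code}`)
  );
}

// Stand-in that behaves as if the user had approved the prompt.
const fakeGeo: GeoLike = {
  getCurrentPosition(onSuccess) {
    onSuccess({ coords: { latitude: 1.5, longitude: -2.5 } });
  },
};

const lines: string[] = [];
requestLocation(fakeGeo, (line) => lines.push(line));
// In a browser, the equivalent call would be:
//   requestLocation(navigator.geolocation, console.log);
```

The point to notice is that the page only supplies callbacks; whether the success callback ever fires depends entirely on a single click on the browser's prompt.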

Fine-grained click timing is, of course, not where the story ends; it has been demonstrated time and time again that with minimal and seemingly innocuous conditioning, healthy and focused test subjects can be reliably duped into completely ignoring very prominent and unusual visual signals; the significance of this problem is unappreciated mostly because not many exploit writers are behavioral scientists - but that's not a very reassuring thought.

There is some admirable work going on to make browser security messaging more accessible to non-technical users; but I'd wager that our problems run deeper than that. We are notoriously prone to overestimating the clarity of our perception, the rationality of our thought, and the accuracy of our actions; this is often a desirable trait when going through your life - but it tends to bite us hard when trying to design security-critical software to be used by other human beings.

We need to fight the habit the best we can, and start working on unified, secure human-to-machine interfacing in the browser. If we dismiss our inherent traits as an out-of-scope problem in security engineering, we will lose.

PS. On a somewhat related note, you may also enjoy Jesse Ruderman's recent 10-minute presentation about UI truncation attacks.


  1. With regard to click timing attacks, wouldn't it be relatively straightforward to put security prompts - or at least the part that you have to use to approve the action - outside of the web site's part of the browser window? For example, approval could be via a menu item. Menus are a well-established and familiar method of doing things, so should be acceptable from a usability standpoint.

    I'm presuming, of course, that the web browser takes the elementary precaution of preventing sites from displaying anything outside of its part of the screen. :-)

  2. Part of the problem is that JavaScript is free to reposition and resize windows at will; so a safe area of one window may overlap with an unsafe area of another.

    Reserving some completely isolated screen space for this would work, but then, full-screen users would likely complain.

    Plus, there are still some concerns with this; for example, what if I trigger one prompt, to which the user would normally consent, and then quickly replace it with a malicious one right before the click? Tricky.

    1. "Part of the problem is that JavaScript is free to reposition and resize windows at will"

      which is a problem in itself!

      A window should never move unless I make it move.

  3. Please note: I am not a UI expert nor a browser security guy.

    Perhaps the direction towards the "outside" makes sense, but take it to a more extreme level: externalise the security messaging (and feedback) system from the browser UI itself. By decoupling the components, it might be easier to manage the information, and presentation could be easier to control ... e.g. the visualisation could be presented over a "locked" browser UI (per-tab? track child tabs?), to prevent further interference. Perhaps something similar to the UI "administrator approval" messages that are common on current incarnations of MS Windows, Ubuntu, etc.

  4. The way I see it, each web site is entitled to a certain amount of real estate on the screen, as assigned to it by the browser based on the user's instructions. JavaScript should be allowed to create, reposition and remove sub-windows within this area, but not outside it; that's just asking for trouble.

    As for full-screen users, I figure they could be required to switch back to windowed mode in order to approve any action that represents a security risk. This should be an infrequent event, so no great hassle.

    If approval is via a menu, it would be easy to ensure that items could not be added or removed while the menu is open. (This could be transparent to the web site, the GUI would just take a snapshot of the current list of items awaiting approval when the menu was opened.)

    However, I do see a potential problem with web sites filling up the approval menu with multiple items in the hope of confusing the user into approving the wrong one.
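The snapshot idea from the comment above could be sketched roughly as follows. Everything here (`ApprovalQueue`, `PendingApproval`, the method names, the example origins) is hypothetical and exists only to illustrate the suggestion: opening the menu copies the list of pending requests, so a site cannot add or swap items while the user is looking at it.

```typescript
// One request awaiting the user's decision (hypothetical shape).
interface PendingApproval {
  id: number;
  origin: string;
  permission: string;
}

class ApprovalQueue {
  private pending: PendingApproval[] = [];
  private nextId = 1;

  // Called when a site asks for a permission; returns the request id.
  request(origin: string, permission: string): number {
    const id = this.nextId++;
    this.pending.push({ id, origin, permission });
    return id;
  }

  // Opening the menu copies the list; later requests do not alter the copy.
  openMenu(): ReadonlyArray<PendingApproval> {
    return this.pending.map((p) => ({ ...p }));
  }
}

const queue = new ApprovalQueue();
queue.request("https://example.com", "geolocation");
const menu = queue.openMenu(); // user opens the menu
queue.request("https://evil.example", "geolocation"); // arrives while the menu is open
const menuLength = menu.length; // the open menu still shows one item
const queueLength = queue.openMenu().length; // a freshly opened menu would show two
```

This does nothing about the flooding problem raised in the same comment; a site could still queue many requests before the menu is opened.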

  5. Opera experimented with the MDI model where the estate available to each and every window is better controlled; but I suspect there are reasons why customers are not that keen to embrace it.

    In some ways, the problem traces back simply to the fact that most of the research on UI design essentially counters any security efforts. User studies reveal that people loathe every single added click, every extra 500 ms of waiting, or even having to move the mouse too far; and don't understand security messaging no matter how seemingly accessible you try to make it, because they don't understand the fundamental concepts behind the web (URLs, origins, HTTP, etc).

    So, unless you come to this fight equipped with compelling real-world data showing that your solution works and is accepted by the masses, you will have a difficult time selling it to the people in charge of UI design for a mainstream browser :-(