When parsing HTML documents, browsers recognize two methods of specifying tag parameter values: a "bare" form (such as <img src=image.jpg>), which is terminated by angle brackets, whitespace, and so on; and a quoted form (<img src="image.jpg">), which is terminated only by a matching quote.
Every browser makes the decision by looking at the first non-whitespace character after the name=value separator. If this happens to be a single or a double quotation mark, the second parsing strategy is used; otherwise, the first method applies. Internet Explorer also recognizes backticks (`) as a faux quote, leading to security flaws in a fair number of HTML filters, but even with this quirk, the behavior is still pretty straightforward. In particular, in the following example, stray quotes will not have any effect on how the tag is interpreted:
<a href=http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>
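To make the decision rule concrete, here is a minimal Python sketch of it. This is only a toy model of what is described above, not any browser's actual tokenizer; the function name read_value and the quotes parameter are made up for illustration, and an unterminated quote is simply assumed to swallow the rest of the input.

```python
def read_value(text, quotes=('"', "'")):
    """Split the text that follows `name=` into (value, remainder).

    `quotes` is a tuple of characters treated as quotation marks; it is a
    hypothetical knob used here to mimic a parser that also accepts backticks.
    """
    text = text.lstrip()  # skip whitespace after the separator
    if text[:1] in quotes:
        # Quoted form: only the matching quote terminates the value.
        close = text.find(text[0], 1)
        if close == -1:
            close = len(text)  # simplification: unterminated quote takes the rest
        return text[1:close], text[close + 1:]
    # Bare form: whitespace or '>' terminates the value.
    end = 0
    while end < len(text) and not text[end].isspace() and text[end] != ">":
        end += 1
    return text[:end], text[end:]


# The stray quote is just another character of a bare value.
print(read_value('http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>'))

# A filter that ignores backticks and a parser that honors them disagree.
print(read_value('`a b`>'))
print(read_value('`a b`>', quotes=('"', "'", '`')))
```

The last two calls hint at why the backtick behavior trips up filters: the same input yields a different value and a different remainder depending on which characters count as quotes.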
But here's the thing: Internet Explorer seems to be doing a substring search for an equals sign followed by a quote anywhere in the parameter name=value pair. Therefore, the following syntax will be parsed in a very different way:
<a href=http://www.example.com/?=">This is still a part of markup indeed!">Click me</a>
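A rough way to picture the difference is the Python sketch below. It is a guess at a model that matches the observed behavior, not a description of Internet Explorer's real code: the function bare_value_end and its ie_quirk flag are hypothetical, and the sketch only tracks where the value stops, since what the browser ends up storing as the value is not something I am asserting here.

```python
QUOTES = ('"', "'")


def bare_value_end(text, ie_quirk=False):
    """Return the index at which an unquoted parameter value ends.

    `text` is assumed to start at the first character of a bare value.
    With ie_quirk=True, an '=' immediately followed by a quote inside the
    bare token is assumed to open a quoted run that lasts until the
    matching quote (a hypothetical model of the quirk).
    """
    # Ordinary rule: the bare value stops at whitespace or '>'.
    end = 0
    while end < len(text) and not text[end].isspace() and text[end] != ">":
        end += 1
    if ie_quirk:
        # Substring search for '=' followed by a quote within the token.
        for k in range(end - 1):
            if text[k] == "=" and text[k + 1] in QUOTES:
                close = text.find(text[k + 1], k + 2)
                if close != -1:
                    end = close + 1  # resume after the matching quote
                break
    return end


plain = 'http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>'
tricky = 'http://www.example.com/?=">This is still a part of markup indeed!">Click me</a>'

for sample in (plain, tricky):
    for quirk in (False, True):
        end = bare_value_end(sample, ie_quirk=quirk)
        print(quirk, repr(sample[end:]))
```

For the first sample both modes stop at the same place; for the second, the modeled quirk pushes the end of the value past the sentence, so only ">Click me</a>" is left outside the tag.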
It's one of the most unique and surreal HTML parser quirks I am aware of (and it survives to this day in Internet Explorer 9). In principle, it allows any server-side HTML filter to get out of sync with the browser, leading to parameter splitting and tag consumption. In reality, it has limited practical significance: if your HTML filter is relaxed enough to allow this syntax to go through, it is probably already vulnerable to the abuse of other syntax tricks.
This is beyond geeky, and, as such, I would use the word 'fascinating' with caution.