Are there other sequences that browsers interpret as special HTML characters?

Question

HTML has a few special characters (< > & ' ") that matter to the DOM parser. These are the characters that popular functions such as PHP's htmlspecialchars convert to HTML entities so that they do not accidentally trigger anything during parsing.

The translations performed are:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
''' (single quote) becomes '&#039;' only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
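
A minimal sketch of how those flags change the output, assuming PHP and UTF-8 input. The sample string is made up for illustration, and the flags are passed explicitly rather than relying on the default, which changed in PHP 8.1:

```php
<?php
// Hypothetical sample input containing all five special characters.
$input = "<a href='x'>Tom & \"Jerry\"</a>";

// ENT_COMPAT: converts & < > and double quotes, leaves single quotes alone.
echo htmlspecialchars($input, ENT_COMPAT, 'UTF-8'), "\n";
// &lt;a href='x'&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;

// ENT_QUOTES: converts single quotes as well.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n";
// &lt;a href=&#039;x&#039;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;

// ENT_NOQUOTES: converts only & < >.
echo htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8'), "\n";
// &lt;a href='x'&gt;Tom &amp; "Jerry"&lt;/a&gt;
```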

However, I remember that in older browsers, such as IE6, there were other byte sequences that made the browser's DOM parser interpret content as HTML.

Is this really a problem today? If you filter only these 5, is that enough to prevent XSS?

For example, here are all the known encodings of the "<" character in HTML and JavaScript (in UTF-8).

 < %3C &lt &lt; &LT &LT; &#60 &#060 &#0060 &#00060 &#000060 &#0000060 &#60; &#060; &#0060; &#00060; &#000060; &#0000060; &#x3c &#x03c &#x003c &#x0003c &#x00003c &#x000003c &#x3c; &#x03c; &#x003c; &#x0003c; &#x00003c; &#x000003c; &#X3c &#X03c &#X003c &#X0003c &#X00003c &#X000003c &#X3c; &#X03c; &#X003c; &#X0003c; &#X00003c; &#X000003c; &#x3C &#x03C &#x003C &#x0003C &#x00003C &#x000003C &#x3C; &#x03C; &#x003C; &#x0003C; &#x00003C; &#x000003C; &#X3C &#X03C &#X003C &#X0003C &#X00003C &#X000003C &#X3C; &#X03C; &#X003C; &#X0003C; &#X00003C; &#X000003C; \x3c \x3C \u003c \u003C

html security php xss

Xeoncross Dec 24 '11 at 19:01

3 answers

Here is an example: <button onclick="confirm('Are you sure you want to delete ');alert('xss')"> Here, the attacker's input is everything after "delete " and before the final ')">.

HTML escaping will not help in this case, because we would be escaping for the wrong context.

In short, preventing XSS means escaping for the given context. In the example above, we are in a JavaScript context nested inside an HTML attribute context. See the OWASP XSS Prevention Cheat Sheet.
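
As a rough sketch of escaping for both contexts in that example (one possible approach, not the only one): encode the value as a JavaScript string literal first, then HTML-escape the result for the attribute. The helper name js_in_attr and the button markup are made up for illustration, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later:

```php
<?php
// Hypothetical helper: escape a value for a JavaScript string literal
// that itself sits inside a double-quoted HTML attribute.
function js_in_attr(string $value): string
{
    // Step 1: JavaScript context. json_encode() produces a quoted string
    // literal; the JSON_HEX_* flags turn < > & ' " inside it into \uXXXX escapes.
    $js = json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );

    // Step 2: HTML attribute context. The literal's own delimiting double
    // quotes must not terminate the attribute.
    return htmlspecialchars($js, ENT_QUOTES, 'UTF-8');
}

$name = "');alert('xss"; // the attacker's input from the example
$message = 'Are you sure you want to delete ' . $name . '?';

echo '<button onclick="confirm(' . js_in_attr($message) . ')">Delete</button>', "\n";
```

Note that htmlspecialchars alone does not help here even with ENT_QUOTES: the browser decodes &#039; back to a single quote before the attribute value is run as JavaScript.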

Erlend Dec 25 '11 at 19:07

Escaping these is enough for text in HTML, but there are contexts in HTML where even text is dangerous:

Do not allow users to create arbitrary URLs (in <a>, <img>, etc.), since they can inject javascript: or one of its many variants. Whitelist only ^https?:// .
HTML escaping is not enough inside <script> (which does not use entity escaping anyway) or in attributes that execute script (onclick, etc.). For those you need json_encode().
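
A minimal sketch of both points, under a few assumptions: the helper names is_allowed_url and js_literal are made up, the case-insensitive flag on the pattern is my addition (URL schemes are case-insensitive), and JSON_THROW_ON_ERROR assumes PHP 7.3 or later:

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs (the ^https?:// whitelist).
function is_allowed_url(string $url): bool
{
    return preg_match('#^https?://#i', $url) === 1;
}

// Hypothetical helper: encode a PHP value as a JavaScript literal that is safe
// to place inside a <script> element. JSON_HEX_TAG escapes < and >, so the
// value cannot close the element with </script>.
function js_literal($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// A javascript: URL fails the whitelist and is replaced with a harmless target.
$url = 'javascript:alert(1)';
$href = is_allowed_url($url) ? htmlspecialchars($url, ENT_QUOTES, 'UTF-8') : '#';
echo '<a href="' . $href . '">link</a>', "\n";

// User text embedded in a script block goes through json_encode(), not htmlspecialchars().
$comment = '</script><script>alert(1)</script>';
echo '<script>var comment = ' . js_literal($comment) . ';</script>', "\n";
```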

Kornel Dec 27 '11 at 14:27

Ktash · Accepted Answer · 2011-12-24T19:15:39+0000

No. I actually learned this while looking into using CSS and attributes to style elements automatically based on their content (my question), and the short answer is no. Modern browsers do not interpret "byte sequences" as HTML. I use the term "byte sequences" loosely, because most of the risky code does not use byte-encoded values.

The examples listed on the XSS site rely on attributes in which JavaScript is interpreted as a string to be executed. It also lists things like &{alert('XSS')}, which used to run the code inside the braces, but that does not work in modern browsers.

But to answer your second question: no, filtering those 5 is not enough to prevent an XSS attack. Always run your output through PHP's HTML special character functions, but there are hundreds of byte codes that could be used, and you really cannot guarantee anything. Running it through a PHP filter (especially htmlentities()) will give you the exact text that was entered when it is output as HTML (for example, « is emitted as &laquo;). That said, depending on your use, htmlspecialchars is enough to cover most attacks. It depends on how you use the input, but for the most part it will be safe.
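
To illustrate the difference between the two functions, here is a small sketch; the sample text is made up, and it assumes the script file and the input are both UTF-8:

```php
<?php
// Hypothetical sample text containing non-ASCII quotation marks.
$text = '«Tom & Jerry»';

// htmlspecialchars() converts only the five special characters.
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), "\n";
// «Tom &amp; Jerry»

// htmlentities() also converts every character that has a named entity.
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n";
// &laquo;Tom &amp; Jerry&raquo;
```

Both results display as the same text in a browser; neither makes the value safe outside plain HTML text.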

XSS is a hard problem. The general rule is to always filter everything the user enters, and to use whitelisting rather than blacklisting. What you are describing is blacklisting those values, when it is always safer to assume that your users are malicious and to allow only specific things.
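
As a small sketch of the whitelisting idea, assuming a made-up rule that usernames are 3 to 20 ASCII letters, digits, or underscores (the function name and the rule are illustrative only):

```php
<?php
// Hypothetical whitelist: accept only input that matches a known-good pattern,
// instead of trying to strip out a list of known-bad characters.
function is_valid_username(string $name): bool
{
    // The D modifier makes $ match only at the very end of the string,
    // so a trailing newline is rejected too.
    return preg_match('/^[A-Za-z0-9_]{3,20}$/D', $name) === 1;
}

var_dump(is_valid_username('Xeoncross'));        // bool(true)
var_dump(is_valid_username('<script>alert(1)')); // bool(false)
```

Validation like this complements output escaping and does not replace it.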
