Can someone provide an example of why I should HTML encode and then JS encode, rather than double-encode in HTML, when using the .innerHTML method?
Sure.
Assuming the user-supplied data is populated into your JavaScript on the server side, you will need to JS encode it to get it there.
The following is pseudocode on the server side, but JavaScript on the front end:
var userProdividedData = "<%=serverVariableSetByUser %>"; element.innerHTML = userProdividedData;
As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is “good” and supplies the value foo, this renders the following JavaScript:
var userProdividedData = "foo"; element.innerHTML = userProdividedData;
Still no problem.
Now say that a malicious user supplies the value "; alert("xss attack!");// . This renders as:
var userProdividedData = ""; alert("xss attack!");//"; element.innerHTML = userProdividedData;
which results in an XSS exploit, because the injected code actually executes on the first line above.
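To see the breakout without a browser, here is a minimal sketch that imitates the unencoded template with plain string concatenation and runs the result against a stub alert. The helper names renderUnencoded and runWithStubAlert are hypothetical, and new Function only stands in for the browser executing the script block.

```javascript
// Hypothetical stand-in for the server template: pastes the value in unencoded.
function renderUnencoded(serverVariableSetByUser) {
  return 'var userProdividedData = "' + serverVariableSetByUser + '";\n' +
         'return userProdividedData;';
}

// Executes rendered script text with a stub alert() that records its calls.
function runWithStubAlert(scriptText) {
  const alerts = [];
  const value = new Function("alert", scriptText)((msg) => alerts.push(msg));
  return { value, alerts };
}

const good = runWithStubAlert(renderUnencoded("foo"));
const bad = runWithStubAlert(renderUnencoded('"; alert("xss attack!");//'));

console.log(good.value, good.alerts.length); // the benign value arrives intact, no alert
console.log(bad.value, bad.alerts);          // the variable is empty and alert() was called
```

The benign value ends up in the variable untouched, while the malicious value closes the string early and its alert call runs as code.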
To prevent this, as you say, you JS encode. Rule #3 of the OWASP XSS Prevention Cheat Sheet says:
Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute.
So, to protect against this, your code would be:
var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>"; element.innerHTML = userProdividedData;
where JsEncode encodes as recommended by OWASP.
This prevents the above attack, as it now renders as follows:
var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f"; element.innerHTML = userProdividedData;
You have now secured your JavaScript variable assignment against XSS.
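JsEncode is not defined in this answer, so the following is only a sketch of what such a function could look like under the OWASP rule quoted above. The name jsEncode is hypothetical, and writing code units above 255 as \uHHHH is my own assumption, since the rule only covers characters below 256.

```javascript
// Hypothetical sketch of a JS string encoder following the quoted OWASP rule.
function jsEncode(value) {
  const text = String(value);
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch; // alphanumerics pass through unchanged
    } else if (code < 256) {
      out += "\\x" + code.toString(16).padStart(2, "0"); // \xHH
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0"); // assumption: \uHHHH
    }
  }
  return out;
}

const payload = '"; alert("xss attack!");//';
const encoded = jsEncode(payload);
console.log(encoded);
// \x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f

// Placed inside a string literal, the JavaScript parser restores the original text.
const decoded = new Function('return "' + encoded + '";')();
console.log(decoded === payload); // true
```

The encoded text contains no quote, so it cannot close the string literal early; the payload only exists as data once the parser has decoded it.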
However, what if a malicious user supplied <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part, as it would simply be converted into its hex-escaped equivalent, as above.
However, the line
element.innerHTML = userProdividedData;
will cause alert('xss attack') to execute when the browser renders the inner HTML. This would be a DOM-based XSS attack.
That is why you need to HTML encode too. This can be done with a function such as:
function escapeHTML (unsafe_str) { return unsafe_str .replace(/&/g, '&amp;') .replace(/</g, '&lt;') .replace(/>/g, '&gt;') .replace(/\"/g, '&quot;') .replace(/\'/g, '&#x27;') .replace(/\//g, '&#x2F;'); }
making your code
element.innerHTML = escapeHTML(userProdividedData);
or it can be done using jQuery's text() function.
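As a quick check of what escapeHTML produces, this sketch repeats the function and runs it on the payload from above. It runs in Node.js without a DOM, so the innerHTML assignment itself is not shown; the variable names are only illustrative.

```javascript
// Same replacements as the escapeHTML function above; & is handled first
// so that the entities produced by later replacements are not re-encoded.
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;")
    .replace(/\//g, "&#x2F;");
}

const userProdividedData = "<img src=\"xx\" onerror=\"alert('xss attack')\" />";
const safeMarkup = escapeHTML(userProdividedData);
console.log(safeMarkup);
// &lt;img src=&quot;xx&quot; onerror=&quot;alert(&#x27;xss attack&#x27;)&quot; &#x2F;&gt;
```

Because the result contains no literal < or >, the browser would render it as visible text instead of creating an img element.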
Update regarding question in comments
I have one more question: you mentioned that we must JS encode because an attacker can enter "; alert("xss attack!");// . But if we used HTML encoding instead of JS encoding, wouldn't that also encode the " character and make this attack impossible, because we would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);//";
I take your question to mean the following: rather than JS encoding followed by HTML encoding, why don't we just HTML encode in the first place and leave it at that?
Well, because they could encode an attack such as <img src="xx" onerror="alert('xss attack')" /> entirely in the \xHH format to insert their payload. This would produce the desired HTML attack sequence without using any of the characters that HTML encoding affects.
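A rough sketch of that bypass, assuming the server applies only HTML encoding before writing the value into the string literal. The attacker types the payload as literal backslash sequences; new Function stands in for the browser parsing the script block, and the variable names are hypothetical.

```javascript
// HTML-only encoding, as in the escapeHTML function from this answer.
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;")
    .replace(/\//g, "&#x2F;");
}

// What the attacker types: backslashes, letters, digits, spaces and parentheses only.
const typedByAttacker =
  String.raw`\x3cimg src\x3d\x22xx\x22 onerror\x3d\x22alert(\x27xss attack\x27)\x22 \x2f\x3e`;

// HTML encoding finds nothing to change.
const htmlEncodedOnly = escapeHTML(typedByAttacker);
console.log(htmlEncodedOnly === typedByAttacker); // true

// The JavaScript parser then decodes the \xHH sequences inside the string literal.
const userProdividedData = new Function('return "' + htmlEncodedOnly + '";')();
console.log(userProdividedData);
// <img src="xx" onerror="alert('xss attack')" />
```

The variable now holds live markup, so assigning it to innerHTML would trigger the onerror handler.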
There are other attacks as well: if an attacker entered \ , they could force the browser to miss the closing quote (since \ is the escape character in JavaScript).
This would render as:
var userProdividedData = "\";
which would cause a JavaScript error because it is not a correctly terminated statement. This could cause a denial of service to the application if it is rendered in a prominent place.
Additionally, say there were two pieces of user-controlled data:
var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>";
the user could then enter \ in the first and ;alert('xss');// in the second. This would turn the string concatenation into one big assignment, followed by an XSS attack:
var userProdividedData = "\" + ' - ' + ";alert('xss');//";
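Both backslash cases can be reproduced outside the browser. In this sketch the hypothetical renderTwoValues helper imitates the unencoded two-value template, and new Function with a stub alert stands in for the browser running the script.

```javascript
// Hypothetical stand-in for the two-value template, values pasted in unencoded.
function renderTwoValues(first, second) {
  return 'var userProdividedData = "' + first + '" + \' - \' + "' + second + '";\n' +
         'return userProdividedData;';
}

// Executes rendered script text with a stub alert() that records its calls.
function runWithStubAlert(scriptText) {
  const alerts = [];
  const value = new Function("alert", scriptText)((msg) => alerts.push(msg));
  return { value, alerts };
}

// Case 1: a lone backslash escapes the closing quote, so the script fails to parse.
let singleBackslashError = null;
try {
  new Function('var userProdividedData = "' + "\\" + '";');
} catch (err) {
  singleBackslashError = err;
}
console.log(singleBackslashError && singleBackslashError.name); // SyntaxError

// Case 2: backslash in the first value, payload in the second.
const attack = runWithStubAlert(renderTwoValues("\\", ";alert('xss');//"));
console.log(attack.value);  // the concatenation operators have become string content
console.log(attack.alerts); // the injected alert call was recorded
```

With benign input such as "a" and "b" the same template yields "a - b", which makes the difference easy to compare.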
Because of edge cases like these, it is recommended that you follow the OWASP guidelines, as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML-encoded values solves this; however, there are other reasons to use JS encoding followed by HTML encoding when rendering content in this manner, because this method also works for data in attribute values:
<a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false">
It works whether the attribute is single or double quoted:
<a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'>
Or even without quotes:
<a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;>
If you HTML encode as mentioned in your comment and the entity value ends up in an attribute:
onclick='var userProdividedData ="&quot;;"' (shortened version)
the code is run through the browser's HTML parser first, so userProdividedData would be
";;
instead of
&quot;;
so when you add it to the innerHTML call, you would have XSS again. Note that <script> blocks are not processed by the browser's HTML parser, apart from the closing </script> tag, but that is a different story.
It is always wise to encode as late as possible, as shown above. Then, if you need to output the value in a purely JavaScript context (for example, an actual alert box, which does not render HTML), it will still display correctly.
That is, with the above, I can call
alert(serverVariableSetByUser);
just as easily as I can set the HTML:
element.innerHTML = escapeHTML(userProdividedData);
In both cases it will display correctly, without certain characters disrupting the output or causing unwanted code execution.