Preventing XSS with .innerHTML

When I let users supply data that is assigned to the JavaScript innerHTML property, as follows:

 element.innerHTML = "User provided variable"; 

I realized that to prevent XSS I need to HTML-encode and then JS-encode the user input, because the user could insert something like this:

 <img src=a onerror='alert();'> 

HTML encoding alone or JS encoding alone would not help, because .innerHTML, as I understand it, decodes the input before inserting it into the page. With HTML + JS encoding, I noticed that only the JS encoding gets decoded, while the HTML encoding remains.

But I was able to achieve the same result by HTML-encoding twice.

My question is: can someone give an example of why I should HTML-encode and then JS-encode, rather than HTML-encode twice, when using .innerHTML?

3 answers

Can someone provide an example of why I should HTML-encode and then JS-encode, rather than double-encode in HTML, when using the .innerHTML method?

Sure.

Assuming the user-supplied data is written into your JavaScript by the server, you will need to JS-encode it to get it there safely.

The following is pseudocode on the server side, but JavaScript on the front end:

 var userProdividedData = "<%=serverVariableSetByUser %>";
 element.innerHTML = userProdividedData;

As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is "good" and supplies the value foo, this renders the following JavaScript:

 var userProdividedData = "foo";
 element.innerHTML = userProdividedData;

No problem so far.

Now say a malicious user supplies the value "; alert("xss attack!");// . This renders as:

 var userProdividedData = ""; alert("xss attack!");//";
 element.innerHTML = userProdividedData;

which results in an XSS exploit: the injected code executes on the first line above.
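
To see the breakout concretely, here is a minimal sketch that simulates the unencoded template in plain JavaScript. The helpers renderUnencoded and runRendered are hypothetical names introduced for illustration (they are not part of the answer), and alert and element are stubbed so the snippet can run outside a browser.

```javascript
// Hypothetical stand-in for the server template: splices the raw value
// into the JavaScript source with no encoding at all.
function renderUnencoded(serverVariableSetByUser) {
  return 'var userProdividedData = "' + serverVariableSetByUser + '";\n' +
         'element.innerHTML = userProdividedData;';
}

// Runs the rendered source with a stubbed alert() and element so we can
// observe which calls the rendered script would have made.
function runRendered(source) {
  const alerts = [];
  const element = { innerHTML: '' };
  new Function('alert', 'element', source)((msg) => alerts.push(msg), element);
  return { alerts: alerts, innerHTML: element.innerHTML };
}

const good = runRendered(renderUnencoded('foo'));
const bad = runRendered(renderUnencoded('"; alert("xss attack!");//'));

console.log(good.alerts.length, good.innerHTML); // benign value: no alert() call
console.log(bad.alerts);                         // injected alert() call was executed
```

The malicious value never reaches innerHTML at all; it runs as script because it closed the string literal early.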

To prevent this, as you say, you JS-encode. Rule #3 of the OWASP XSS Prevention Cheat Sheet says:

Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute.

So, to protect against this, your code would be

 var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>";
 element.innerHTML = userProdividedData;

where JsEncode encodes as recommended by OWASP.

This prevents the above attack, because the output now renders as follows:

 var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f";
 element.innerHTML = userProdividedData;

You have now secured your JavaScript variable assignment against XSS.
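
As a rough illustration of what such an encoder does, here is a sketch of a JsEncode-style helper written in JavaScript. The name jsEncode is hypothetical (the answer's JsEncode runs on the server and its implementation is not shown), and emitting \uHHHH for code units of 256 and above is an assumption made here, not something the quoted rule specifies.

```javascript
// Sketch of a JsEncode-style helper (hypothetical name and implementation).
// Alphanumerics pass through; every other character below 256 becomes \xHH.
// Assumption: code units >= 256 are written as \uHHHH.
function jsEncode(value) {
  const str = String(value);
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    const code = str.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      out += '\\x' + code.toString(16).padStart(2, '0');
    } else {
      out += '\\u' + code.toString(16).padStart(4, '0');
    }
  }
  return out;
}

const original = '"; alert("xss attack!");//';
const encoded = jsEncode(original);
console.log(encoded);

// Placed inside a string literal, the JavaScript parser turns the escapes
// back into the original characters, as data rather than as code.
const decoded = new Function('return "' + encoded + '";')();
console.log(decoded === original);
```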

However, what if a malicious user set <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part, as it would simply be converted into its \xHH-escaped equivalent, like above.

However, the line

 element.innerHTML = userProdividedData; 

would cause alert('xss attack') to execute when the browser renders the inner HTML. This would be a DOM-based XSS attack.

This is why you need to HTML-encode too. This can be done with a function such as:

 function escapeHTML (unsafe_str) {
     return unsafe_str
       .replace(/&/g, '&amp;')
       .replace(/</g, '&lt;')
       .replace(/>/g, '&gt;')
       .replace(/\"/g, '&quot;')
       .replace(/\'/g, '&#39;')
       .replace(/\//g, '&#x2F;');
 }

making your code

 element.innerHTML = escapeHTML(userProdividedData); 

or it can be done using jQuery's text() function.
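
For illustration, this is what the escapeHTML function above produces for the <img> payload. The variable names payload and safe are introduced only for this sketch.

```javascript
// Same escapeHTML as in the answer, repeated so the snippet is self-contained.
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\"/g, '&quot;')
    .replace(/\'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

const payload = '<img src="xx" onerror="alert(\'xss attack\')" />';
const safe = escapeHTML(payload);

// No literal < or > survives, so assigning this to innerHTML creates
// text rather than an <img> element.
console.log(safe);
```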

Update regarding question in comments

I have one more question: you mentioned that we must JS-encode because an attacker can enter "; alert("xss attack!");// . But if we used HTML encoding instead of JS encoding, wouldn't that also encode the " character and make this attack impossible, because we would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);&#x2F;&#x2F;";

I take your question to mean: rather than JS encoding followed by HTML encoding, why not just HTML-encode in the first place and leave it at that?

Well, because an attacker could write an attack such as <img src="xx" onerror="alert('xss attack')" /> entirely in the \xHH format to insert their payload. This would produce the desired HTML attack sequence without using any of the characters that HTML encoding affects.
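
A minimal sketch of that bypass, runnable outside a browser. The helper toHexEscapes is a hypothetical attacker-side function introduced for illustration, and new Function stands in for the browser parsing the rendered string literal.

```javascript
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\"/g, '&quot;')
    .replace(/\'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

// Hypothetical attacker-side helper: writes each character as the
// literal text \xHH (backslash, x, two hex digits).
function toHexEscapes(str) {
  return Array.from(str, (ch) =>
    '\\x' + ch.charCodeAt(0).toString(16).padStart(2, '0')).join('');
}

const attack = '<img src="xx" onerror="alert(\'xss attack\')" />';
const userInput = toHexEscapes(attack);

// HTML encoding alone changes nothing: the input contains only
// backslashes, the letter x and hex digits.
const htmlEncodedOnly = escapeHTML(userInput);

// Simulates the browser parsing: var userProdividedData = "<rendered value>";
const parsedByJs = new Function('return "' + htmlEncodedOnly + '";')();

console.log(htmlEncodedOnly === userInput); // HTML encoding left the input untouched
console.log(parsedByJs === attack);         // the JS parser rebuilt the raw <img> tag
```

The string that would reach innerHTML is the original markup again, which is why the value needs JS encoding at the point where it is written into script.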

There are other attacks as well: if an attacker entered \ , they could make the browser miss the closing quote (since \ is the escape character in JavaScript).

This would render as:

 var userProdividedData = "\"; 

which would cause a JavaScript error because it is not a properly terminated statement. This could amount to a denial of service for the application if it is rendered in a prominent place.

Also, say there were two pieces of user-controlled data:

 var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>"; 

The user could then enter \ in the first and ;alert('xss');// in the second. This turns the string concatenation into one big assignment, followed by an XSS attack:

 var userProdividedData = "\" + ' - ' + ";alert('xss');//"; 
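
The two-value case can be simulated the same way. In this sketch, render, run, and jsEncode are hypothetical helpers (jsEncode is only an approximation of the OWASP-style encoder, with \uHHHH for code units of 256 and above as an added assumption), and alert is stubbed.

```javascript
// Approximation of an OWASP-style JS encoder (hypothetical implementation).
function jsEncode(value) {
  const str = String(value);
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(str[i])) {
      out += str[i];
    } else if (code < 256) {
      out += '\\x' + code.toString(16).padStart(2, '0');
    } else {
      out += '\\u' + code.toString(16).padStart(4, '0'); // assumption
    }
  }
  return out;
}

// Stand-in for the server template with two user-controlled values.
function render(first, second, encode) {
  return `var userProdividedData = "${encode(first)}" + ' - ' + "${encode(second)}";\n` +
         'return userProdividedData;';
}

// Executes the rendered source with a stubbed alert().
function run(source) {
  const alerts = [];
  const value = new Function('alert', source)((msg) => alerts.push(msg));
  return { alerts: alerts, value: value };
}

const identity = (s) => s;
const unsafe = run(render('\\', ";alert('xss');//", identity));
const safe = run(render('\\', ";alert('xss');//", jsEncode));

console.log(unsafe.alerts); // the injected alert('xss') was called
console.log(safe.alerts);   // empty: both values stayed data
console.log(safe.value);    // the two raw inputs joined by " - "
```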

Because of edge cases like these, it is recommended to follow the OWASP guidelines, as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML-encoded characters solves this. However, there are other reasons to use JS encoding followed by HTML encoding when rendering content this way, because this method also works for data in attribute values:

 <a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false"> 

Regardless of whether it is single or double quoted:

 <a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'> 

Or even without quotes:

 <a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;> 

If you HTML-encode as suggested in your comment, producing an entity value:

onclick='var userProdividedData ="&quot;;"' (short version)

the code is first processed by the browser's HTML parser, so userProdividedData would be

 ";; 

instead of

 &quot;; 

so when you pass it to the innerHTML call, you would have XSS again. Note that <script> blocks are not processed by the browser's HTML parser, except for the closing </script> tag, but that is another story.

It is always wise to encode as late as possible, as shown above. Then, if you need to output the value in anything other than an HTML context (for example, an actual alert box, which does not render HTML), it will still display correctly.

That is, with the above, I can call

 alert(serverVariableSetByUser); 

as easily as setting HTML

 element.innerHTML = escapeHTML(userProdividedData); 

In both cases it will be displayed correctly, without certain characters disrupting the output or causing unwanted code execution.
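
A small sketch of why late encoding matters. The sample value and the variable names are made up for illustration; the point is that the raw value is kept as-is and encoded only at the HTML sink.

```javascript
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\"/g, '&quot;')
    .replace(/\'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

const raw = "Tom & Jerry's";           // hypothetical user-supplied value

// Encode late: each sink gets what it needs.
const forAlert = raw;                  // alert() shows plain text, so no HTML encoding
const forInnerHtml = escapeHTML(raw);  // HTML sink gets the encoded form

// Encode early: the stored value is already HTML-encoded, so a plain-text
// sink would show entity markup, and a second encoding pass at the HTML
// sink would double-encode it.
const storedEncoded = escapeHTML(raw);
const doubleEncoded = escapeHTML(storedEncoded);

console.log(forAlert);
console.log(forInnerHtml);
console.log(doubleEncoded);
```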


An easy way to ensure that the content of your element is correctly encoded (and will not be parsed as HTML) is to use textContent instead of innerHTML:

 element.textContent = "User provided variable with <img src=a>"; 

Another option is to use innerHTML only after you have encoded (preferably on the server, if you have the chance) the values that you are going to use.


I ran into this problem in my ASP.NET Web Forms application. The fix is relatively simple.

Install HtmlSanitizationLibrary from the NuGet Package Manager and reference it in your application. In the code-behind, use the Sanitizer class as follows.

For example, if the current code looks something like this,

 YourHtmlElement.InnerHtml = "Your HTML content" ; 

Then replace it as follows:

 string unsafeHtml = "Your HTML content";
 YourHtmlElement.InnerHtml = Sanitizer.GetSafeHtml(unsafeHtml);

This fix clears the Veracode finding and ensures that the string is still rendered as HTML. Encoding the string in the code-behind would instead display it as literal text rather than raw HTML, because it is encoded before rendering begins.


Source: https://habr.com/ru/post/988556/

