JavaScript-based X/HTML and CSS sanitization

Question


Before everyone tells me that I shouldn't be sanitizing on the client side (I really do intend to do this on the client, though it could work in SSJS as well), let me clarify what I'm trying to do.

I would like something similar to Google Caja or HTMLPurifier, but for JavaScript: a whitelist-based security approach that processes HTML and CSS (not inserted into the DOM, of course, which would be unsafe, but first received as a string) and then selectively filters out unsafe tags or attributes, either ignoring them, optionally including them as escaped text, or otherwise passing them to the application for further processing, ideally in context. It would be great if it could also reduce any JavaScript to a safe subset, as Google Caja does, but I know that may be asking a lot.
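The whitelist filtering described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm used by HTML Purifier or Caja: it assumes the markup has already been parsed (with character references decoded) into a hypothetical plain-object tree in which text nodes are strings and elements are `{ tag, attrs, children }`. The `ALLOWED` map, the `DROP_WITH_CONTENT` set, the `example.invalid` base URL and the shape of the `rejected` report are all illustrative choices. Rejected tags and attributes are collected rather than silently dropped, so the application can decide whether to ignore them or show them as escaped text.

```javascript
// Hypothetical node shape for this sketch (entities already decoded):
//   text node:    a plain string
//   element node: { tag: string, attrs: { [name]: value }, children: Node[] }

// Allow only http(s) and mailto links. Relative URLs resolve against a
// placeholder base; "example.invalid" is an assumption, not a real endpoint.
function isSafeHref(value) {
  try {
    const url = new URL(value, "https://example.invalid/");
    return ["http:", "https:", "mailto:"].includes(url.protocol);
  } catch {
    return false; // unparsable URL: reject
  }
}

const anyValue = () => true;

// Whitelist: tag name -> Map of allowed attribute name -> value validator.
// Maps avoid prototype keys such as "constructor" being mistaken for entries.
const ALLOWED = new Map([
  ["p", new Map()],
  ["em", new Map()],
  ["strong", new Map()],
  ["ul", new Map()],
  ["li", new Map()],
  ["a", new Map([["href", isSafeHref], ["title", anyValue]])],
]);

// Elements removed together with everything inside them.
const DROP_WITH_CONTENT = new Set(["script", "style"]);

// Returns an array of filtered nodes; rejected items are appended to `rejected`.
function filterTree(node, rejected) {
  if (typeof node === "string") return [node]; // text is kept; escape it on output
  const tag = node.tag.toLowerCase();
  if (DROP_WITH_CONTENT.has(tag)) {
    rejected.push({ kind: "tag", tag });
    return [];
  }
  const children = node.children.flatMap((child) => filterTree(child, rejected));
  const allowedAttrs = ALLOWED.get(tag);
  if (!allowedAttrs) {
    rejected.push({ kind: "tag", tag });
    return children; // unwrap: drop the element but keep its filtered content
  }
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    const attrName = name.toLowerCase();
    const validate = allowedAttrs.get(attrName);
    if (validate && validate(value)) attrs[attrName] = value;
    else rejected.push({ kind: "attr", tag, attr: attrName });
  }
  return [{ tag, attrs, children }];
}

// Usage:
const demoRejected = [];
const demoResult = filterTree(
  {
    tag: "DIV",
    attrs: { onclick: "steal()" },
    children: [
      { tag: "a", attrs: { href: "javascript:alert(1)", title: "hi" }, children: ["link"] },
      { tag: "script", attrs: {}, children: ["alert(1)"] },
    ],
  },
  demoRejected,
);
// demoResult:   [{ tag: "a", attrs: { title: "hi" }, children: ["link"] }]
// demoRejected: href on <a>, then <script>, then <div>
```

Unwrapping unknown elements (rather than deleting their content) mirrors the "ignore the tag, keep the text" option from the question; elements whose content is itself code are dropped whole.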

My use case involves untrusted XML/XHTML data obtained through JSONP (data from a MediaWiki wiki, fetched before the wiki processes it, which allows raw but untrusted XML/HTML input). The user can then run queries and transformations on this data (XQuery, jQuery, XSLT, etc.), using HTML5 for offline use, IndexedDB storage, etc., and view the results on the same page where they viewed the source and built or imported their queries.

The user can produce whatever output they need, so I won't sanitize what they do; if they want to add JavaScript to the page, more power to them. But I want to protect users who want to be sure they can add code that safely copies targeted elements from the untrusted input, without allowing them to copy unsafe input.

This is definitely doable, but I'm wondering whether any libraries already do it.

And if I'm stuck implementing this myself (though I'm curious about it anyway), I would like some evidence that using innerHTML, or creating/appending to a DOM before inserting it into the document, is always safe. For example, could events be accidentally triggered if I first ran DOMParser, or relied on the browser's HTML parsing by using innerHTML to add raw HTML to a div that has not been inserted? I believe it should be safe, but I'm not sure whether DOM mutation events could fire in some exploitable way before insertion.

Of course, I would still need to clean up the built DOM afterward, but I just want to check whether I can safely build the DOM object itself to make traversal easier, and then worry about filtering out unwanted elements, attributes, and attribute values.
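A sketch of the inert-parsing approach, assuming a browser that follows the current HTML specification (older browsers may behave differently). `DOMParser.parseFromString` creates a separate document with no browsing context, so its scripts do not run and its images are not fetched. Assigning `innerHTML` on a detached `div` of the live document is different: an `<img>` created that way starts loading immediately, so its `onerror` handler can fire before the div is ever inserted. `parseInert` and `listElements` are hypothetical helper names, and the code needs a browser, since Node.js has no `DOMParser`.

```javascript
// Parse untrusted markup into a separate document created by DOMParser.
// That document has no browsing context: its <script> elements do not run
// and resources such as <img src> are not fetched, so onerror/onload
// handlers do not fire while the tree is inspected.
//
// Contrast: document.createElement("div").innerHTML = untrusted is NOT inert
// even though the div is never inserted; '<img src=x onerror=alert(1)>'
// starts a fetch in the live document and the handler runs.
function parseInert(untrustedHtml) {
  return new DOMParser().parseFromString(untrustedHtml, "text/html");
}

// Walk every element in the inert document's body and report its tag and
// attribute names, e.g. as input to a whitelist filter.
function listElements(doc) {
  const report = [];
  const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_ELEMENT);
  for (let el = walker.nextNode(); el !== null; el = walker.nextNode()) {
    report.push({
      tag: el.localName,
      attrs: Array.from(el.attributes, (attr) => attr.name),
    });
  }
  return report;
}

// Usage (in a browser):
// listElements(parseInert('<p onclick="evil()">hi<img src=x onerror=alert(1)></p>'))
// -> [{ tag: "p", attrs: ["onclick"] }, { tag: "img", attrs: ["src", "onerror"] }]
```

Only once the filtered tree is imported into the live document (for example with `document.importNode`) do its resources start loading, so filtering has to be finished before that step.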

Thanks!


javascript sanitization

Bret zamir Apr 7 '11 at 3:20


1 answer

daniellmb · Accepted Answer · 2011-04-14T15:58:06+0000

The goal of ESAPI (the OWASP Enterprise Security API) is to provide a simple interface that provides all the security features a developer is likely to need in a clear, consistent, and easy-to-use way. The ESAPI architecture is very simple: just a set of classes that encapsulate the key security operations most applications require.

JavaScript version of OWASP ESAPI: http://code.google.com/p/owasp-esapi-js

Validating input is extremely difficult to do well. HTML is the worst mashup of code and data of all time, since there are so many possible places to put code and so many different valid encodings. HTML is especially difficult because it is not only hierarchical but also involves many different parsers (XML, HTML, JavaScript, VBScript, CSS, URL, etc.). Although input validation is important and should always be done, it is not a complete solution for injection attacks. It is better to treat escaping as the primary defense.

I haven't used HTML Purifier before, but it looks good, and they have clearly spent a lot of time and thought on it. Why not run their server-side solution first, and then apply any additional rules you want afterward?

I've seen hacks that use nothing but combinations of [ ] ( ) to write code. There are hundreds more examples in the XSS (Cross Site Scripting) Cheat Sheet and at the Open Web Application Security Project (OWASP). Some things to keep an eye on are covered in the DOM based XSS Prevention Cheat Sheet.
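Escaping output for its context, which the answer recommends as the primary defense, can be very small for the most common case. A minimal sketch for HTML element content and quoted attribute values follows; it is not ESAPI's encoder, and it is not sufficient for other contexts (unquoted attributes, URLs, CSS, script blocks), each of which needs its own encoding.

```javascript
// Characters that can end a text run or a quoted attribute value in HTML.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape untrusted text for HTML element content or a quoted attribute value.
// Doing it in a single regex pass means the "&" introduced by one replacement
// is never escaped a second time.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// escapeHtml('<img src=x onerror="alert(1)">')
// -> '&lt;img src=x onerror=&quot;alert(1)&quot;&gt;'
```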

HTML Purifier catches this mixed-encoding hack:

 <A HREF="h tt p://6&#9;6.000146.0x7.147/">XSS</A>
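This vector is hard to match with string filters because the browser normalizes the value before using it. The sketch below uses the WHATWG `URL` parser (built into current browsers and Node.js) to show that normalization. It assumes the gaps in `h tt p` are a newline and a tab, as in the OWASP cheat-sheet version of this vector, and that the HTML parser has already decoded `&#9;` to a tab; `describeHref` is a hypothetical helper name. The takeaway is that a filter should judge the parsed result (for example `url.protocol`), not the raw string.

```javascript
// Show what an (already HTML-decoded) href value resolves to once the URL
// parser has normalized it. The WHATWG URL parser removes tabs and newlines
// anywhere in the input and canonicalizes dotted IPv4 hosts written with
// octal ("000146") or hexadecimal ("0x7") parts.
function describeHref(value) {
  const url = new URL(value);
  return { protocol: url.protocol, host: url.host, href: url.href };
}

// "&#9;" has already become a real tab here; "\n" and "\t" stand in for the
// whitespace that shows up as gaps in "h tt p".
const parts = describeHref("h\ntt\tp://6\t6.000146.0x7.147/");
// parts.protocol === "http:", parts.host === "66.102.7.147"
```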

It also catches this DIV background-image exploit, which hides its XSS payload in Unicode escapes:

 <DIV STYLE="background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029">
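The style value above hides `url('javascript:...')` behind CSS hexadecimal escapes: a backslash followed by one to six hex digits, optionally followed by a single whitespace character. A minimal decoder, assuming only those escape rules plus the form "backslash followed by any other character", shows what a CSS filter actually has to inspect. `decodeCssEscapes` is a hypothetical helper, not part of any library, and a real sanitizer would need to tokenize the CSS rather than just decode it.

```javascript
// Decode CSS escapes: "\" + 1-6 hex digits (+ one optional whitespace char),
// or "\" + any other non-newline character, which stands for itself.
function decodeCssEscapes(css) {
  return css.replace(
    /\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?|\\([^\n\r\f0-9a-fA-F])/g,
    (match, hex, literal) => {
      if (hex === undefined) return literal;
      const codePoint = parseInt(hex, 16);
      // Null, surrogate, and out-of-range code points become U+FFFD.
      const invalid =
        codePoint === 0 || codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff);
      return invalid ? "\uFFFD" : String.fromCodePoint(codePoint);
    },
  );
}

// The style attribute value from the example above, as a JS string.
const styleValue =
  "background-image:\\0075\\0072\\006C\\0028'\\006a\\0061\\0076\\0061\\0073" +
  "\\0063\\0072\\0069\\0070\\0074\\003a\\0061\\006c\\0065\\0072\\0074\\0028" +
  ".1027\\0058.1053\\0053\\0027\\0029'\\0029";
// decodeCssEscapes(styleValue) starts with
// "background-image:url('javascript:alert("
```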

To give you a taste of what you're up against, here are all 70 possible combinations of the "<" character in HTML and JavaScript:

 < %3C &lt &lt; &LT &LT; &#60 &#060 &#0060 &#00060 &#000060 &#0000060 &#60; &#060; &#0060; &#00060; &#000060; &#0000060; &#x3c &#x03c &#x003c &#x0003c &#x00003c &#x000003c &#x3c; &#x03c; &#x003c; &#x0003c; &#x00003c; &#x000003c; &#X3c &#X03c &#X003c &#X0003c &#X00003c &#X000003c &#X3c; &#X03c; &#X003c; &#X0003c; &#X00003c; &#X000003c; &#x3C &#x03C &#x003C &#x0003C &#x00003C &#x000003C &#x3C; &#x03C; &#x003C; &#x0003C; &#x00003C; &#x000003C; &#X3C &#X03C &#X003C &#X0003C &#X00003C &#X000003C &#X3C; &#X03C; &#X003C; &#X0003C; &#X00003C; &#X000003C; \x3c \x3C \u003c \u003C
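Every HTML entry in that list decodes to the same character, which is why a filter has to inspect decoded text rather than raw input. The sketch below covers only the reference forms that appear in the list: decimal and hexadecimal numeric references with any number of leading zeros, and `&lt`/`&LT` with or without the semicolon. `decodeCharRef` is a hypothetical helper and does not implement the full HTML named-reference table. `%3C` is URL encoding and `\x3c`/`\u003c` are JavaScript string escapes, so they only decode in those contexts.

```javascript
// Decode a single HTML character reference token. Handles numeric references
// (decimal "&#60", hex "&#x3c"/"&#X3C", optional ";", any leading zeros) and
// the named references "&lt"/"&LT", which HTML also accepts without ";".
function decodeCharRef(token) {
  const numeric = /^&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));?$/.exec(token);
  if (numeric) {
    const codePoint =
      numeric[1] !== undefined ? parseInt(numeric[1], 10) : parseInt(numeric[2], 16);
    // Null, surrogate, and out-of-range code points decode to U+FFFD in HTML;
    // the C1 (0x80-0x9F) remapping is not handled here.
    const invalid =
      codePoint === 0 || codePoint > 0x10ffff ||
      (codePoint >= 0xd800 && codePoint <= 0xdfff);
    return invalid ? "\uFFFD" : String.fromCodePoint(codePoint);
  }
  if (/^&(?:lt|LT);?$/.test(token)) return "<";
  return token; // anything else is left as-is by this sketch
}

// A sample of the HTML forms from the list; each decodes to "<".
const htmlVariants = [
  "&lt", "&lt;", "&LT", "&LT;", "&#60", "&#0000060;",
  "&#x3c", "&#X03C;", "&#x000003c;", "&#X00003C",
];

// The remaining forms belong to other parsers:
// decodeURIComponent("%3C") is "<" (URL decoding), and the JavaScript string
// literals "\x3c" and "\u003c" both evaluate to "<".
```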

