Problem with Unicode-based XSS?

Question

Maybe this belongs on the security Stack Exchange site instead, I'm not sure, but here is the question:

I recently came across a blog claiming that ＜ｓｃｒｉｐｔ＞ａｌｅｒｔ（１）＜／ｓｃｒｉｐｔ＞ will be processed as an actual <script> element. However, in my tests on recent Chrome this is not the case. Has anyone heard of a browser parsing it as real markup? If so, I have no idea how to mitigate it, since there are supposedly other such characters besides '＜' to worry about, and I am not going to iterate over all of Unicode to list them.

+4

security unicode xss cjk

wwaawaw 24 sept '12 at 4:30


2 answers

No, the browser will not interpret text surrounded by fullwidth less-than and greater-than characters as valid HTML tags. However, certain backends will convert them to the regular ASCII < and > characters (for example through best-fit character mappings), creating an XSS risk. See the following: http://websec.imtqy.com/unicode-security-guide/character-transformations/#best-fit
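To illustrate the backend risk, here is a minimal Python sketch. It assumes a hypothetical pipeline that applies NFKC normalization, used here only as a stand-in for whatever best-fit or compatibility mapping a real backend might perform. The function names are made up for the example. The point is the order of operations: escaping before the transformation lets the fullwidth characters slip through, while escaping after it does not.

```python
import html
import unicodedata

# Fullwidth angle brackets (U+FF1C, U+FF1E) around otherwise ASCII text.
PAYLOAD = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"


def escape_then_normalize(text):
    """Risky order (hypothetical backend): escape first, transform afterwards.

    html.escape() leaves the fullwidth signs alone because they are not
    ASCII markup characters; NFKC then folds them to ASCII '<' and '>'.
    """
    return unicodedata.normalize("NFKC", html.escape(text))


def normalize_then_escape(text):
    """Safer order: apply every transformation first, escape last."""
    return html.escape(unicodedata.normalize("NFKC", text))


if __name__ == "__main__":
    print(escape_then_normalize(PAYLOAD))
    print(normalize_then_escape(PAYLOAD))
```

With this payload the first function returns live markup and the second returns entity-escaped text, so the mitigation is to make output escaping the final step rather than to enumerate lookalike characters.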

+4

Brian h Dec 19 '14 at 20:27


Jukka K. Korpela · Accepted Answer · 2012-09-24T05:18:32+0000

This would be a direct violation of the HTML specifications. According to them, the characters that are significant in markup are ASCII characters, while characters such as U+FF1C FULLWIDTH LESS-THAN SIGN "＜" are just data characters with no special meaning. Browsers would need additional code to map fullwidth characters to ASCII (either as a special mapping or, for example, by normalizing to NFKD or NFKC), and there is no more reason to believe that they do such things than there is to believe that they might start treating "[" as "<".
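The distinction can be checked outside a browser as well. The sketch below uses Python's standard html.parser, which is not a browser engine but likewise only treats the ASCII "<" as the start of a tag. The helper names are invented for this example.

```python
import unicodedata
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records the name of every start tag the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    """Return the list of start tags found in the given markup."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


FULLWIDTH = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"
ASCII = "<script>alert(1)</script>"

if __name__ == "__main__":
    print(unicodedata.name("\uFF1C"), "vs", unicodedata.name("<"))
    print(start_tags(FULLWIDTH))  # fullwidth signs are plain character data
    print(start_tags(ASCII))      # ASCII signs open a real element
    # Only an explicit compatibility normalization maps one to the other.
    print(unicodedata.normalize("NFKD", "\uFF1C") == "<")
```

The fullwidth string yields no tags at all. The two characters only become equal after an explicit NFKD or NFKC step, which is the extra code a parser would have to add.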

Thus, a blog that claims otherwise is simply describing a possibility that someone has invented, with no real basis. You can usually tell this from the references and demos provided (that is, from their absence).

Of course, there are security problems with Unicode characters that look alike, but those are a matter of people mistaking one character for another even though the two are internally completely different, for example "＜" for "<" (and therefore, for example, reading a string in HTML source as a script element even though it is not one) or "а" for "a" (a Cyrillic letter for a Latin letter with the same appearance). That is, people may see characters as the same even though programs see them as different.
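As a rough illustration of how a program sees such lookalikes, the following sketch groups letters by the first word of their Unicode character name. This is only a heuristic assumed for the example, since Python's standard library does not expose the Unicode Script property, and the function name is hypothetical.

```python
import unicodedata


def script_prefixes(text):
    """Return the first word of the Unicode name of each letter in text.

    A crude stand-in for real script detection: "LATIN SMALL LETTER A"
    gives "LATIN", "CYRILLIC SMALL LETTER A" gives "CYRILLIC".
    """
    prefixes = set()
    for ch in text:
        if ch.isalpha():
            prefixes.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return prefixes


# Second letter is U+0430 CYRILLIC SMALL LETTER A, not Latin "a".
LOOKALIKE = "p\u0430ypal"

if __name__ == "__main__":
    print(script_prefixes("paypal"))
    print(script_prefixes(LOOKALIKE))
    # Unlike the fullwidth forms, this pair is not unified by NFKC.
    print(unicodedata.normalize("NFKC", "\u0430") == "a")
```

The two strings look identical on screen, yet the second mixes two scripts, and compatibility normalization does not merge the Cyrillic letter into the Latin one.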
