How can I safely and accurately embed user-provided URL data into an HTML5 document?

Given arbitrary user input for a URL in a web form, I want to generate a new HTML document that contains that URL in an href attribute. My question is how I should escape this URL in my HTML.

What needs to be done in the HTML for the following URLs, entered by an unknown end user?

  • http://example.com/?file=some_19%affordable.txt
  • http://example.com/url?source=web&last="f o o"&bar=<
  • https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22

If we assume that the URLs are already URI-encoded, which I think is reasonable if the user copied them from the browser's address bar, then simply passing them to attr() yields a valid URL and a document that passes the Nu HTML Checker at validator.w3.org/nu.

To see this in action, I set up a JSFiddle at https://jsfiddle.net/kamelkev/w8ygpcsz/2/, where substituting the example URLs above shows what happens.

For future reference, it consists of this HTML snippet

<a>My Link</a>

and this JS:

$(document).ready(function() {
 $('a').attr('href', 'http://example.com/request.html?data=&gt;');
 $('a').attr('href2', 'http://example.com/request.html?data=<');
 alert($('a').get(0).outerHTML);
});

So, with URL 1, it is impossible to tell mechanically whether it is URI-encoded or not. Based on human knowledge, you can assume that it is not, and that it refers to a file named some_19%affordable.txt. Running the fiddle produces

<a href="http://example.com/?file=some_19%affordable.txt">My Link</a>

which the HTML5 validator accepts without complaint. This is probably not what the user wanted.

The second URL is clearly not URI-encoded. The question is what the right thing to put in the HTML is, so that HTML parsing does not break.

Running it through the fiddle, Safari 10 produces this:

<a href="http://example.com/url?source=web&amp;last=&quot;f o o&quot;&amp;bar=&lt;">My Link</a>

Another browser produces this instead:

<a href="http://example.com/url?source=web&amp;last=&quot;f o o&quot;&amp;bar=<">My Link</a>
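The two serializations above differ only in whether < is written as &lt;. When the markup is built as a string instead of through attr(), the same escaping has to be done by hand. A minimal sketch, assuming a hypothetical helper named escapeAttr (not part of the fiddle) and a double-quoted attribute value:

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted
// HTML attribute value. The characters that matter there are & and ";
// < and > are escaped as well to be conservative.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// URL 2 from the question, taken as-is (no URI encoding applied)
const url2 = 'http://example.com/url?source=web&last="f o o"&bar=<';
const html = '<a href="' + escapeAttr(url2) + '">My Link</a>';

console.log(html);
```

This only makes the value safe for the HTML parser. It does nothing about the spaces and quotes that are not valid in a URL, which is a separate concern.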

. : ( HTML), < ( HTML). , . HTML.

: a) html-escape URL, attr(). , & &amp;, , &amp; &lt; attr(), URL- . :

<a href="http://example.com/url?source=web&amp;amp;last=&amp;quot;f+o+o&amp;quot;&amp;amp;bar=&amp;lt;">My Link</a>

- URI attr(), URL- , . :

<a href="http://example.com/url?source=web&amp;last=%22f%20o%20o%22&amp;bar=%3C">My Link</a>

, URL-, URI, HTML-, .

<a href="https://www.google.com/url?source=web&amp;sqi=2&amp;url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&amp;last=%22foo%22">My Link</a>

, .

, :

if url is encoded then
 pass as-is to attr()
else
 pass encodeURI(url) to attr()
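The pseudocode above can be sketched in JavaScript as follows. The "is encoded" test is only a guess: looksUriEncoded and prepareForAttr are hypothetical names, and the heuristic (at least one percent-escape that decodeURI accepts) is an assumption that will misjudge some inputs, for example a literal "%41" in a file name.

```javascript
// Heuristic only: guess that a URL is already URI-encoded when it contains
// at least one percent-escape and decodeURI can decode it without error.
function looksUriEncoded(url) {
  if (!/%[0-9A-Fa-f]{2}/.test(url)) {
    return false; // no escape sequence at all
  }
  try {
    decodeURI(url); // throws URIError on escapes that are not valid UTF-8
    return true;
  } catch (e) {
    return false;
  }
}

// Returns the string that would be handed to attr('href', ...).
function prepareForAttr(url) {
  return looksUriEncoded(url) ? url : encodeURI(url);
}

const url1 = 'http://example.com/?file=some_19%affordable.txt';
const url2 = 'http://example.com/url?source=web&last="f o o"&bar=<';
const url3 = 'https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22';

console.log(prepareForAttr(url1)); // "%" is treated as literal and becomes %25
console.log(prepareForAttr(url2)); // spaces, quotes and "<" are percent-encoded
console.log(prepareForAttr(url3)); // passed through unchanged
```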

, "", -, (, . URL 1):

, URL? , / URL?

attr() HTML- URL 2 , :

<a href="http://example.com/url?source=web&amp;last=&quot;f+o+o&quot;&amp;bar=&lt;">My Link</a>

, HTML, HTML5, URL-. , , . , - , &.

, . , HTML-, , . , , HTML-escape.

What is the correct way to embed a user-provided URL in an HTML5 document (using JavaScript)?


, URL- , , - . URL-, , URL- , URL-.

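<script>
// URL 1 from the question; "%af" does not decode as valid UTF-8
var inputurl = 'http://example.com/?file=some_19%affordable.txt';
var myurl;

try {
    // decodeURI throws a URIError when an escape sequence is malformed
    myurl = decodeURI(inputurl);
}
catch(error) {
    // decoding failed, so keep the input unchanged
    myurl = inputurl;
}

console.log(myurl);
</script>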
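An alternative sketch, not part of the answer above: the WHATWG URL parser (the global URL class in browsers and Node.js) percent-encodes spaces, quotes and angle brackets in the query while leaving existing percent-escapes untouched, so an already encoded URL is not encoded twice. The names toHref and escapeAttr are hypothetical, and the http/https restriction is an added assumption rather than something the question asks for.

```javascript
// Hypothetical helper: escape a string for a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: normalize user input with the URL parser, then
// HTML-escape the result for use in href="...".
function toHref(input) {
  const url = new URL(input); // throws a TypeError if input is not an absolute URL
  // Assumption: only http(s) links are wanted (rejects javascript:, data:, ...)
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('unsupported scheme: ' + url.protocol);
  }
  return escapeAttr(url.href);
}

const inputs = [
  'http://example.com/?file=some_19%affordable.txt',
  'http://example.com/url?source=web&last="f o o"&bar=<',
  'https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22'
];

for (const input of inputs) {
  console.log('<a href="' + toHref(input) + '">My Link</a>');
}
```

URL 1 comes out unchanged, because the parser keeps "%af" as it is. Whether that "%" was meant literally remains something the code cannot decide.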

Source: https://habr.com/ru/post/1656390/

