What other characters besides the ampersand (&) should be encoded in href / src HTML attributes?

Question

Is ampersand the only character to be encoded in the HTML attribute?

It is well known that this will not pass validation:

<a href="http://domain.com/search?q=whatever&lang=en"></a>

Because the ampersand must be written as &amp;. Here's a direct link to the validation failure.

This guy lists a bunch of characters that need to be encoded, but he is wrong: if you encode the first "/" in http://, the href will not work.

In ASP.NET, is there a helper method already built for this? Things like Server.UrlEncode and HtmlEncode obviously do not work - they are designed for different purposes.

I can create my own simple extension method (e.g. .ToAttributeView()) that does a simple string replace.

+6

html href url-encoding

sohtimsso1970 Sep 17 '11 at 16:48

5 answers

The purpose of escaping characters is to keep them from being treated as arguments. So you don't actually want to encode the entire URL, just the values you pass in the query string. For instance:

 http://example.com/?parameter1=<ENCODED VALUE>&parameter2=<ENCODED VALUE>

The URL you provided is a valid URL that will pass validation. However, the browser interprets the & characters as the separator between parameters in the query string. So your request:

 ?q=whatever&lang=en

will actually be interpreted by the receiver as two parameters:

 q = "whatever"
 lang = "en"

For your URL to work, you just need to make sure your values are encoded:

 ?q=<ENCODED VALUE>&lang=<ENCODED VALUE>
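As a rough sketch of that idea in JavaScript: the buildQuery helper and the sample values below are made up for illustration, and only encodeURIComponent is a real built-in. Each name and value is percent-encoded on its own, and the & separators are added afterwards, so an ampersand inside a value can no longer be mistaken for a separator.

```javascript
// Hypothetical helper (not a built-in): builds a query string from a plain
// object, percent-encoding every name and value individually.
function buildQuery(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

// The "&" inside the value becomes %26; the "&" between parameters stays.
const query = buildQuery({ q: "fish & chips", lang: "en" });
console.log("http://example.com/?" + query);
// http://example.com/?q=fish%20%26%20chips&lang=en
```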

Edit: The common problems page you linked from the W3C talks about edge cases where URLs are rendered in HTML and the & is followed by text that could be interpreted as an entity reference (&copy, for example). Here is a test in jsfiddle showing the URL:

http://jsfiddle.net/YjPHA/1/

In Chrome and Firefox the links work correctly, but IE renders &copy as ©, breaking the link. I have to admit that I have never had a problem with this in the wild (it would only affect entity references that don't require a semicolon, which is a pretty small subset).

To be safe from this bug, you can HTML-encode any URLs that you render to the page and everything should be fine. If you are using ASP.NET, the HttpUtility.HtmlEncode method should work fine.
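To illustrate what HTML-encoding a URL for an attribute amounts to, here is a minimal JavaScript sketch. escapeHtmlAttribute is a hypothetical helper, not a built-in and not the ASP.NET method, and the example URL is invented. The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.

```javascript
// Hypothetical helper: escapes the characters that are significant inside
// an HTML attribute value. "&" must be handled first.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// "&copy" in the raw URL is the risky case described above.
const url = "http://example.com/?a=1&copy=2";
const anchor = '<a href="' + escapeHtmlAttribute(url) + '">link</a>';
console.log(anchor);
// <a href="http://example.com/?a=1&amp;copy=2">link</a>
```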

+1

Chris van opstal Sep 17 '11 at 16:59

You do not need HTML escaping here:

 <a href="http://domain.com/search?q=whatever&lang=en"></a>

According to HTML5 specification: http://www.w3.org/TR/html5/tokenization.html#character-reference-in-attribute-value-state

&lang= should be parsed as an unrecognized character reference, and the attribute value should be used as is: http://domain.com/search?q=whatever&lang=en

For reference, I have raised the question with the HTML5 WG: http://lists.w3.org/Archives/Public/public-html/2011Sep/0163.html

+1

c-smile Sep 17 '11 at 17:42

In HTML attribute values, if you want ", & and a non-breaking space as a result, you should (as an author who is making the intent clear) have &quot;, &amp; and &nbsp; in the markup.

For ", however, you do not need to use &quot; if you use single quotes to enclose your attribute values.

For HTML text nodes, in addition to the above, if you want < and > as a result, you should use &lt; and &gt;. (I would even use them in attribute values too.)

For hfnames and hfvalues (and directory names in the path) of the URI, I would use JavaScript's encodeURIComponent() (on a UTF-8 page when encoding for use on a UTF-8 page).
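For the directory-name part, a small sketch of the same approach: buildPath is a hypothetical helper and the segment names are invented. Each segment is encoded separately with encodeURIComponent, so a "/" that belongs to a name is not confused with a path separator, and non-ASCII characters are percent-encoded as UTF-8.

```javascript
// Hypothetical helper: joins path segments, percent-encoding each one
// so that "/" and "&" inside a segment cannot change the URL structure.
function buildPath(segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

// "\u00e9" is "é"; encodeURIComponent emits its UTF-8 bytes as %C3%A9.
console.log(buildPath(["docs", "a/b & c", "r\u00e9sum\u00e9.html"]));
// /docs/a%2Fb%20%26%20c/r%C3%A9sum%C3%A9.html
```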

+1

Shadow2531 20 sept '11 at 10:40

If I understand the question correctly, I believe this is what you want.

0

Tyler crompton Sep 17 '11 at 17:23

mVChr · Accepted Answer · 2011-09-17T17:39:16+0000

Besides the standard URI encoding of the values, & is the only character related to HTML entities that you have to worry about, simply because it is the character that begins every HTML entity. Take for example the following URL:

 http://query.com/?q=foo&lt=bar&gt=baz

Even though there are no trailing semicolons, since &lt; is the entity for < and &gt; is the entity for >, some old browsers will translate this URL to:

 http://query.com/?q=foo<=bar>=baz

So you need to write & as &amp; to prevent this from happening for links within a parsed HTML document.
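The effect can be imitated with a deliberately simplified model. The legacyDecode function below is an assumption made for illustration: it is not any browser's real parsing algorithm, and it only knows three entities. It decodes them with or without the trailing semicolon, which is the lenient behaviour described above.

```javascript
// Simplified, hypothetical model of a lenient old parser: it decodes
// amp, lt and gt even when the trailing ";" is missing.
const LEGACY_ENTITIES = { amp: "&", lt: "<", gt: ">" };

function legacyDecode(attributeValue) {
  return attributeValue.replace(/&(amp|lt|gt);?/g, (match, name) => LEGACY_ENTITIES[name]);
}

// Raw ampersands: the parameter names "lt" and "gt" get swallowed.
console.log(legacyDecode("http://query.com/?q=foo&lt=bar&gt=baz"));
// http://query.com/?q=foo<=bar>=baz

// Ampersands written as &amp;: the URL survives decoding intact.
console.log(legacyDecode("http://query.com/?q=foo&amp;lt=bar&amp;gt=baz"));
// http://query.com/?q=foo&lt=bar&gt=baz
```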
