I'm trying to prevent XSS attacks, so I use the HTML Agility Pack to implement my whitelist and let the Microsoft Anti-Cross Site Scripting Library handle the rest.
Now I am looking at encoding all HTML hrefs. I receive a large string of HTML that may contain hrefs. The MS library has a UrlEncode method, but if you encode the whole URL, the link can no longer be used. So in their example they encode only the query string:
UrlEncode: untrusted input is used in a URL (for example, a value in the query string), as in a "Click Here!" link.
http://msdn.microsoft.com/en-us/library/aa973813.aspx
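To make the difference concrete, here is a minimal sketch in Python rather than .NET, purely as an illustration. `urllib.parse.quote` stands in for the library's UrlEncode, and the function name, `example.com` URL, and parameter name are hypothetical.

```python
from urllib.parse import quote


def link_with_encoded_value(base_url: str, name: str, untrusted_value: str) -> str:
    """Build an href where only the untrusted query value is percent-encoded.

    Hypothetical helper for illustration; not part of any library.
    """
    # safe="" makes quote() escape every reserved character, including "/".
    encoded = quote(untrusted_value, safe="")
    return f"{base_url}?{name}={encoded}"


base = "http://example.com/search"  # assumed trusted part of the URL
href = link_with_encoded_value(base, "q", "<script>alert(1)</script>")

# Encoding the whole URL also escapes "://" and "/", so it stops working as a link.
whole_url_encoded = quote(base, safe="")
```

In this sketch `href` keeps a usable scheme and host while the value is escaped, whereas `whole_url_encoded` starts with `http%3A%2F%2F` and is no longer a navigable URL.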
So my questions are: how can I parse an href and find the query string? Is it always a `?` followed by the query string, or can it contain spaces or be written differently?
Edit
These URLs will not be written by me, but by the users who share them. So I need a way to make sure I catch all query strings, not just those in a valid format. If an invalid format can still work, I have to capture those too. Hackers do not care whether the format is valid as long as it still does what they want.
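One possible starting point, sketched in Python under stated assumptions: per RFC 3986 the query is whatever follows the first `?` and precedes the first `#`, so plain string splitting finds it even when the rest of the href is malformed. The helper names are hypothetical, and this does not model how a given browser repairs or reinterprets bad input, so it is not a complete defense on its own.

```python
from typing import List, Tuple


def split_href(href: str) -> Tuple[str, str, str]:
    """Split an href into (base, query, fragment) without validating it."""
    # The fragment starts at the first "#"; a "?" after that is not a query.
    rest, _, fragment = href.partition("#")
    # The query starts at the first "?" in what remains.
    base, _, query = rest.partition("?")
    return base, query, fragment


def query_pairs(query: str) -> List[Tuple[str, str]]:
    """Break a raw query into (name, value) pairs, keeping odd input as-is."""
    pairs = []
    for piece in query.split("&"):
        if not piece:
            continue  # skip empty pieces such as the one in "a=1&&b=2"
        name, _, value = piece.partition("=")
        pairs.append((name, value))
    return pairs


base, query, fragment = split_href("http://example.com/a b?x=1&y=<script>#frag?z")
pairs = query_pairs(query)
```

Each value in `pairs` could then be encoded individually before the href is reassembled, while spaces or other invalid characters in `base` would still need separate handling.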