Sanitize attribute name

Based on the HTML syntax documentation and some trial and error in the validator, I believe the valid characters in HTML attribute names are:

  • alphanumeric characters
  • hyphens
  • underscores
  • periods

For example, check these:

<p data-éxample> <p data-1.5>

I want to write a function to sanitize attribute names:

 <?php
 function sanitize_attr_name( $name ) {
     return is_string( $name ) ? preg_replace( '/[^\w\-\.]/', '', $name ) : '';
 }

This works, with the exception of special alpha characters:

 sanitize_attr_name( 'data-éxample' ); // 'data-xample'

Now it might seem crazy for someone to use such characters, but it does work, although CSS doesn't seem to check whether it is escaped or not.

How do you do this in PHP? How can a sanitizer be written to allow special alpha characters? Is this possible with a regex? And why is ctype_graph('é') false?

1 answer

PCRE, the regex engine behind PHP's preg_* functions, supports Unicode character properties via \p{property}. One of these properties is L, which matches any letter. Therefore, you can simply replace \w with \p{L}0-9_ and add the u modifier, so that both the pattern and the subject string are treated as UTF-8:

 '/[^\p{L}0-9_.-]/u'

There is also no need to escape periods inside a character class, and the hyphen can be placed at the end of the class so that it does not need escaping either.
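As a minimal sketch of how the question's sanitizer might look with this character class: the function name sanitize_attr_name_unicode is hypothetical, and the example assumes the source file and the input are UTF-8 encoded. The u modifier is included because, without it, PCRE processes the subject byte by byte and can leave a multibyte letter half-stripped.

```php
<?php
// Hypothetical variant of the question's sanitize_attr_name();
// the function name is an assumption made for this sketch.
function sanitize_attr_name_unicode($name)
{
    if (!is_string($name)) {
        return '';
    }

    // \p{L} matches any Unicode letter. The u modifier makes PCRE treat
    // both the pattern and the subject as UTF-8.
    $clean = preg_replace('/[^\p{L}0-9_.-]/u', '', $name);

    // With the u modifier, preg_replace() returns null when the subject
    // is not valid UTF-8, so fall back to an empty string in that case.
    return $clean === null ? '' : $clean;
}

echo sanitize_attr_name_unicode('data-éxample'), "\n";       // data-éxample
echo sanitize_attr_name_unicode('data-1.5" onclick='), "\n"; // data-1.5onclick
```

Returning an empty string for invalid UTF-8 mirrors what the original function does for non-string input. A caller that needs to distinguish the two cases could check preg_last_error() instead.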


Source: https://habr.com/ru/post/1444736/

