Based on the HTML syntax documentation and some trial and error with the validator, I believe the valid characters in HTML attribute names are:
- alphanumeric characters
- hyphens
- underscores
- periods
For example, try these:
<p data-éxample> <p data-1.5>
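For comparison, the syntax rule in the HTML Living Standard is broader than the list above: an attribute name is one or more characters other than controls, U+0020 SPACE, `"`, `'`, `>`, `/`, `=`, and noncharacters. (Custom `data-*` attributes additionally have to be XML-compatible and contain no ASCII uppercase letters, which this does not check.) Below is a minimal sketch of a validator for just that syntax rule, assuming UTF-8 input; `is_html_syntax_attr_name` is a hypothetical helper name, not an existing PHP function.

```php
<?php
/**
 * Hypothetical helper: checks $name against the HTML syntax rule for
 * attribute names (one or more characters other than controls, space,
 * " ' > / = and noncharacters). Assumes UTF-8 input.
 */
function is_html_syntax_attr_name(string $name): bool
{
    // Noncharacters U+nFFFE and U+nFFFF exist in each of the 17 planes.
    $planeNonchars = '';
    for ($plane = 0; $plane <= 0x10; $plane++) {
        $base = $plane * 0x10000;
        $planeNonchars .= sprintf('\x{%X}\x{%X}', $base + 0xFFFE, $base + 0xFFFF);
    }

    // Forbidden: C0 controls and space (U+0000-U+0020), DEL and C1 controls
    // (U+007F-U+009F), the delimiters " ' > / =, and all noncharacters.
    $forbidden = '/[\x{00}-\x{20}\x{7F}-\x{9F}"\'>\/=\x{FDD0}-\x{FDEF}' . $planeNonchars . ']/u';

    // preg_match() returns 0 for "no forbidden character found",
    // and false (not 0) when $name is not valid UTF-8.
    return $name !== '' && preg_match($forbidden, $name) === 0;
}
```

Under this rule both `data-éxample` and `data-1.5` are syntactically valid names, while `a b` and `a=b` are not.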
I want to write a function to sanitize attribute names:
<?php
function sanitize_attr_name( $name ) {
    // Keep word characters (\w), hyphens and periods; strip everything else.
    return is_string($name) ? preg_replace( '/[^\w\-\.]/', '', $name ) : '';
}
This works, except for non-ASCII letters such as accented characters:
sanitize_attr_name( 'data-éxample' ); // 'data-xample'
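The result makes sense once you look at the bytes. PHP strings are byte strings, and `é` (U+00E9) is two bytes in UTF-8, so a pattern without the `u` modifier sees two separate non-ASCII bytes, neither of which is in the ASCII `\w` class. The same byte-level view explains `ctype_graph('é')`: the ctype functions test every byte against the current locale, and in the "C" locale bytes above 0x7F are not printable. A small sketch, assuming the file is saved as UTF-8:

```php
<?php
// The ctype_* functions (and PHP's PCRE character tables) depend on LC_CTYPE.
setlocale(LC_CTYPE, 'C');

$char = 'é';
echo strlen($char), "\n";  // 2: U+00E9 is encoded as two bytes
echo bin2hex($char), "\n"; // c3a9

// Without /u the pattern is applied byte by byte; 0xC3 and 0xA9 are not
// in \w, so both bytes are removed.
echo preg_replace('/[^\w\-\.]/', '', 'data-éxample'), "\n"; // data-xample

// ctype_graph() requires every byte to be printable; 0xC3 is not.
var_dump(ctype_graph($char)); // bool(false)
```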
Now it might seem crazy for someone to use such characters, but it does work, although CSS doesn't seem to care whether the character is escaped or not.
How do you do this in PHP? How can a sanitizer be written to allow non-ASCII letters? Is this possible with a regex? And why is ctype_graph('é') false?
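One possible regex approach, as a sketch: add the `u` modifier so PCRE treats the pattern and subject as UTF-8 code points, and spell out the allowed characters with Unicode property classes instead of relying on `\w`. `sanitize_attr_name_unicode` is a hypothetical name. It assumes UTF-8 input, keeps letters (`\p{L}`), combining marks (`\p{M}`, so a decomposed `e` + U+0301 survives), numbers (`\p{N}`), underscores, hyphens and periods, and returns `''` for non-strings or invalid UTF-8.

```php
<?php
/**
 * Hypothetical Unicode-aware variant of sanitize_attr_name().
 * Assumes $name is UTF-8 encoded.
 */
function sanitize_attr_name_unicode($name): string
{
    if (!is_string($name)) {
        return '';
    }
    // /u: match code points, not bytes.
    // \p{L} letters, \p{M} combining marks, \p{N} numbers, plus _ . -
    $clean = preg_replace('/[^\p{L}\p{M}\p{N}_.\-]/u', '', $name);

    // With /u, preg_replace() returns null when $name is not valid UTF-8.
    return $clean ?? '';
}
```

With this version, `sanitize_attr_name_unicode('data-éxample')` returns `'data-éxample'` unchanged, whereas the byte-based version returns `'data-xample'`.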