I found this great function in Chyrp code:
/** * Function: sanitize * Returns a sanitized string, typically for URLs. * * Parameters: * $string - The string to sanitize. * $force_lowercase - Force the string to lowercase? * $anal - If set to *true*, will remove all non-alphanumeric characters. */ function sanitize($string, $force_lowercase = true, $anal = false) { $strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]", "}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—", "รขโฌ"", "รขโฌ"", ",", "<", ".", ">", "/", "?"); $clean = trim(str_replace($strip, "", strip_tags($string))); $clean = preg_replace('/\s+/', "-", $clean); $clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean ; return ($force_lowercase) ? (function_exists('mb_strtolower')) ? mb_strtolower($clean, 'UTF-8') : strtolower($clean) : $clean; }
and this one in wordpress code
function sanitize_file_name( $filename ) { $filename_raw = $filename; $special_chars = array("?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}"); $special_chars = apply_filters('sanitize_file_name_chars', $special_chars, $filename_raw); $filename = str_replace($special_chars, '', $filename); $filename = preg_replace('/[\s-]+/', '-', $filename); $filename = trim($filename, '.-_'); return apply_filters('sanitize_file_name', $filename, $filename_raw); }
September 2012 Update
Alix Axel has done some incredible work in this area. Its frame structure includes some great text filters and transformations.