Regex to match any UTF character excluding punctuation

I am writing a PHP function that automatically converts a string into a file name for use in a URL (*.html). ASCII would be the safe choice, but for SEO reasons I need to allow file names in any language. I don't want them to include punctuation other than dashes (-) and underscores (_); characters such as * % $ # @ " ' are not allowed.

Spaces must be converted to dashes.

I think a regex would be the easiest way, but I'm not sure how to handle UTF-8 strings.

My ASCII-only function is as follows:

function convertToPath($string)
{
    $string = strtolower(trim($string));
    $string = preg_replace('/[^a-z0-9-]/', '-', $string);
    $string = preg_replace('/-+/', "-", $string);
    return $string;
}

Thanks,

Roy.

+3
2 answers

, SEO ASCII URL-.

URL- . ASCII.

, - , ASCII. , URL- -ASCII- URL- , () . ( script, stackoverflow script, , , )

: () " URL-"

To convert the text to ASCII:

<?php
  $text = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
?>
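
The `iconv()` call above could be folded into a slug function roughly as follows. This is only a sketch: the name `convertToAsciiPath` is hypothetical, and the result of `//TRANSLIT` depends on the platform's iconv implementation and locale, so accented letters may come out as plain letters, as `?`, or not at all.

```php
<?php
// Hypothetical helper, not part of the original answer.
function convertToAsciiPath(string $text): string
{
    // Transliterate to ASCII. iconv() returns false on failure; the "@"
    // suppresses the notice PHP raises in that case.
    $ascii = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        // Assumption: fall back to the raw input. Any non-ASCII bytes
        // are then replaced by dashes in the step below.
        $ascii = $text;
    }

    $ascii = strtolower(trim($ascii));

    // Replace every run of characters other than a-z, 0-9, "_" and "-"
    // with a single dash.
    $ascii = preg_replace('/[^a-z0-9_-]+/', '-', $ascii);

    // Collapse repeated dashes and strip them from both ends.
    $ascii = preg_replace('/-+/', '-', $ascii);
    return trim($ascii, '-');
}
```

For plain ASCII input such as `'Hello World!'` this yields `hello-world` on any platform. Output for non-ASCII input should be checked on the target server.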


+4

For UTF-8 input, replace every run of non-letter characters using a Unicode property class:

/\P{L}+/u

Applied to your function:

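function convertToPath($string)
{
    // mb_strtolower() lowercases multibyte (UTF-8) text correctly
    $string = mb_strtolower(trim($string), 'UTF-8');
    // The "u" modifier makes PCRE treat the pattern and subject as UTF-8;
    // without it, \P{L} is applied to individual bytes
    $string = preg_replace('/\P{L}+/u', '-', $string);
    // Collapse consecutive dashes into one
    $string = preg_replace('/-+/', "-", $string);
    return $string;
}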

Note that strtolower() is not UTF-8 aware, so mb_strtolower() is used instead.
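
The question also asks to keep dashes and underscores, and its ASCII version keeps digits, whereas `\P{L}` replaces all of those. Here is a minimal sketch of one way to cover that, assuming the mbstring extension is loaded and the input is valid UTF-8. The name `convertToUnicodePath` is hypothetical, and including combining marks (`\p{M}`) is an added assumption so that scripts relying on them stay intact.

```php
<?php
// Hypothetical helper, not part of the original answer.
function convertToUnicodePath(string $string): string
{
    // Lowercase with multibyte awareness (requires ext-mbstring).
    $string = mb_strtolower(trim($string), 'UTF-8');

    // Replace every run of characters that are not letters (\p{L}),
    // combining marks (\p{M}), numbers (\p{N}), underscores or dashes
    // with a single dash. The "u" modifier enables UTF-8 mode.
    $result = preg_replace('/[^\p{L}\p{M}\p{N}_-]+/u', '-', $string);

    // In UTF-8 mode preg_replace() returns null for invalid UTF-8 input.
    if ($result === null) {
        return '';
    }

    // Collapse repeated dashes and strip them from both ends.
    $result = preg_replace('/-+/', '-', $result);
    return trim($result, '-');
}
```

With this sketch, `convertToUnicodePath('Привет, мир!')` gives `привет-мир`, and `'Hello  World_2 * %'` gives `hello-world_2`.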

+4

Source: https://habr.com/ru/post/1706294/

