Match character set and optional entity

I want to insert a word-break character after every 5 characters of a string, using this pattern:

([^\s-]{5})([^\s-]{5})

Unfortunately, it also breaks up character entities (&#xxx;). Can someone provide an example that does not split the entity code? The string I want to split is XML, so the entity is actually escaped one level further (&amp;#xxx;).

Edit: sample code

preg_replace('/([^\s-]{5})([^\s-]{5})/', '$1&shy;$2', $subject)

Given the word "F&amp;#xe5;revejle" (rendered: "Fårevejle")
Expect "F&amp;#xe5;&shy;revejle" as result
But it outputs "F&amp&shy;;#xe5;revejle" instead, with the soft hyphen inside the entity
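
The failure can be reproduced with a minimal sketch. It assumes the subject is the XML-escaped string F&amp;#xe5;revejle (the form used in the answer's test data); the variable name $broken is only for illustration. Because the pattern counts any ten non-space, non-dash characters, the first group ends in the middle of the entity.

```php
<?php
// Assumed input: the XML-escaped form of "Fårevejle".
$subject = 'F&amp;#xe5;revejle';

// The pattern from the question: two runs of five non-space, non-dash
// characters. It knows nothing about entities, so "&", "a", "m", "p"
// each count as one character.
$broken = preg_replace('/([^\s-]{5})([^\s-]{5})/', '$1&shy;$2', $subject);

// Group 1 is "F&amp" and group 2 is ";#xe5", so the soft hyphen
// is inserted inside the entity.
echo $broken, PHP_EOL; // F&amp&shy;;#xe5;revejle
```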
1 answer

Assuming you want to break each word after five characters, unless it is already separated by a dash, and counting an entity as one character, try the following:

$result = preg_replace(
    '/            # Start the match 
    (?:           # at one of the following positions:
     (?<=         # Either right after...
      [\s-]       # a space or dash
     )            # end of lookbehind
     |            # or...
     \G           # wherever the last match ended.
    )             # End of start condition.
    (             # Now match and capture the following:
     (?>          # Match the following in an atomic group:
      &amp;\#\w+; # an entity
      |           # or
      [^\s-]      # a non-space, non-dash character
     ){5}         # exactly 5 times.
    )             # End of capture
    (?=[^\s-])    # Assert that we\'re not at the end of a "word"/x', 
    '\1&shy;', $subject);

It changes

supercalifragilisticexpidon'tremember! 
alrea-dy se-parated 
count entity as one character&amp;#345;blahblah
F&amp;#xe5;revejle

into

super&shy;calif&shy;ragil&shy;istic&shy;expid&shy;on'tr&shy;ememb&shy;er! 
alrea-dy se-parat&shy;ed 
count entit&shy;y as one chara&shy;cter&amp;#345;&shy;blahb&shy;lah
F&amp;#xe5;rev&shy;ejle
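
To check the samples above, the pattern can be wrapped in a small function. This is only a sketch: the function name insertSoftHyphens is hypothetical, and the pattern is the same one as above written on a single line without the x modifier, so it only recognises entities in the escaped form &amp;#...;.

```php
<?php
// Hypothetical helper; the pattern is the answer's regex in compact form.
function insertSoftHyphens(string $subject): string
{
    // Start after a space/dash or where the previous match ended (\G),
    // capture five "characters" (an escaped entity counts as one),
    // and require that the word continues afterwards.
    $pattern = '/(?:(?<=[\s-])|\G)((?>&amp;\#\w+;|[^\s-]){5})(?=[^\s-])/';

    return preg_replace($pattern, '\1&shy;', $subject);
}

echo insertSoftHyphens('F&amp;#xe5;revejle'), PHP_EOL;  // F&amp;#xe5;rev&shy;ejle
echo insertSoftHyphens('alrea-dy se-parated'), PHP_EOL; // alrea-dy se-parat&shy;ed
```

The atomic group keeps the engine from backtracking into an entity and re-reading its characters one by one, which is what prevents the split seen in the question.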

Source: https://habr.com/ru/post/1796042/

