preg_match to capture a string after a special character

I have text files with lines, and for each line I need to split and capture every part of it.

A line looks like this:

Joao.Martins.G2R71.Pedro.Feliz.sno 

That is: NAME of the 1st player (first name only, or first + last name); G = game (may be 2 or 02 or another number less than 99); R = result (7x1 for the home team in this example); then NAME of the 2nd player ... and the last 3 characters are the type of game (snooker in this example).

But the line could also be:

 Joao Martins |2x71| Pedro Feliz.poo 

I am not a regex expert (unfortunately). I have already looked through a lot of questions here without finding a solution, and reading the answers to other questions has not helped me either (mainly because I never really understand them).

I already have this:

 preg_match("/\|([^|]+)\|/", $string, $result); echo $result[1] . "<br />"; 

But it only gives me everything between the | | characters, without separating the parts, and it ignores everything else.

Can you guys help me with a solution for both cases? I am, as usual, completely lost here!

Thanks in advance!

+4
3 answers

explode method:

You do not need a complex regexp; a simple explode will do.

 $parts = explode( '.', $string); 

$parts now holds either 6 elements or 2, so you can do:

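 if (count($parts) == 6) {
     // Dotted format: Joao.Martins.G2R71.Pedro.Feliz.sno
     list($firstName1, $surName1, $string, $firstName2, $surName2, $gameType) = $parts;
 } elseif (count($parts) == 2) {
     // Spaced format: Joao Martins |2x71| Pedro Feliz.poo
     $gameType = trim($parts[1]);
     // explode() needs a delimiter; here the fields are separated by spaces
     list($firstName1, $surName1, $string, $firstName2, $surName2) = explode(' ', trim($parts[0]));
 } else {
     echo "Cannot parse";
 }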

And now parse the game/result part, which ended up in $string :)

 if (preg_match('~^\|(\d+)x(\d+)\|$~', $string, $parts)) {
     // |2x71| form
     $first = $parts[1];
     $second = $parts[2];
 } elseif (preg_match('~^G(\d+)R(\d+)$~', $string, $parts)) {
     // G2R71 form
     $first = $parts[1];
     $second = $parts[2];
 } else {
     echo "Cannot parse!";
 }
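The two explode steps above can be folded into a single function. This is only a sketch: the function name parseLineWithExplode and the array keys are made up for illustration, and it assumes each player has exactly a first and a last name.

```php
<?php
// Hypothetical helper: the name and the returned keys are illustrative only.
function parseLineWithExplode(string $line): ?array
{
    $parts = explode('.', trim($line));

    if (count($parts) === 6) {
        // Dotted format: Joao.Martins.G2R71.Pedro.Feliz.sno
        list($first1, $last1, $score, $first2, $last2, $type) = $parts;
    } elseif (count($parts) === 2) {
        // Spaced format: Joao Martins |2x71| Pedro Feliz.poo
        $type = $parts[1];
        $fields = explode(' ', $parts[0]);
        if (count($fields) !== 5) {
            return null; // assumption: exactly first + last name per player
        }
        list($first1, $last1, $score, $first2, $last2) = $fields;
    } else {
        return null;
    }

    // The score is either G<game>R<result> or |<game>x<result>|.
    if (!preg_match('~^G(\d+)R(\d+)$~', $score, $m)
        && !preg_match('~^\|(\d+)x(\d+)\|$~', $score, $m)) {
        return null;
    }

    return [
        'player1' => "$first1 $last1",
        'game'    => $m[1],
        'result'  => $m[2],
        'player2' => "$first2 $last2",
        'type'    => $type,
    ];
}
```

Returning null instead of echoing "Cannot parse" lets the caller decide what to do with a bad line.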

preg_match method:

The second regular expression is deliberately different, so you can see how to write one that will "eat" the whole name, no matter whether it has 2, 3 or 5 parts, and so that you get used to *? (the greedy killer).

 $match = array();
 if (preg_match('~^(\w+)\.(\w+)\.G(\d+)R(\d+)\.(\w+)\.(\w+)\.(\w+)$~', $text, $match)) {
     // First way
 } elseif (preg_match('~^([^\|]+)\|(\d+)x(\d+)\|(.*?)\.(\w+)$~', $text, $match)) {
     // Second way
 } else {
     // Failed to parse
 }

Edit (more than 2 names)

If a player can have more than two names (e.g. Armin Van Buuren), use a regexp like this:

~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~

This will match names such as Albert.Einstein and Armin.Van.Buuren. In PCRE, \w already covers digits and the underscore ([a-zA-Z0-9_]), so a name like Gerold.The.3rd matches too, and writing the class as [\w\d.] instead of [\w.] adds nothing.

The \.G(\d+)R(\d+)\. part is pretty strict: a player would need a really crazy name containing something like G3R01 (say, " 3l1t33 guy Herald ") for the line to be parsed wrongly.

Oh and one more thing, don't forget $name = strtr( $name, '.', ' ') :)
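As a rough sketch of how the multi-name pattern and the strtr call fit together (the function name parseDottedLine and the array keys are invented for this example, and it only handles the dotted format):

```php
<?php
// Hypothetical helper: the name and the returned keys are illustrative only.
function parseDottedLine(string $line): ?array
{
    // Names may consist of any number of dot-separated parts.
    $pattern = '~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }

    return [
        'player1' => strtr($m[1], '.', ' '), // turn the dots back into spaces
        'game'    => $m[2],
        'result'  => $m[3],
        'player2' => strtr($m[4], '.', ' '),
        'type'    => $m[5],
    ];
}
```

Calling it on 'Armin.Van.Buuren.G02R71.Pedro.Feliz.sno' should give "Armin Van Buuren" as the first player and "Pedro Feliz" as the second.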

RegExp explained

  • ~ ~ - the regexp delimiters; they begin and end the regexp: ~regexp~ . Almost any character can be used: /regexp/ , (regexp)
  • ^ and $ are metacharacters ; ^ matches the start of the string / line, $ the end of the string / line
  • \w is the escape sequence for any "word" character, i.e. [a-zA-Z0-9_]
  • ([\w.]+) - a subpattern / capturing group that matches characters from [a-zA-Z0-9_.] at least once. + is called a quantifier
  • +? - a ? (after another quantifier) is called the greedy killer, and it means "as few as possible": normally (\w+)a will capture abab (on the string ababa ), (\w+?)a will capture ab and (\w*?)a will capture the empty string :)
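The greedy versus lazy behaviour from the last bullet can be checked directly. A minimal sketch (the variable names are arbitrary):

```php
<?php
$subject = 'ababa';

preg_match('~(\w+)a~', $subject, $greedy);    // greedy +: takes as much as possible
preg_match('~(\w+?)a~', $subject, $lazy);     // lazy +?: takes as little as possible, but at least one
preg_match('~(\w*?)a~', $subject, $lazyStar); // lazy *?: may take nothing at all

var_dump($greedy[1]);   // string(4) "abab"
var_dump($lazy[1]);     // string(2) "ab"
var_dump($lazyStar[1]); // string(0) ""
```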
+4

I think this will do it for you.

  /^(\w+)(?:\.| )(\w+)(?:\.| \|)G?(\d+)[x|R](\d+)(?:\.|\| )(\w+)(?:\.| )(\w+)(?:\.| )(\w+)$/ 
  • $1 will hold p1's first name
  • $2 will be p1's last name
  • $3 will be the game number
  • $4 will be the result
  • $5 will hold p2's first name
  • $6 will be p2's last name
  • $7 is the type of game

If the $n values do not make sense, just think of them as elements of the $results array. The pattern could probably be simplified, but I do not have enough time to work that out.
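A short sketch of how this single pattern could be applied to both sample lines from the question. The $lines and $parsed variables are made up for the example, and trim() is added on the assumption that lines may carry surrounding spaces:

```php
<?php
$pattern = '/^(\w+)(?:\.| )(\w+)(?:\.| \|)G?(\d+)[x|R](\d+)(?:\.|\| )(\w+)(?:\.| )(\w+)(?:\.| )(\w+)$/';

$lines = [
    'Joao.Martins.G2R71.Pedro.Feliz.sno',
    ' Joao Martins |2x71| Pedro Feliz.poo ',
];

$parsed = [];
foreach ($lines as $line) {
    if (preg_match($pattern, trim($line), $results)) {
        // $results[0] is the whole match; $results[1] to $results[7] are the groups.
        $parsed[] = array_slice($results, 1);
    }
}

print_r($parsed);
```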

+4

You can do it like this:

 // Get the string without the game type (drops the last 4 characters, e.g. ".sno")
 $yourstring = substr($yourstring, 0, strlen($yourstring) - 4);
 // Split the string using "." as the delimiter
 $results = explode(".", $yourstring);
 // Check whether "." was the delimiter
 if (!strcmp($results[0], $yourstring)) {
     // "." was not the delimiter, so split the string with " " as the delimiter
     $results = explode(" ", $yourstring);
 }
 // Store the parts in separate variables, removing "|" if it exists
 if (count($results) == 5) {
     $results[2] = trim($results[2], "|");
     list($var1, $var2, $var3, $var4, $var5) = $results;
 } elseif (count($results) == 4) {
     $results[1] = trim($results[1], "|");
     $results[2] = trim($results[2], "|");
     list($var1, $var2, $var3, $var4) = $results;
 } else {
     $results[1] = trim($results[1], "|");
     list($var1, $var2, $var3) = $results;
 }

All parts of your string will be split and stored in $results . To put them into separate variables, you can use the list function.

+3

Source: https://habr.com/ru/post/1395670/

