PHP preg_split: split a string by other strings

I want to split a large string by a list of words.

For example:

```php
$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';
```

The result should then be:

```php
$text[0] = 'This is';
$text[1] = 'string which needs';
$text[2] = 'be';
$text[3] = 'above';
$text[4] = '.';
```

How can I do this? Is preg_split the best choice, or is there a more efficient method? I would like it to be as fast as possible, since I will be splitting files that are hundreds of MB in size.

4 answers

I don't think a PCRE regex is necessary if you really only need to split on those words.

You could do something like the following, and benchmark it to see whether it is faster or better:

```php
$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';

$split = explode(' ', $text);   // tokenize on single spaces
$result = array();
$temp = array();                // words collected since the last split word

foreach ($split as $s) {
    if (in_array($s, $splitby)) {
        // Split word found: flush the collected words as one piece.
        if (count($temp) > 0) {
            $result[] = implode(' ', $temp);
            $temp = array();
        }
    } else {
        $temp[] = $s;
    }
}

// Flush the words that follow the last split word.
if (count($temp) > 0) {
    $result[] = implode(' ', $temp);
}

var_dump($result);

/* output
array(4) {
  [0]=>
  string(7) "This is"
  [1]=>
  string(18) "string which needs"
  [2]=>
  string(2) "be"
  [3]=>
  string(12) "above words."
}
*/
```

The only difference from your expected output is the last element: "words." != "words", so the final word is not treated as a split word.
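Since the question stresses speed, one possible tweak to this answer is to turn $splitby into a hash set with array_flip() and test membership with isset(), which avoids a linear in_array() scan for every word. This is a sketch, not a benchmarked result: splitByWords() is a hypothetical helper name, and, as in the answer above, it assumes words are separated by single spaces and leaves punctuation attached to words.

```php
<?php
// Sketch of the explode()-based approach using a hash lookup
// (array_flip + isset) instead of in_array(), so each word check
// does not scan the whole $splitby list.
// Assumption: words are separated by single spaces; punctuation stays attached.
function splitByWords(string $text, array $splitby): array
{
    $lookup = array_flip($splitby); // word => index, used only as a set
    $result = [];
    $current = [];

    foreach (explode(' ', $text) as $word) {
        if (isset($lookup[$word])) {
            // Delimiter word: flush the words collected so far, if any.
            if ($current) {
                $result[] = implode(' ', $current);
                $current = [];
            }
        } else {
            $current[] = $word;
        }
    }

    // Flush whatever follows the last delimiter word.
    if ($current) {
        $result[] = implode(' ', $current);
    }

    return $result;
}

$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];
$text = 'This is the string which needs to be split by the above words.';
print_r(splitByWords($text, $splitby));
```

On the sample text this should return the same four pieces as the answer's var_dump, including "above words." as the last element.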


This should be reasonably efficient. However, you could test it on a few of your files and report back on its performance.

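```php
$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';

// Builds /\s?these\s?|\s?are\s?|...|\s?by\s?/
// Glue comes first: implode($array, $glue) was deprecated in PHP 7.4 and removed in PHP 8.0.
$pattern = '/\s?' . implode('\s?|\s?', $splitby) . '\s?/';
$result = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
```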
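A possible refinement of this regex, sketched here rather than taken from the answer: escape each word with preg_quote() and wrap the alternation in \b word boundaries, so that e.g. "to" no longer matches inside "tomato". The arrow function assumes PHP 7.4 or later. On the sample text it should produce the five pieces the question asks for.

```php
<?php
// preg_split() with escaped split words and \b word boundaries.
// \s* on both sides consumes surrounding whitespace, so pieces come back trimmed.
$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];
$text = 'This is the string which needs to be split by the above words.';

// Escape regex metacharacters in each word (pattern delimiter is '/').
$quoted = array_map(fn(string $w): string => preg_quote($w, '/'), $splitby);

// One alternation built once: /\s*\b(?:these|are|...|by)\b\s*/
$pattern = '/\s*\b(?:' . implode('|', $quoted) . ')\b\s*/';

// PREG_SPLIT_NO_EMPTY drops the empty pieces between adjacent split words.
$result = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
```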

preg_split can be used like this:

```php
$pieces = preg_split('/' . implode('\s*|\s*', $splitby) . '/', $text, -1, PREG_SPLIT_NO_EMPTY);
```

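For files of hundreds of MB, loading the whole file into one string may itself be costly. A minimal sketch, assuming a split word never spans a line break and that a line break may also end a piece, reads the input line by line with fgets() and applies a preg_split() pattern of this kind to each line. splitStream() is a hypothetical helper name, and php://memory only stands in for a real file opened with fopen().

```php
<?php
// Stream the input line by line instead of loading it all at once.
// Assumption: split words never span a line break; each line is split on its own.
function splitStream($handle, string $pattern): Generator
{
    while (($line = fgets($handle)) !== false) {
        $pieces = preg_split($pattern, rtrim($line, "\r\n"), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            yield $piece;
        }
    }
}

$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];
// The sample words contain no regex metacharacters, so no preg_quote() here.
$pattern = '/\s*\b(?:' . implode('|', $splitby) . ')\b\s*/';

// Demo with an in-memory stream standing in for a large file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "This is the string\nwhich needs to be split by the above words.\n");
rewind($handle);

foreach (splitStream($handle, $pattern) as $piece) {
    echo $piece, PHP_EOL;
}
fclose($handle);
```

Because splitStream() is a generator, only one line and its pieces are held in memory at a time.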


Since the words in your $splitby array are not regular expressions, maybe you can use

str_split
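Note that str_split() splits a string into fixed-length chunks (str_split('abcdef', 2) gives ['ab', 'cd', 'ef']), so it cannot split on the words themselves. A non-regex alternative in the spirit of this answer is to replace every split word with a placeholder and then explode() on it. This sketch assumes the placeholder "\0" never occurs in the text; unlike the \b regex it has no notion of word boundaries, so "the" would also match inside e.g. "other".

```php
<?php
// Non-regex sketch: str_replace() each split word with a placeholder,
// then explode() on the placeholder and trim the pieces.
// Assumption: "\0" never appears in the input text.
$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];
$text = 'This is the string which needs to be split by the above words.';

// str_replace() applies the search words in order, so replace longer
// words first ('these' before 'the') to avoid partial replacements.
usort($splitby, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

$marked = str_replace($splitby, "\0", $text);
$pieces = array_map('trim', explode("\0", $marked));

// Drop the empty pieces left between adjacent split words.
$result = array_values(array_filter($pieces, fn(string $p): bool => $p !== ''));
print_r($result);
```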


Source: https://habr.com/ru/post/1380506/

