Removing unnecessary spaces in PHP using parser tokens

I am trying to make a simple script that will remove all the extra spaces from a PHP file / line.

I managed to parse the string using tokens, but I don’t see a good method for removing extra spaces.

For instance,

function test() { return TRUE; } 

it should be

 function test(){return TRUE;} 

and NOT

 functiontest(){returnTRUE;} 

You end up with the last version if you simply remove every T_WHITESPACE token.
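
For reference, a minimal sketch of that naive approach, assuming the source is passed in as a string. The helper name stripAllWhitespace is made up for illustration:

```php
<?php
// Hypothetical helper: drops every T_WHITESPACE token, with no exceptions.
function stripAllWhitespace(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token;        // single-character token such as "{" or ";"
        } elseif ($token[0] !== T_WHITESPACE) {
            $out .= $token[1];     // text of every non-whitespace token
        }
    }
    return $out;
}

echo stripAllWhitespace('<?php function test() { return TRUE; }');
// prints: <?php functiontest(){returnTRUE;}
```

The space after the opening tag survives only because the tokenizer includes it in the T_OPEN_TAG text.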

Is there something I am missing that would let me remove spaces but keep them after things like "function" and "return"? Thanks!

3 answers
    $tokens = token_get_all($source);
    $newSource = '';

    foreach ($tokens as $i => $token) {
        // Single-character tokens such as ";" or "{" are plain strings.
        if (!is_array($token)) {
            $newSource .= $token;
            continue;
        }

        if ($token[0] == T_WHITESPACE) {
            // Keep one space only when both neighbours are labels.
            if (
                isset($tokens[$i - 1]) && isset($tokens[$i + 1])
                && is_array($tokens[$i - 1]) && is_array($tokens[$i + 1])
                && isLabel($tokens[$i - 1][1]) && isLabel($tokens[$i + 1][1])
            ) {
                $newSource .= ' ';
            }
        } else {
            $newSource .= $token[1];
        }
    }

    function isLabel($str) {
        return preg_match('~^[a-zA-Z0-9_\x7f-\xff]+$~', $str);
    }

Removing whitespace is always allowed unless there is a label on both sides of it. I check for that and emit either nothing or a single space.

There is only one other case I know of where whitespace is required: T_END_HEREDOC must be followed by either ; or \n. Collapsing or removing the whitespace is not allowed there. So if that matters to you, you can just add a check for it ;)
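
A rough sketch of how that heredoc check might be added, under the assumption that forcing a newline after the closing identifier (and its optional semicolon) is enough. The names minifyKeepingHeredoc and isLabelText are hypothetical and exist only for this example:

```php
<?php
// Hypothetical helper: true if the token text consists only of label characters.
function isLabelText(string $str): bool
{
    return preg_match('~^[a-zA-Z0-9_\x7f-\xff]+$~', $str) === 1;
}

// Hypothetical helper: strips whitespace but keeps a newline after T_END_HEREDOC.
function minifyKeepingHeredoc(string $source): string
{
    $tokens = token_get_all($source);
    $out = '';
    $needNewline = false; // set right after a T_END_HEREDOC token

    foreach ($tokens as $i => $token) {
        if ($needNewline) {
            $needNewline = false;
            if ($token === ';') {
                $out .= ";\n";   // optional semicolon, then the required newline
                continue;
            }
            $out .= "\n";        // no semicolon: the newline comes first
            if (is_array($token) && $token[0] === T_WHITESPACE) {
                continue;        // the original whitespace is already replaced
            }
        }
        if (!is_array($token)) {
            $out .= $token;
            continue;
        }
        if ($token[0] === T_END_HEREDOC) {
            $out .= $token[1];
            $needNewline = true;
            continue;
        }
        if ($token[0] !== T_WHITESPACE) {
            $out .= $token[1];
            continue;
        }
        // Whitespace: keep a single space only between two labels.
        $prev = $tokens[$i - 1] ?? null;
        $next = $tokens[$i + 1] ?? null;
        if (is_array($prev) && is_array($next)
            && isLabelText($prev[1]) && isLabelText($next[1])) {
            $out .= ' ';
        }
    }
    return $needNewline ? $out . "\n" : $out;
}

echo minifyKeepingHeredoc("<?php\n\$a = <<<EOT\nhello\nEOT;\necho   \$a;\n");
```

The flag-based approach mirrors what the T_END_HEREDOC branch of zend_strip() does: write the identifier, pass through a following semicolon if there is one, then always write a newline.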


Well, T_WHITESPACE can be a space, a newline, and so on. So one trivial approach would be to replace every T_WHITESPACE token with a new one consisting of exactly one space.
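
That trivial approach could be sketched roughly like this. The function name collapseWhitespace is made up for illustration, and the sketch deliberately ignores the heredoc case, where the newline after the closing identifier may need to be kept:

```php
<?php
// Hypothetical helper: replaces every whitespace token with exactly one space.
function collapseWhitespace(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token;    // single-character token such as "{" or ";"
            continue;
        }
        // Whitespace tokens become one space; all other tokens keep their text.
        $out .= ($token[0] === T_WHITESPACE) ? ' ' : $token[1];
    }
    return $out;
}

echo collapseWhitespace("<?php\nfunction test()\n{\n    return TRUE;\n}\n");
```

This never glues two keywords together, at the cost of leaving spaces that could safely be removed.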

But for a more sensible method, just go through the list of parser tokens and find out which ones must be followed by a space and which need not (something like this):

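    foreach ($tokens as $k => $val) {
        if (is_array($val) && $val[0] == T_WHITESPACE) {
            if (!isset($tokens[$k - 1]) || !is_array($tokens[$k - 1])) {
                // Previous token is punctuation such as ")" or ";": remove this space.
                unset($tokens[$k]);
            } else {
                switch ($tokens[$k - 1][0]) {
                    case T_ABSTRACT:
                    case T_FUNCTION:
                    // ... other tokens that must keep their trailing space go here
                        $tokens[$k][1] = ' '; // keep it, collapsed to one space
                        break;
                    default:
                        unset($tokens[$k]); // remove the space
                }
            }
        }
    }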

And one more note: do not do this for performance. If you use an opcode cache (e.g. APC), you will see no benefit from all that work. If you are not using one, why not?


Your efforts are in vain.

 php -w 

This option already strips comments and whitespace from a script. It uses somewhat more sophisticated logic to remove whitespace from the token stream.
Here is the zend_strip() function found in zend_highlight.c:

    while ((token_type=lex_scan(&token TSRMLS_CC))) {
        switch (token_type) {
            case T_WHITESPACE:
                if (!prev_space) {
                    zend_write(" ", sizeof(" ") - 1);
                    prev_space = 1;
                }
                /* lack of break; is intentional */
            case T_COMMENT:
            case T_DOC_COMMENT:
                token.type = 0;
                continue;

            case T_END_HEREDOC:
                zend_write(LANG_SCNG(yy_text), LANG_SCNG(yy_leng));
                efree(token.value.str.val);
                /* read the following character, either newline or ; */
                if (lex_scan(&token TSRMLS_CC) != T_WHITESPACE) {
                    zend_write(LANG_SCNG(yy_text), LANG_SCNG(yy_leng));
                }
                zend_write("\n", sizeof("\n") - 1);
                prev_space = 1;
                token.type = 0;
                continue;

            default:
                zend_write(LANG_SCNG(yy_text), LANG_SCNG(yy_leng));
                break;
        }

        if (token.type == IS_STRING) {
            switch (token_type) {
                case T_OPEN_TAG:
                case T_OPEN_TAG_WITH_ECHO:
                case T_CLOSE_TAG:
                case T_WHITESPACE:
                case T_COMMENT:
                case T_DOC_COMMENT:
                    break;

                default:
                    efree(token.value.str.val);
                    break;
            }
        }
        prev_space = token.type = 0;
    }
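
From inside a script, the built-in php_strip_whitespace() function gives the same kind of result and, as far as I know, relies on the same stripping logic. A small usage sketch, where the temporary file and its contents are only there for demonstration:

```php
<?php
// php_strip_whitespace() takes a filename rather than a source string,
// so this demo first writes a small script to a temporary file.
$file = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents(
    $file,
    "<?php\n// a comment\nfunction test()\n{\n    return TRUE;\n}\n"
);

$stripped = php_strip_whitespace($file); // comments removed, whitespace collapsed
unlink($file);

echo $stripped;
```

Note that whitespace runs are collapsed to a single space rather than removed entirely, which matches the T_WHITESPACE branch of the C code above.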

Source: https://habr.com/ru/post/1338567/

