PHP and NLP: nested parentheses (parser output) into an array?

I would like to turn text with nested parentheses into a nested array. The following is example output from an NLP parser:

(TOP (S (NP (PRP I)) (VP (VBP love) (NP (NP (DT a) (JJ big) (NN bed)) (PP (IN of) (NP (NNS roses))))) (. .))) 

(orig: I love a big bed of roses.)

I would like to turn this into a nested array so that it looks like this:

    TOP
        S
            NP
                PRP I
            VP
                VBP Love

and so on.

I found "PHP curly braces into array", but that does not produce a nested array.

2 answers

Explanation by way of code:

    <?php

    class ParensParser
    {
        // something to keep track of parens nesting
        protected $stack = null;
        // current level
        protected $current = null;

        // input string to parse
        protected $string = null;
        // length of the input string (declared to avoid a dynamic property)
        protected $length = null;
        // current character offset in string
        protected $position = null;
        // start of text-buffer
        protected $buffer_start = null;

        public function parse($string)
        {
            if (!$string) {
                // no string, no data
                return array();
            }

            if ($string[0] == '(') {
                // kill outer parens, as they're unnecessary
                $string = substr($string, 1, -1);
            }

            $this->current = array();
            $this->stack = array();

            $this->string = $string;
            $this->length = strlen($this->string);
            // look at each character
            for ($this->position = 0; $this->position < $this->length; $this->position++) {
                switch ($this->string[$this->position]) {
                    case '(':
                        $this->push();
                        // push current scope to the stack and begin a new scope
                        array_push($this->stack, $this->current);
                        $this->current = array();
                        break;

                    case ')':
                        $this->push();
                        // save current scope
                        $t = $this->current;
                        // get the last scope from stack
                        $this->current = array_pop($this->stack);
                        // add just saved scope to current scope
                        $this->current[] = $t;
                        break;
                    /*
                    case ' ':
                        // make each word its own token
                        $this->push();
                        break;
                    */
                    default:
                        // remember the offset to do a string capture later
                        // could've also done $buffer .= $string[$position]
                        // but that would just be wasting resources…
                        if ($this->buffer_start === null) {
                            $this->buffer_start = $this->position;
                        }
                }
            }

            return $this->current;
        }

        protected function push()
        {
            if ($this->buffer_start !== null) {
                // extract string from buffer start to current position
                $buffer = substr($this->string, $this->buffer_start, $this->position - $this->buffer_start);
                // clean buffer
                $this->buffer_start = null;
                // throw token into current scope
                $this->current[] = $buffer;
            }
        }
    }

    $string = '(TOP (S (NP (PRP I)) (VP (VBP love) (NP (NP (DT a) (JJ big) (NN bed)) (PP (IN of) (NP (NNS roses))))) (. .)))';
    $p = new ParensParser();
    $result = $p->parse($string);
    var_dump($result);
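Note that the parser keeps whitespace inside its tokens, so a label can come back as 'NP ' and a lone ' ' can appear between sibling groups. Below is a minimal sketch of a cleanup pass, assuming that output shape. `cleanTree` is a hypothetical helper that is not part of the answer, and `$raw` is written by hand to mirror what the class should return for '(NP (DT a) (NN bed))'.

```php
<?php
// Hypothetical helper (not from the answer): trims every string token and
// drops tokens that contain only whitespace, recursing into nested scopes.
function cleanTree(array $node): array
{
    $out = [];
    foreach ($node as $item) {
        if (is_array($item)) {
            // nested scope: clean it recursively
            $out[] = cleanTree($item);
            continue;
        }
        $item = trim($item);
        if ($item !== '') {
            $out[] = $item;
        }
    }
    return $out;
}

// Assumed parser output for '(NP (DT a) (NN bed))', typed in by hand.
$raw = ['NP ', ['DT a'], ' ', ['NN bed']];
print_r(cleanTree($raw));
```

Tokens such as 'DT a' stay as one string here; splitting tag and word would be a separate step, for example by enabling the commented-out `case ' '` branch in the class.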

Brilliant answer! Note that to catch trailing text (for example, the "c" in "a (b) c"), you need to change the end of the parse() method like this:

                    default:
                        // remember the offset to do a string capture later
                        // could've also done $buffer .= $string[$position]
                        // but that would just be wasting resources…
                        if ($this->buffer_start === null) {
                            $this->buffer_start = $this->position;
                        }
                }
            }

            // catch any trailing text
            if ($this->buffer_start < $this->position) {
                $this->push();
            }

            return $this->current;
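To see the effect of the trailing-text flush in isolation, here is a compact function-style sketch of the same stack-based scan. `parseParens` is a hypothetical name, not code from either answer, and unlike the class it does not strip the outer parentheses.

```php
<?php
// Hypothetical sketch: same stack-based scan as the class, written as a
// single function, with a final flush for text after the last ')'.
function parseParens(string $input): array
{
    $stack = [];          // enclosing scopes, innermost last
    $current = [];        // scope currently being filled
    $bufferStart = null;  // offset where the pending text run began
    $length = strlen($input);

    // Move the pending text run (if any) into the current scope.
    $flush = function (int $end) use ($input, &$current, &$bufferStart): void {
        if ($bufferStart !== null) {
            $current[] = substr($input, $bufferStart, $end - $bufferStart);
            $bufferStart = null;
        }
    };

    for ($i = 0; $i < $length; $i++) {
        switch ($input[$i]) {
            case '(':
                $flush($i);
                $stack[] = $current;   // remember the enclosing scope
                $current = [];         // start a new scope
                break;
            case ')':
                $flush($i);
                $closed = $current;
                // fall back to an empty scope on an unbalanced ')'
                $current = array_pop($stack) ?? [];
                $current[] = $closed;
                break;
            default:
                if ($bufferStart === null) {
                    $bufferStart = $i;
                }
        }
    }

    // catch trailing text such as the " c" in "a (b) c"
    $flush($length);

    return $current;
}

var_dump(parseParens('a (b) c'));
```

Without the final flush, the text after the last closing parenthesis would stay in the buffer and never reach the result.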

thanks


Source: https://habr.com/ru/post/900205/

