Scalable Trie Implementation in PHP

Following this tutorial, I met a Trie data structure. Since I programmed in PHP, I have tried to solve this lecture problem . I was able to get the correct answers, but only for small inputs (input number 10 - a file size of 2.82 MB). Obviously, my algorithm does not scale well. It also exceeds the default memory limit of 128 MB for PHP.

My algorithm

Trie has a root node. Each node has a member of "children". I use a standard PHP array to store children. The child key represents a character (I am currently creating a new node for each character, az is lowercase, matching from 0-25), the child value is a reference to another node.

The "weight" element that each node has exists due to a problem . I would like to optimize my code (or even rewrite it from stratch using a different approach) so that it can pass tests for large inputs.

I am interested in a solution for this data structure to work in PHP with large inputs, if possible.

My code

The TrieNode class stores a tree hierarchy.

class TrieNode { // weight is needed for the given problem public $weight; /* TrieNode children, * eg [0 => (TrieNode object1), 2 => (TrieNode object2)] * where 0 stands for 'a', 1 for 'c' * and TrieNode objects are references to other TrieNodes. */ private $children; function __construct($weight, $children) { $this->weight = $weight; $this->children = $children; } /** map lower case english letters to 0-25 */ static function getAsciiValue($char) { return intval(ord($char)) - intval(ord('a')); } function addChild($char, $node) { if (!isset($this->children)) { $this->children = []; } $this->children[self::getAsciiValue($char)] = $node; } function isChild($char) { return isset($this->children[self::getAsciiValue($char)]); } function getChild($char) { return $this->children[self::getAsciiValue($char)]; } function isLeaf() { return empty($this->children); } } 

The Trie class stores the root TrieNode. It can insert and query nodes.

 class Trie { /* root TrieNode stores the first characters */ private $root; function __construct() { $this->root = new TrieNode(-1, []); } function insert($string, $weight) { $currentNode = $this->root; $l = strlen($string); for ($i = 0; $i < $l; $i++) { $char = $string[$i]; if(!$currentNode->isChild($char)) { $n = new TrieNode($weight, null); $currentNode->addChild($char, $n); } $currentNode->weight = max($weight, $currentNode->weight); $currentNode = $currentNode->getChild($char); } } function getNode($string) { $currentNode = $this->root; $l = strlen($string); for ($i = 0; $i < $l; $i++) { $char = $string[$i]; if ($currentNode->isLeaf() || !$currentNode->isChild($char)) { return null; } $currentNode = $currentNode->getChild($char); } return $currentNode; } function getWeight($string) { $node = $this->getNode($string); return is_null($node) ? -1 : $node->weight; } } 

Test code . Parses input and calls the Trie object.

 //MAIN / TEST /* In case the problem page is down: eg INPUT 2 1 hackerearth 10 hackerrank 9 hacker OUTPUT 10 where 2 is the number of inserts, 1 is the number of queries "string number" is the string to insert and its "weight" "hacker" is the string to query 10 is maximum the weight of the queried string (hacker -> 10) */ $trie = new Trie(); $handle = fopen('test.txt', 'r'); //$handle = STDIN; // <- this is for the online judge list($n, $q) = fscanf($handle, "%d %d"); for ($i = 0; $i < $n; $i++) { // insert data list($s, $weight) = fscanf($handle, "%s %d"); $trie->insert($s, $weight); } for ($i = 0; $i < $q; $i++) { // query data $query = trim(strval(fgets($handle))); echo $trie->getWeight($query) . PHP_EOL; } fclose($handle); 

Renouncement

The algorithm failed 6 tests out of 16

+5
source share
2 answers

Below is the code with the following optimizations -

Removed all unnecessary condition checks, for example

  • It is not necessary to verify that the node is a leaf, because if the node does not have a child specified by char, then it does not matter if it is a leaf or not.
  • No need to check if {children} is initialized every time you add a child to a node. Removed this check, initialized by {children}, to clear the array in the constructor itself.

Removed the function {getAsciiValue} instead, using a simple associative array. Also, changing the value of {char} to ascii was transferred from the TrieNode class to Trie, so we don’t need to translate it several times

After this optimization, I came to the following minimum solution, but it also could not pass test No. 10. After reading the array in PHP, I found out that PHP does not implement the array like other compiled languages, and any array in PHP is just an ordered hash map, and because of this, the array does not support constant time operations. fooobar.com/questions/121357 / ...

Also using SplFixedArray , but did not help, because it is an object and has the cost of creating an instance. This might help if you try to use a large array to store the entire Trie. You can try to implement a solution for using SplFixedArray to store the entire Trie and check if you can pass the test number 10.

 <?php /* * Read input from stdin and provide input before running code fscanf(STDIN, "%s\n", $name); echo "Hi, ".$name; */ class TrieNode { // weight is needed for the given problem public $weight; /* TrieNode children, * eg [0 => (TrieNode object1), 2 => (TrieNode object2)] * where 0 stands for 'a', 2 for 'c' * and TrieNode objects are references to other TrieNodes. */ private $children; function __construct($weight) { $this->weight = $weight; $this->children = []; } function addChild($char, $node) { $this->children[$char] = $node; } function isChild($char) { return isset($this->children[$char]); } function getChild($char) { return $this->children[$char]; } } class Trie { /* root TrieNode stores the first characters */ private $root; function __construct() { $this->root = new TrieNode(-1); } static $asciiValues = array( "a" => 0, "b" => 1, "c" => 2, "d" => 3, "e" => 4, "f" => 5, "g" => 6, "h" => 7, "i" => 8, "j" => 9, "k" => 10, "l" => 11, "m" => 12, "n" => 13, "o" => 14, "p" => 15, "q" => 16, "r" => 17, "s" => 18, "t" => 19, "u" => 20, "v" => 21, "w" => 22, "x" => 23, "y" => 24, "z" => 25 ); function insert($string, $weight) { $currentNode = $this->root; $l = strlen($string); for ($i = 0; $i < $l; $i++) { $char = self::$asciiValues[$string[$i]]; $currentNode->weight = max($weight, $currentNode->weight); if($currentNode->isChild($char)) { $childNode = $currentNode->getChild($char); } else { $childNode = new TrieNode($weight); $currentNode->addChild($char, $childNode); } $currentNode = $childNode; } } function getNodeWeight($string) { $currentNode = $this->root; $l = strlen($string); for ($i = 0; $i < $l; $i++) { $char = self::$asciiValues[$string[$i]]; if (!$currentNode->isChild($char)) { return -1; } $currentNode = $currentNode->getChild($char); } return $currentNode->weight; } } $trie = new Trie(); //$handle = fopen('test.txt', 'r'); $handle = STDIN; // <- this is for the online judge list($n, $q) = fscanf($handle, "%d %d"); for ($i = 0; $i < $n; $i++) { // insert data list($s, $weight) = fscanf($handle, "%s %d"); $trie->insert($s, $weight); } for ($i = 0; $i < $q; $i++) { // query data //$query = trim(strval(fgets($handle))); $query = trim(strval(fgets($handle))); echo $trie->getNodeWeight($query) . PHP_EOL; } fclose($handle); ?> 
+1
source

After making some settings and modifications, I was able to make this thing work for all test cases, with the exception of one test case - timeout,

Here is all the code that will work successfully for all test cases except test case 10.

 class TrieNode { // weight is needed for the given problem public $weight; /* TrieNode children, * eg [0 => (TrieNode object1), 2 => (TrieNode object2)] * where 0 stands for 'a', 1 for 'c' * and TrieNode objects are references to other TrieNodes. */ private $children; function __construct($weight, $children) { $this->weight = $weight; $this->children = $children; } /** map lower case english letters to 0-25 */ static function getAsciiValue($char) { return intval(ord($char)) - intval(ord('a')); } function addChild($char, $node) { if (!isset($this->children)) { $this->children = []; } $this->children[self::getAsciiValue($char)] = $node; } function isChild($char) { return isset($this->children[self::getAsciiValue($char)]); } function getChild($char) { return $this->children[self::getAsciiValue($char)]; } function isLeaf() { return empty($this->children); } } class Trie { /* root TrieNode stores the first characters */ private $root; function __construct() { $this->root = new TrieNode(-1, []); } function insert($string, $weight) { $currentNode = $this->root; $l = strlen($string); for ($i = 0; $i < $l; $i++) { $char = $string[$i]; if(!$currentNode->isChild($char)) { $n = new TrieNode($weight, null); $currentNode->addChild($char, $n); } $currentNode->weight = max($weight, $currentNode->weight); $currentNode = $currentNode->getChild($char); } } function getNode($string) { $currentNode = $this->root; if (empty($currentNode) || !isset($currentNode)) { return null; } $l = strlen($string); for ($i = 0; $i < $l; $i++) { $char = $string[$i]; if (empty($currentNode) || $currentNode->isLeaf() || !$currentNode->isChild($char)) { return null; } $currentNode = $currentNode->getChild($char); if (empty($currentNode)) { return null; } } return $currentNode; } function getWeight($string) { $node = $this->getNode($string); return is_null($node) ? -1 : $node->weight; } } $trie = new Trie(); //$handle = fopen('test.txt', 'r'); $handle = STDIN; // <- this is for the online judge list($n, $q) = fscanf($handle, "%d %d"); for ($i = 0; $i < $n; $i++) { // insert data list($s, $weight) = fscanf($handle, "%s %d"); $trie->insert($s, $weight); } for ($i = 0; $i < $q; $i++) { // query data $query = trim(strval(fgets($handle))); echo $trie->getWeight($query) . PHP_EOL; } fclose($handle); 

I will try to add a few more checks to reduce the calculation cycles made by this program.

+2
source

Source: https://habr.com/ru/post/1265043/


All Articles