Effective php iterator search

I have a custom iterator (TokenIterator, to be precise, which iterates, well, tokenized PHP code). Elements are simple objects ("real estate bags" with the addition of some normalization methods)

I need to implement a search function, which should find if 1. one iterator contains another or 2. two (or more) iterators overlap (with some parameterization).

I am currently using the naive approach to finding a double loop (1) - O (NxM) and (2) not yet implemented.

Before embarking on the redefinition of truly intelligent string search algorithms, I would like to know if there is any efficient implementation of this? Maybe something is deeply immersed in a framework or shared library for reuse? And which algorithm would be most suitable here?

+4
source share
2 answers

The first thing that comes to mind is that you are talking about set operations, for which iterators may not be the best solution.

I do not know if there is an existing solution for your problem, but as a general solution, I would use hash tables. For example, build a hash table using the tokens of the first set (I will call it set from now on, because I think Iterator is not the best word), and you can do it in Theta (N), and then try to insert another set to the same hash table. The first time you encounter collisions, you will know that there is overlap. Of course, this works well if the hash space is wide and the hash function guarantees a small number of collisions, however, you can always encode some kind of workaround.

Given that PHP has associative arrays (which are a form of hash tables), you can also create an array that has tokens as keys, which can again be done in Theta (N), and then use array_key_exists. It is possible that array_key_exists is nothing more than a linear scan of the key list, since I am not familiar with the internal components of PHP, but I am pretty sure that if associative arrays are implemented as hash tables, they should be implemented much more efficiently than linear scanning.

+3
source

If your iterators can be passed into arrays, you can use array_diff and array_intersect . Otherwise, you must implement what these functions do under the hood - go through your structures and compare them. Since the data you are repeating is not sorted, and you know nothing about it, you have no other choice.

+1
source

Source: https://habr.com/ru/post/1333275/


All Articles