Fast file search algorithm for IP addresses

Question

What is the fastest way to find if an IP address exists in a file that contains IP addresses sorted as:

219.93.88.62 219.94.181.87 219.94.193.96 220.1.72.201 220.110.162.50 220.126.52.187 220.126.52.247 

Limitations

  • No database (e.g. MySQL, PostgreSQL, Oracle, etc.)
  • Incorrect preprocessing allowed (see features section)
  • It would be nice not to upload a file every request (131Kb)
  • Using less than 5 megabytes of disk space
  • No additional PHP modules

File details

  • One IP address per line
  • 9500+ lines

Possible solutions

  • Create a directory hierarchy ( base tree ?), Then use is_dir() (unfortunately this uses 87 megabytes)
+4
source share
4 answers

Scanning a file line by line to find an IP seems to be sick if you have 9000 non-share checks before you go to 232.0.17.1

Is your file limited to one file? for example, let's say this list is prohibited by IP addresses, and you just want to see if it is on the list.

What to do if you made a DIR to contain multiple files:

 BannedIPs +- 0.ips +- 1.ips +- 37.ips +- 123.ips +- 253.ips +- 254.ips 

Each file contains only IP addresses starting with this number.

If you are lucky enough to even have distribution ... you will have 256 files, but each will have only 37 entries.

Thus, if you want to test: 232.0.17.1 , you look in the 232.ips file and scan it.

+3
source

Since your file stores IP addresses in sorted order, you can quickly find a specific IP address in O (log (n)) time using binary search.

If you want to speed it up even longer, you can cache, for example, every 100th line in memory and use a binary search in memory first, then you know what part of the file you need to read to complete the search.

Having said that 131kB is really not so much, so the easiest and fastest solution is to buy more memory and cache the entire file in memory in a hash table.

+3
source

EDIT I did not notice the php tag, I don’t know if the following type of thing is possible in this language. But I will leave it for the idea anyway.

The IPv4 address is represented as a 32-bit number, so I just create an int32 array, translate your addresses into 'ints` with the following Puedon-ish psuedocode:

 x = 0 i = 24 s = '111.222.333.444' for part in s.split('.'): x += part.toint() << i i -= 8 IPlist.append(x) 

Then you can get the input address, convert it to int in the same way, and do a binary search in the array.

For strings of ~ 10 k, the array will take ~ 40 kbytes.

+3
source

Maybe not fast, but I would try this: if the IP address file has not changed much, read the file in the array and cache it (possibly Memcache) and search there for each request.

+1
source

Source: https://habr.com/ru/post/1307240/


All Articles