How to efficiently use a hash address

This is an interview question. I was thinking of some solution, such as a multi-track, but could not find anything elegant. Please suggest a good method.

Question: You have 10 million IP addresses. (IPv4 4 byte addresses). Create a hash function for these IP addresses.

Hint: using IP itself as a key is a bad idea because there will be a lot of lost space

+6
source share
3 answers

Interestingly, such an interesting question had no interesting answer (sorry for the tautology).

If you consider this as a theoretical question, then you need this link (there is even a hash function superfast written for you and ready to work):

http://www.kfki.hu/~kadlec/sw/netfilter/ct3/

The practical question may be different. If your hash table is a reasonable size, you still have to handle conflicts (with linked lists). So ask yourself, which use case will take place at the end? If your code will work in some isolated ecosystem, and the IP address will be abcd , c and d will be the most volatile numbers, and d will not be empty (if you do not work with networks) therefore a hash table of 64 thousand buckets and cd like hash may well be satisfactory?

Another use case is to monitor TCP connections, where the client uses an ephemeral port that is randomly assigned by the kernel (isn't that perfect for hashing?). The problem is a limited range: something like 32768-61000, which makes the least significant byte more random than the most significant byte. So you can XOR the most significant byte with the most volatile byte in the IP address, which can be zerro ( c ) and use it as a hash in your 64K table.

+3
source

Since your input is random and the table size is smaller, the address space of any hash function that you create will have its own set of pathological data, which will make your hash function bad. I think the interviewer wants to know your knowledge of the existing hash functions that are used as standards.

Below are a few of these hash functions:

  • MD5

  • SHA-1, SHA-2

Why these functions work better than other hash functions, because their set of pathological data is difficult to find without using brute force algorithms. Therefore, if you have something good than a donor, tell your interviewer (you can get a patent for it and get a job at Google).

For Hashing ip addresses, use MD5 or SHA on it and crop the table size, and you're done.

Note. . The size of the table should be simple to prevent bad hashing.

0
source

I have the same question before.

To solve this problem, you must share your data. We know that the ip address is a consequence.

  • table1 from 0.0.0.0 to 0.0.0.127 (all located in New York City).
  • table2 from 0.0.0.128 to 0.0.0.255 (all located in New York2).
  • ....

Then create such a map.

  • 0.0.0.0 ~ 0.0.0.127 β†’ address1
  • 0.0.0.127 ~ 0.0.0.255 β†’ address2
  • ......

Then, to get the IP address, just get the value from the map;

Note: all the data in the database, I don’t think it’s an expensive place to get an address in 1ce, you have to cost a few places to optimize the speed.

0
source

Source: https://habr.com/ru/post/958188/


All Articles