How to map a custom string to a specific integer

I work in finance. I have a set of symbols that follow a very clear pattern: each consists of two characters (AB, AC, AD) followed by the current month as a four-digit number (1503, 1504, 1505). Here are some examples:

 AB1504 AB1505 AC1504 AC1505 AD1504 AD1505 .... 

Since these strings are so well formed, I want to map (hash) each one to a unique integer so that I can use the integer as an array index for fast access. I do many lookups inside my system, and std::unordered_map or any other hash map is not fast enough. My tests show that a general-purpose hash map has latency at the hundred-nanosecond level, while array indexing is always under 100 nanoseconds. My ideal case would be, for example, to map AB1504 to the integer 1, AB1505 to 2, and so on; then I can create an array to quickly access the information associated with these symbols. I have been looking for a hashing algorithm or some other method that achieves this, but could not find one. Do you have any suggestions?

+6
4 answers

You can treat the string as a number written in a mixed base and convert it to an integer. For instance:

 AC1504: A (range: A-Z) C (range: A-Z) 15 (range: 0-99) 04 (range: 1-12) 

Extract the parts; then the hash function may be

    int part1, part2, part3, part4;
    ...
    part1 -= 'A';   // first letter: 0..25
    part2 -= 'A';   // second letter: 0..25
    part4 -= 1;     // month: 0..11
    return ((part1 * 26 + part2) * 100 + part3) * 12 + part4;
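A minimal, self-contained sketch of this mixed-base idea could look like the following. The function name `symbol_index` and the constant `kIndexCount` are hypothetical, and the code assumes the input is always exactly two uppercase letters followed by a two-digit year and a two-digit month (01-12); no validation is done.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: maps a symbol such as "AC1504" to a unique integer.
// Assumes (without checking) the format: two uppercase letters, a two-digit
// year, and a two-digit month in the range 01-12.
constexpr int symbol_index(std::string_view s) {
    const int part1 = s[0] - 'A';                             // 0..25
    const int part2 = s[1] - 'A';                             // 0..25
    const int part3 = (s[2] - '0') * 10 + (s[3] - '0');       // year, 0..99
    const int part4 = (s[4] - '0') * 10 + (s[5] - '0') - 1;   // month, 0..11
    return ((part1 * 26 + part2) * 100 + part3) * 12 + part4;
}

// Number of distinct indices, i.e. the size an array indexed this way needs.
constexpr int kIndexCount = 26 * 26 * 100 * 12;  // 811,200
```

With these assumptions, consecutive months of the same prefix land on adjacent indices, and an array of `kIndexCount` entries covers every possible symbol.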
+1

If you parse the string as a mixed-base number, first 2 base-26 digits and then 4 base-10 digits, you will quickly get a unique index for each string. The only problem is that you may end up with a sparsely populated array.

You can always change the order of the digits when calculating the index to minimize the problem mentioned above.

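Since the numbers are actually months, I would calculate the number of months since the first record and combine that with the 2-digit base-26 number from the prefix (multiplying one of the two by the range of the other, so that every pair still gets a distinct index).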
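One way to read the months-since-first-record suggestion is sketched below. The base month (January 2015), the table length `kMonths`, and the name `dense_index` are all assumptions made for illustration; the input format is assumed to be two uppercase letters plus yymm and is not validated beyond the month range check.

```cpp
#include <cassert>
#include <string_view>

// Assumptions for illustration only: the earliest record is "1501"
// (January 2015) and the table covers kMonths consecutive months.
constexpr int kBaseYear = 15;
constexpr int kBaseMonth = 1;
constexpr int kMonths = 120;  // ten years of months

// Returns a dense index in [0, 26 * 26 * kMonths), or -1 when the month
// falls outside the covered window.
constexpr int dense_index(std::string_view s) {
    const int prefix = (s[0] - 'A') * 26 + (s[1] - 'A');          // 0..675
    const int year = (s[2] - '0') * 10 + (s[3] - '0');
    const int month = (s[4] - '0') * 10 + (s[5] - '0');
    const int offset = (year - kBaseYear) * 12 + (month - kBaseMonth);
    if (offset < 0 || offset >= kMonths) return -1;
    return prefix * kMonths + offset;
}
```

Because only months inside the window get a slot, the array has no holes for impossible months such as 13-99, at the cost of having to pick the window up front.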

Hope you can make something of this; I am typing on my tablet at the moment. :D

0

The following values can be represented by a 32-bit integer:

 XYnnnn => (26 * X + Y) * 10000 + nnnn 

Here X and Y take values in the range [0, 26), and each digit n takes values in the range [0, 10).

You have a total of 6,760,000 representable values, so if you want to associate a small amount of data with each one (for example, a counter or a pointer), you can simply create a flat array in which each symbol occupies one entry.
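As a rough sketch of that flat array, assuming a 32-bit counter per symbol (about 27 MB in total): the names `flat_index`, `kSlotCount`, and `make_counter_table` are hypothetical, and the input is assumed to be two uppercase letters followed by four decimal digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// XYnnnn => (26 * X + Y) * 10000 + nnnn, with X and Y in [0, 26).
constexpr std::uint32_t flat_index(std::string_view s) {
    const auto x = static_cast<std::uint32_t>(s[0] - 'A');
    const auto y = static_cast<std::uint32_t>(s[1] - 'A');
    std::uint32_t nnnn = 0;
    for (std::size_t i = 2; i < 6; ++i) {
        nnnn = nnnn * 10 + static_cast<std::uint32_t>(s[i] - '0');
    }
    return (26 * x + y) * 10000 + nnnn;
}

constexpr std::size_t kSlotCount = 26 * 26 * 10000;  // 6,760,000

// Hypothetical table: one zero-initialized 32-bit counter per possible symbol.
inline std::vector<std::uint32_t> make_counter_table() {
    return std::vector<std::uint32_t>(kSlotCount, 0);
}
```

A lookup is then a single indexed access, for example `table[flat_index("AB1504")] += 1;`.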

0

I assume the format is "AAyymm", where A is an uppercase character, yy is a two-digit year, and mm is a two-digit month.

Therefore, you can map it to 10 bits (AA) + Y bits (yy) + 4 bits (mm), where Y = 32 - 10 - 4 = 18 bits for a 32-bit representation (or 262144 years). With this, you can represent the format as an integer by shifting the characters into place and then shifting the year and month into place after converting them to integers.
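A small sketch of that bit layout, under the assumption that the fields are ordered from the most significant bit as 5 bits per letter, 18 bits for the year, and 4 bits for the month. The name `pack_symbol` and the exact field order are illustrative choices, not something the format dictates, and the input is not validated.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Assumed layout, most significant bits first:
//   [ A: 5 bits ][ A: 5 bits ][ yy: 18 bits ][ mm: 4 bits ]
constexpr std::uint32_t pack_symbol(std::string_view s) {
    const auto a = static_cast<std::uint32_t>(s[0] - 'A');   // 0..25
    const auto b = static_cast<std::uint32_t>(s[1] - 'A');   // 0..25
    const auto yy = static_cast<std::uint32_t>((s[2] - '0') * 10 + (s[3] - '0'));
    const auto mm = static_cast<std::uint32_t>((s[4] - '0') * 10 + (s[5] - '0'));
    return (a << 27) | (b << 22) | (yy << 4) | mm;
}
```

Each field can be recovered with a shift and a mask, for example `(packed >> 4) & 0x3FFFF` for the year. The packed value spreads across the whole 32-bit range, so it works better as a compact key than as a direct array index.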

Note: there will always be gaps in the binary representation: in the 5 + 5 bit representation of the characters (6 + 6 unused values) and in the 4-bit month representation (4 unused values).

To avoid the gaps, change the representation to ABmmmm, where the AB pair is represented by 26 * A + B and mmmm is the month relative to some zero month in some year (which spans 2^32 / 1024 / 12 = 349525 years with 32 bits).

However, you might consider keeping the characters and the time in separate fields. Combining two values in one field usually causes difficulties (it may be a good storage format, but not a good "program data format").

0

Source: https://habr.com/ru/post/987378/