Overcoming the Bitap Algorithm's Search Pattern Length Limit

I am new to approximate string matching.

I am learning to use the Bitap algorithm, but its limited pattern length bothers me. I work with Flash, which gives me 32-bit unsigned integers and an IEEE-754 double-precision floating-point type that can represent integers exactly up to 53 bits. However, I would prefer a fuzzy matching algorithm that can handle patterns longer than 50 characters.

The Wikipedia page on the Bitap algorithm mentions libbitap, which supposedly demonstrates an implementation of the algorithm without a pattern-length limit, but I find it hard to work out the idea from its source code.

Do you have any suggestions on how to generalize Bitap to patterns of unlimited length, or can you recommend another algorithm that can fuzzy-match a needle string near a suggested location in a haystack?

1 answer

There is a fairly practical implementation of this algorithm available on Google Code. Give it a try. However, I can't figure out how to get the exact location of a fuzzy match (its start and end positions in the text). If you have an idea how to get both the start and end positions, please share.


Source: https://habr.com/ru/post/1707561/

