The fastest way to find vocabulary strings in text

Question


I have a text file and a dictionary. The dictionary is a list of words that are each exactly 8 characters long. I scan the text file and look up every 8-character substring in the dictionary (a "sliding window").

I am currently using the Python dictionary data structure as a lookup table. It has amortized O(1) lookup time, but I wonder whether there are faster algorithms or data structures that exploit the specific nature and structure of the problem.
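For reference, the approach described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the name `find_words` and its parameters are hypothetical, and a `set` stands in for the dictionary because only membership is tested.

```python
def find_words(text, vocabulary, width=8):
    """Return (offset, word) pairs for each window of `width` characters
    that appears in `vocabulary`. Names here are illustrative assumptions."""
    vocab = set(vocabulary)  # hash-based membership test, amortized O(1)
    hits = []
    # Slide a fixed-width window one character at a time.
    for i in range(len(text) - width + 1):
        window = text[i:i + width]
        if window in vocab:
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    print(find_words("xxabcdefghyy", ["abcdefgh"]))
```

Each step slices and hashes one 8-character substring, so the total work grows linearly with the length of the text.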

+4

string algorithm

Roy Jul 22 '15 at 10:20


2 answers

I think you can use a full-text search engine for this, e.g. Apache Solr or Elasticsearch.

Alternatively, you can use http://lunrjs.com/ on the client side.

0

Portfolio vietnam Jul 22 '15 at 10:41


Bytemain · Accepted Answer · 2015-07-22T12:15:24+0000

You can try the Aho-Corasick multiple-pattern matching algorithm. It builds a finite state machine from the dictionary and uses a breadth-first search to precompute, for each state, the longest suffix of the text matched so far that is also a prefix of some dictionary string. You can try my PHP implementation at https://phpahocorasick.codeplex.com . It also supports wildcard search.
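To make the idea concrete, here is a minimal Python sketch of an Aho-Corasick automaton. It is not a port of the PHP implementation linked above, it leaves out wildcard search, and the names `build_automaton` and `search` are hypothetical.

```python
from collections import deque


def build_automaton(words):
    """Build trie transitions, failure links, and per-state outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix that is a trie prefix
    out = [[]]    # out[state] -> dictionary words ending at this state
    for word in words:
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(word)

    # Breadth-first search: depth-1 states keep fail = 0, deeper states
    # derive their failure link from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (offset, word) pairs for every dictionary word found in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we reach the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


if __name__ == "__main__":
    automaton = build_automaton(["abcdefgh", "bcdefghi"])
    print(search("xabcdefghiy", automaton))
```

The automaton reads each character of the text once, without re-slicing or re-hashing a window per position. Whether that is faster in practice than a hash lookup for fixed 8-character words would need to be measured.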
