How can I do a full-text search of flat files using Perl?

We have a Perl-based web application whose data comes from a large repository of flat text files. These files are dropped into a directory on our system, we parse them thoroughly and insert pieces of their information into a MySQL database, and then we move the files to their archive repository and permanent home (/www/website/archive/*.txt). We don't parse every bit of data from these flat files, though, and some of the more obscure data elements never make it into the database.

The current requirement is that users be able to run a full-text search of the entire flat-file repository from a Perl-generated web page and get back a list of hits that they can click to open the text files for viewing.

What is the most elegant, efficient and least resource-intensive way to provide this search feature?

+3
4 answers

I would recommend the following, in this order:

  • Load each whole document into a MySQL table and use MySQL's full-text search and indexing features. I have never done this myself, but MySQL has always been able to handle more than I could throw at it.

  • Swish-E ( http://swish-e.org/ ) still exists and is designed to build full-text indexes and return ranked results. I ran it for several years and it worked very well.

  • Use File::Find in Perl, or grep -r, to scan the files directly. :)
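The last option above, a direct scan with File::Find, could look roughly like the following. This is only a minimal sketch: the subroutine name search_archive, the literal (non-regex) case-insensitive matching, and the restriction to *.txt files are assumptions made for illustration, not something the answer specifies.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return [file, line number, line text] for every line
# containing $pattern (matched literally, case-insensitively) in the *.txt
# files under $dir.
sub search_archive {
    my ($dir, $pattern) = @_;
    my $re = qr/\Q$pattern\E/i;    # \Q...\E disables regex metacharacters
    my @hits;
    find({
        no_chdir => 1,             # keep $File::Find::name usable as a path
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path && $path =~ /\.txt\z/;
            open(my $fh, '<', $path) or return;    # skip unreadable files
            while (my $line = <$fh>) {
                next unless $line =~ $re;
                chomp $line;
                push @hits, [$path, $., $line];    # $. is the line number
            }
            close($fh);
        },
    }, $dir);
    return @hits;
}

# Example invocation: perl search.pl /www/website/archive "some phrase"
if (@ARGV >= 2) {
    printf "%s:%d: %s\n", @$_ for search_archive(@ARGV[0, 1]);
}
```

Every query rereads every file, so this only stays practical while the archive is small; the indexed options earlier in the list avoid that cost.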

+9

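I would use a dedicated search engine for the indexing and searching.

One option is ht://dig.

Update: ht://dig no longer appears to be actively developed, so another engine such as Hyper Estraier may be a better choice.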

+3

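I second the recommendation to use an indexing engine. Consider Namazu ( http://namazu.org ). It is an alternative to Swish-e and ht://dig.

If you do not want the overhead of an indexer, consider forking grep/egrep. For large volumes of text this is likely to be faster than scanning in pure Perl, for example: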

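# NOTE: $dirlist, $filepattern and $textpattern are interpolated into a
# shell command, so they must never contain unsanitized user input.
open(my $grep, '-|', "find $dirlist -name '$filepattern' | xargs egrep '$textpattern'")
    or die "grep: $!";
while (my $line = <$grep>) {
       ...
}
close($grep);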

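Bonus: use file-naming conventions such as dates/tags/etc. to narrow down the set of files you grep. The find ... | xargs ... construct is there to work around the limit on command-line length, which large archives can exceed.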
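Because the search text will come from a web form, one variation worth considering is to keep the shell out of the picture entirely. The sketch below swaps in a different technique from the find | xargs pipeline: it collects the files with File::Find and calls grep through Perl's list-form pipe open, in fixed-size batches. The names archive_files and grep_files and the batch-size parameter are hypothetical, and it assumes a grep that accepts the POSIX options -E, -l and -e.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect the *.txt files under $dir
# (a pure-Perl stand-in for the find command).
sub archive_files {
    my ($dir) = @_;
    my @files;
    find({
        no_chdir => 1,    # $_ then holds the full path
        wanted   => sub { push @files, $_ if -f $_ && /\.txt\z/ },
    }, $dir);
    return sort @files;
}

# Hypothetical helper: run grep over @files in batches of $batch_size
# (playing the role of xargs) and return the names of the matching files.
sub grep_files {
    my ($pattern, $batch_size, @files) = @_;
    my @matches;
    while (my @batch = splice(@files, 0, $batch_size)) {
        # List-form pipe open: no shell is involved, so $pattern and the
        # file names cannot be misread as shell syntax.
        open(my $grep, '-|', 'grep', '-E', '-l', '-e', $pattern, '--', @batch)
            or die "grep: $!";
        chomp(my @found = <$grep>);
        push @matches, @found;
        close($grep);    # grep exits 1 when nothing matched; not an error here
    }
    return @matches;
}
```

A call such as grep_files($term, 500, archive_files('/www/website/archive')) would return the list of files to turn into links on the results page. The pattern is still interpreted as an extended regular expression by grep, so adding -F for literal matching may suit user-entered text better.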

+2

I see that someone recommended Lucene / Plucene. Check out KinoSearch. I have been using it for a year or more in a Catalyst-based project and am very pleased with its performance and its ease of programming and maintenance.

The disclaimer on the KinoSearch page should be weighed against your circumstances, but I can vouch for the module's stability.

0

Source: https://habr.com/ru/post/1703916/

