The fastest way to search the contents of files in a directory

I have a directory containing files for the users of my program. It holds about 70 thousand JSON files.

The current search method uses glob and foreach. This gets pretty slow and clogs the server. Is there a good way to search these files more efficiently? I run this on an Ubuntu 16.04 machine, and I can use exec if necessary.

Update:

The files are JSON files, and each one must be opened to check whether it contains the search query. Looping over the files is pretty fast, but opening each one takes quite a while.

They cannot be indexed using SQL or memcached, as I use memcached for some other things.

+1
2 answers

As you suspected, to make this search as efficient as possible, you should hand the task off to a tool designed for this purpose.

I say: go beyond grep and see what is even better than ack. Also, look at ag, then settle on ripgrep as the best of its kind in town.
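
As a rough illustration of handing the search to such a tool, here is a minimal sketch that lists the JSON files containing a literal query string. The function name search_json and its arguments are hypothetical; it assumes ripgrep is installed as rg and falls back to GNU grep otherwise.

```shell
#!/bin/sh
# search_json DIR QUERY
# Hypothetical helper: prints the paths of *.json files under DIR that contain QUERY.
search_json() {
    dir=$1
    query=$2
    if command -v rg >/dev/null 2>&1; then
        # -l: print only the names of matching files
        # -F: treat the query as a literal string, not a regex
        # -g: restrict the search to files matching the glob
        rg -l -F -g '*.json' -e "$query" "$dir"
    else
        # Fallback when ripgrep is not available (GNU grep options).
        grep -rlF --include='*.json' -e "$query" "$dir"
    fi
}
```

From PHP, a command like this could be run through exec() or shell_exec(), with the query passed through escapeshellarg() first. Note that both tools exit with a non-zero status when nothing matches.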


Experiment

I ran a small experiment with ack on a low-spec laptop. I searched for an existing class name across 19,501 files. Here are the results:

$ cd ~/Dev/php/packages
$ ack -f | wc -l 
19501

$ time ack PHPUnitSeleniumTestCase | wc -l
10
ack PHPUnitSeleniumTestCase  7.68s user 2.99s system 21% cpu 48.832 total
wc -l  0.00s user 0.00s system 0% cpu 48.822 total

Next, the same search with ag:

$ time ag PHPUnitSeleniumTestCase | wc -l
10
ag PHPUnitSeleniumTestCase  0.24s user 0.98s system 13% cpu 9.379 total
wc -l  0.00s user 0.00s system 0% cpu 9.378 total

And finally, with ripgrep:

$ time rg PHPUnitSeleniumTestCase | wc -l
10
rg PHPUnitSeleniumTestCase  0.44s user 0.27s system 19% cpu 3.559 total
wc -l  0.00s user 0.00s system 0% cpu 3.558 total

, , .


PS: See the ripgrep announcement post, which compares ripgrep with {grep, ag, git grep, ucg, pt, sift}.

+2

, SSD HDD , .

HDD

PHP, -. SSD RAM-, .

, SSD. , , ~ 70 200 IOPS ( - , ). -, fstat, filemtime, file_exists .., , (file_get_contents() ..).

HDD -, IOPS. , , ( , ). , , (, , xargs ..), .

, , . , , , . , ( , ) , . IOPS .

, - PHP pthreads, . xargs ( -P) , . shell_exec() PHP.
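
Since xargs with -P is mentioned above, here is a minimal sketch of a parallel search built on it. The function name parallel_search, the default of 4 jobs, and the batch size of 500 are assumptions chosen for illustration, not values from the answer. Whether parallelism helps at all depends on the storage device.

```shell
#!/bin/sh
# parallel_search DIR QUERY [JOBS]
# Hypothetical helper: prints the paths of *.json files under DIR that contain QUERY,
# running several grep processes at once.
parallel_search() {
    dir=$1
    query=$2
    jobs=${3:-4}   # assumed default; tune for the actual disk and CPU
    # find emits NUL-terminated paths so unusual file names are handled safely.
    # xargs -0 reads them, -n 500 passes up to 500 files per grep invocation,
    # and -P runs up to $jobs invocations in parallel.
    # -print0, -0 and -P are GNU/BSD extensions (available on Ubuntu).
    find "$dir" -type f -name '*.json' -print0 |
        xargs -0 -n 500 -P "$jobs" grep -lF -e "$query"
}
```

The order of the output lines is not deterministic when several jobs run at once, and xargs returns a non-zero status if any batch contained no match. From PHP, the command could be run with shell_exec(), passing the query through escapeshellarg().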

SSD

, , , , I/O .

+1

Source: https://habr.com/ru/post/1715642/

