Ubuntu terminal - using GNU Parallel to read lines in all files in a folder

I am trying to count lines in all files in a very large folder under Ubuntu.

The files are .gz files and I use

zcat * | wc -l

to count all lines in all files, and it's slow!

I want to use multiple cores for this task and found GNU Parallel.

I tried using this bash command:

parallel zcat * | parallel --pipe wc -l

but not all cores are used. I read that starting jobs can cause a lot of overhead, so I tried batching with

parallel -X zcat * | parallel --pipe -X wc -l

but it did not help.

How can I use all cores to count the lines in all files in a folder, given that they are all .gz files and must be decompressed before counting (there is no need to compress them again afterwards)?

Thanks!


If you have 150,000 files, you will probably run into the "argument list too long" error. You can avoid it like this:

find . -maxdepth 1 -name \*gz -print0 | parallel -0 ...

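If you want the file name next to the line count, you have to echo it yourself, because wc only reads from its stdin and does not know the file name: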

find ... | parallel -0 'echo {} $(zcat {} | wc -l)'

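Next comes efficiency, which depends on what your disks can handle. Try parallel -j2, then parallel -j4, and see what works best on your system.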


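As Ole points out in the comments, you can avoid echoing the file name yourself by using GNU Parallel's --tag option, which prefixes each output line with its argument: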

find ... | parallel -0 --tag 'zcat {} | wc -l'
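The question asks for a single total, so the per-file counts still need to be added up. A minimal sketch of one way to do that, assuming bash and GNU tools: the names dir, jobs and per_file_counts and the three demo files are hypothetical, and the xargs -P branch is a different tool, included only so the sketch still runs where GNU Parallel is not installed.

```shell
#!/usr/bin/env bash
# Sketch: per-file and total line counts for every .gz file in one directory.
set -euo pipefail

# Hypothetical demo data; point dir at the real folder instead.
dir=$(mktemp -d)
printf 'a\nb\n'    | gzip > "$dir/one.gz"
printf 'c\nd\ne\n' | gzip > "$dir/two.gz"
printf 'f\n'       | gzip > "$dir/three.gz"

jobs=4   # assumed starting point; try 2, 4, ... and time each run

# Prints "name<TAB>count" for each file.
per_file_counts() {
    if parallel --version 2>/dev/null | grep 'GNU parallel' >/dev/null; then
        # GNU Parallel: one "zcat | wc -l" job per file, tagged with its name.
        find "$1" -maxdepth 1 -name '*.gz' -print0 |
            parallel -0 -j "$jobs" --tag 'zcat {} | wc -l'
    else
        # Fallback without GNU Parallel: xargs -P runs the jobs concurrently.
        find "$1" -maxdepth 1 -name '*.gz' -print0 |
            xargs -0 -P "$jobs" -I{} sh -c \
                'printf "%s\t%s\n" "$1" "$(zcat "$1" | wc -l)"' _ {}
    fi
}

counts=$(per_file_counts "$dir")

# The count is the last tab-separated field on every line.
total=$(printf '%s\n' "$counts" | awk -F'\t' '{ sum += $NF } END { print sum + 0 }')

printf '%s\n' "$counts"
echo "total: $total"
rm -rf "$dir"
```

For the demo data the total is 6. The order of the per-file lines is not guaranteed, because the jobs finish independently.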

Basically, the command you are looking for is:

ls *gz | parallel 'zcat {} | wc -l'

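What it does:

  • ls *gz lists all gz files on stdout
  • the list is piped to parallel
  • parallel spawns subshells
  • each subshell runs the quoted command 'zcat {} | wc -l'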

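As for "{}", according to the manual:

this replacement string is replaced by a full line read from the input source.

So each line piped to parallel is fed to zcat.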
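To see the substitution without decompressing anything, GNU Parallel's --dry-run option prints each command instead of running it. A small sketch, assuming GNU Parallel is installed; one.gz and two.gz are hypothetical names and do not need to exist.

```shell
# Feed two hypothetical file names to GNU Parallel on stdin.
# --dry-run prints each command instead of running it, and -k keeps the
# output in input order, so the {} substitution is easy to follow.
preview=$(printf '%s\n' one.gz two.gz | parallel -k --dry-run 'zcat {} | wc -l')
printf '%s\n' "$preview"
```

Each printed line should be the quoted command with {} replaced by one input line.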

Of course this is basic and could probably be tuned; the documentation and its examples may help.


Source: https://habr.com/ru/post/1679798/

