The file with the largest number of lines (not bytes) in a directory

I am trying to run wc -l on an entire directory and then echo each file name along with its number of lines.

To complicate things further, the directory should come from the argument passed to the script. So, at the risk of looking stupid: can someone first tell me why a simple wc -l $1 does not give me a line count for the directory I pass as the argument? I know I do not fully understand how it works.

Also, I need to check whether the argument is not a directory and whether more than one argument was passed.

As always, you are great.

+4
6 answers

wc works on files, not directories, so if you want the line counts of all the files in a directory, you start with:

 wc -l $1/* 
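As a sanity check, here is a small disposable demo of that behavior; it uses a temporary directory with made-up file names so it can be run anywhere:

```shell
# Sketch: wc -l gives per-file counts for a glob, but complains about
# the directory itself, which is why a plain `wc -l $1` does not work.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/three.txt"
printf 'x\n'       > "$dir/one.txt"

wc -l "$dir"/*             # one count per file, plus a "total" line
wc -l "$dir" 2>&1 || true  # error: wc reads files, not directories

rm -rf "$dir"
```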

With various ways of stripping out the total line, sorting, and extracting only the largest, you can get something like this (split across several lines for readability):

 pax> wc -l $1/* 2>/dev/null \
        | grep -v ' total$' \
        | sort -n -k1 \
        | tail -1l
 2892 target_dir/big_honkin_file.txt

As for the check, you can check the number of parameters passed to your script, with something like:

 if [[ $# -ne 1 ]] ; then
     echo 'Whoa! Wrong parameter count'
     exit 1
 fi

and you can check if there is a directory with:

 if [[ ! -d $1 ]] ; then
     echo 'Whoa!' "[$1]" 'is not a directory'
     exit 1
 fi
+6

I am trying to run wc -l on an entire directory and then echo each file name along with its number of lines.

You can run find on the directory and use its -exec option to run wc -l on each file. Something like this:

 $ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
 wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
       11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
       25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
       12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
       14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
       22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
       27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
        7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
       18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
       26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
       12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
       14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
       16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
       24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
       33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
+1

Good question!

I saw the other answers, and some of them are pretty good. find ... | xargs is my preferred approach, and it can be simplified with the find ... -exec wc -l {} + syntax. But there is a problem: whenever the command-line buffer fills up, wc -l ... is invoked again, and each invocation prints its own "<number> total" line. Since wc has no option to suppress that line, it has to be filtered out afterwards, which is not nice, so I replace wc with awk.

So my complete answer:

 #!/usr/bin/bash

 [ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
 [ ! -d "$1" ] && echo "Not dir">&2 && exit 1

 find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +

Or, using less temporary space but a bit more awk code:

 find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} + 

Miscellaneous

  • If subdirectories should not be searched, add -maxdepth 1 before -type f in the find call.
  • This is pretty fast. I was afraid it would be much slower than the find ... wc {} + version, but for a directory containing 14770 files (in several subdirectories) the wc version ran in 3.8 seconds and the awk version in 5.2 seconds.
  • wc and awk treat the final \n differently: a last line that does not end with \n is not counted by wc, while awk does count it. I prefer awk's behavior.
  • It prints nothing for empty files.
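The last-line difference mentioned above can be checked with a throwaway file:

```shell
# Sketch: wc -l counts newline characters, awk counts records, so a
# final line without a trailing \n is seen by awk but not by wc.
f=$(mktemp)
printf 'one\ntwo\nthree' > "$f"   # three lines, no final newline

wc -l < "$f"             # prints 2
awk 'END{print NR}' "$f" # prints 3

rm -f "$f"
```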
+1

Is this what you want?

 > find ./test1/ -type f | xargs wc -l
      1 ./test1/firstSession_cnaiErrorFile.txt
     77 ./test1/firstSession_cnaiReportFile.txt
  14950 ./test1/exp.txt
      1 ./test1/test1_cnaExitValue.txt
  15029 total

So with your directory passed in as the argument, it becomes:

 find $your_complete_directory_path/ -type f|xargs wc -l 
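One caveat with this pipeline: plain xargs splits its input on whitespace, so filenames containing spaces break it. A sketch of the NUL-delimited variant, using a throwaway directory:

```shell
# Sketch: -print0 and xargs -0 keep filenames with spaces intact,
# where a plain `find | xargs wc -l` would split "with space.txt".
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/with space.txt"

find "$dir" -type f -print0 | xargs -0 wc -l

rm -rf "$dir"
```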
0

To find the file with the most lines in the current directory and its subdirectories, with zsh:

 lines() REPLY=$(wc -l < "$REPLY")
 wc -l -- **/*(D.nO+lines[1])

This defines a lines function, used as a glob-qualifier sort function, which returns in $REPLY the number of lines of the file whose path is given in $REPLY.

Then we use zsh recursive globbing ( **/* ) to find the regular files ( . ), numerically ( n ) sorted in reverse order ( O ) by the lines function ( +lines ), and select the first one ( [1] ); D is added to also include hidden files and descend into hidden directories.

Doing this with standard utilities is a bit tricky if you don't want to make assumptions about which characters file names may contain (e.g. newline, space, ...). With GNU tools, found on most Linux distributions, it is a little easier, since they can work with NUL-delimited records:

 find . -type f -exec sh -c '
   for file do
     size=$(wc -l < "$file") &&
       printf "%s\0" "$size:$file"
   done' sh {} + |
   tr '\n\0' '\0\n' |
   sort -rn |
   head -n1 |
   tr '\0' '\n'

Or with zsh or GNU bash syntax:

 biggest= max=-1
 find . -type f -print0 |
 {
   while IFS= read -rd '' file; do
     size=$(wc -l < "$file") &&
       ((size > max)) &&
       max=$size biggest=$file
   done
   [[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
 }
0

Here is what works for me in Git Bash (mingw32) under Windows:

 find . -type f -print0 | xargs -0 wc -l

This lists the files and their line counts in the current directory and its subdirectories. You can also redirect the output to a text file and import it into Excel if necessary:

 find . -type f -print0 | xargs -0 wc -l > fileListingWithLineCount.txt
0

Source: https://habr.com/ru/post/1485743/
