The file with the largest number of lines (not bytes) in a directory

I am trying to run wc -l on an entire directory and then echo each file name along with its number of lines.

To complicate things further, the directory should come from the argument passed to the script. So, at the risk of looking stupid: can someone first tell me why a simple wc -l $1 does not give me a line count for the directory I pass as the argument? I know I do not fully understand how it works.

Also, I need to check whether the argument is not a directory and whether more than one argument was passed.

As always, you are great.

+4
6 answers

wc works on files, not directories, so if you want the line counts of all the files in a directory, you start with:

 wc -l $1/* 
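As a sanity check, here is a small disposable demo of that behavior; it uses a temporary directory with made-up file names so it can be run anywhere:

```shell
# Sketch: wc -l gives per-file counts for a glob, but complains about
# the directory itself, which is why a plain `wc -l $1` does not work.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/three.txt"
printf 'x\n'       > "$dir/one.txt"

wc -l "$dir"/*             # one count per file, plus a "total" line
wc -l "$dir" 2>&1 || true  # error: wc reads files, not directories

rm -rf "$dir"
```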

With various ways of stripping out the total line, sorting, and extracting only the largest, you can get something like this (split across several lines for readability):

 pax> wc -l $1/* 2>/dev/null \
        | grep -v ' total$' \
        | sort -n -k1 \
        | tail -1l
 2892 target_dir/big_honkin_file.txt

As for the check, you can check the number of parameters passed to your script, with something like:

 if [[ $# -ne 1 ]] ; then
     echo 'Whoa! Wrong parameter count'
     exit 1
 fi

and you can check if there is a directory with:

 if [[ ! -d $1 ]] ; then
     echo 'Whoa!' "[$1]" 'is not a directory'
     exit 1
 fi
+6

I am trying to run wc -l on an entire directory and then echo each file name along with its number of lines.

You can run find on the directory and use its -exec option to run wc -l on each file. Something like this:

 $ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
 wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
       11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
       25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
       12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
       14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
       22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
       27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
        7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
       18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
       26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
       12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
       14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
       16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
       24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
       33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
+1

Good question!

I saw the other answers, and some of them are pretty good. find ... | xargs is my preferred approach, and it can be simplified with the find ... -exec wc -l {} + syntax. But there is a problem: whenever the command-line buffer fills up, wc -l ... is invoked again, and each invocation prints its own "<number> total" line. Since wc has no option to suppress that line, it has to be filtered out afterwards, which is not nice, so I replace wc with awk.

So my complete answer:

 #!/usr/bin/bash

 [ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
 [ ! -d "$1" ] && echo "Not dir">&2 && exit 1

 find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +

Or, using less temporary space but a bit more awk code:

 find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} + 

Miscellaneous

  • If subdirectories should not be searched, add -maxdepth 1 before -type f in the find call.
  • This is pretty fast. I was afraid it would be much slower than the find ... wc {} + version, but for a directory containing 14770 files (in several subdirectories) the wc version ran in 3.8 seconds and the awk version in 5.2 seconds.
  • wc and awk treat the final \n differently: a last line that does not end with \n is not counted by wc, while awk does count it. I prefer awk's behavior.
  • It prints nothing for empty files.
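The last-line difference mentioned above can be checked with a throwaway file:

```shell
# Sketch: wc -l counts newline characters, awk counts records, so a
# final line without a trailing \n is seen by awk but not by wc.
f=$(mktemp)
printf 'one\ntwo\nthree' > "$f"   # three lines, no final newline

wc -l < "$f"             # prints 2
awk 'END{print NR}' "$f" # prints 3

rm -f "$f"
```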
+1

Is this what you want?

 > find ./test1/ -type f | xargs wc -l
      1 ./test1/firstSession_cnaiErrorFile.txt
     77 ./test1/firstSession_cnaiReportFile.txt
  14950 ./test1/exp.txt
      1 ./test1/test1_cnaExitValue.txt
  15029 total

So with your directory passed in as the argument, it becomes:

 find $your_complete_directory_path/ -type f|xargs wc -l 
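One caveat with this pipeline: plain xargs splits its input on whitespace, so filenames containing spaces break it. A sketch of the NUL-delimited variant, using a throwaway directory:

```shell
# Sketch: -print0 and xargs -0 keep filenames with spaces intact,
# where a plain `find | xargs wc -l` would split "with space.txt".
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/with space.txt"

find "$dir" -type f -print0 | xargs -0 wc -l

rm -rf "$dir"
```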
0

To find the file with the most lines in the current directory and its subdirectories, with zsh:

 lines() REPLY=$(wc -l < "$REPLY")
 wc -l -- **/*(D.nO+lines[1])

This defines a lines function, used as a glob-qualifier sort function, which returns in $REPLY the number of lines of the file whose path is given in $REPLY.

Then we use zsh recursive globbing ( **/* ) to find the regular files ( . ), numerically ( n ) sorted in reverse order ( O ) by the lines function ( +lines ), and select the first one ( [1] ); D is added to also include hidden files and descend into hidden directories.

Doing this with standard utilities is a bit tricky if you don't want to make assumptions about which characters file names may contain (e.g. newline, space, ...). With GNU tools, found on most Linux distributions, it is a little easier, since they can work with NUL-delimited records:

 find . -type f -exec sh -c '
   for file do
     size=$(wc -l < "$file") &&
       printf "%s\0" "$size:$file"
   done' sh {} + |
   tr '\n\0' '\0\n' |
   sort -rn |
   head -n1 |
   tr '\0' '\n'

Or with zsh or GNU bash syntax:

 biggest= max=-1
 find . -type f -print0 |
 {
   while IFS= read -rd '' file; do
     size=$(wc -l < "$file") &&
       ((size > max)) &&
       max=$size biggest=$file
   done
   [[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
 }
0

Here is what works for me in Git Bash (mingw32) under Windows:

 find . -type f -print0 | xargs -0 wc -l

This lists the files and their line counts in the current directory and its subdirectories. You can also redirect the output to a text file and import it into Excel if necessary:

 find . -type f -print0 | xargs -0 wc -l > fileListingWithLineCount.txt
0

Source: https://habr.com/ru/post/1485743/
