Running "wc -l <filename>" in Python code
I want to do 10-fold cross-validation on huge files (each several hundred thousand lines). I want to run "wc -l" every time I start reading a file, then generate random numbers that many times, each time writing that line number to a separate file. I use this:
    import os
    for i in files:
        os.system("wc -l <insert filename>")

How do I insert the file name here? It's a variable. I looked through the documentation, but the examples all use ls, which doesn't have this problem.
    import subprocess
    for f in files:
        subprocess.call(['wc', '-l', f])

Also see http://docs.python.org/library/subprocess.html#convenience-functions. For example, if you want to capture the output as a string, use subprocess.check_output() instead of subprocess.call().
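For instance, here is a minimal sketch of turning the count into a Python int and then sampling random line numbers, as the question describes (assumes Python 2.7+, where check_output was added; the file name and sample size are made up for the example):

    from subprocess import check_output
    import random

    def line_count(filename):
        # wc -l prints "<count> <filename>"; the first field is the count
        return int(check_output(['wc', '-l', filename]).split()[0])

    # e.g. pick 10 random line numbers for a cross-validation split
    n = line_count('huge_file')
    chosen = random.sample(xrange(1, n + 1), 10)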
Let's compare:
    from subprocess import check_output

    def wc(filename):
        return int(check_output(["wc", "-l", filename]).split()[0])

    def native(filename):
        c = 0
        with open(filename) as file:
            while True:
                chunk = file.read(10 ** 7)
                if chunk == "":
                    return c
                c += chunk.count("\n")

    def iterate(filename):
        with open(filename) as file:
            for i, line in enumerate(file):
                pass
            return i + 1

Go, go, timeit!
    from timeit import timeit
    from sys import argv

    filename = argv[1]

    def testwc():
        wc(filename)

    def testnative():
        native(filename)

    def testiterate():
        iterate(filename)

    print "wc", timeit(testwc, number=10)
    print "native", timeit(testnative, number=10)
    print "iterate", timeit(testiterate, number=10)

Result:
    wc 1.25185894966
    native 2.47028398514
    iterate 2.40715694427

So wc is about twice as fast on a 150 MB file of compressed data with ~500,000 lines, which is what I tested on. However, testing on a file generated with seq 3000000 > bigfile, I get these numbers:
    wc 0.425990104675
    native 0.400163888931
    iterate 3.10369205475

Hey look, Python FTW! However, with longer lines (~70 characters):
    wc 1.60881590843
    native 3.24313092232
    iterate 4.92839002609

So the conclusion: it depends, but wc seems to be the best bet overall.
No need to use wc -l. Use the following Python function:
    def file_len(fname):
        i = 0  # so that an empty file returns 0
        with open(fname) as f:
            for i, l in enumerate(f, 1):
                pass
        return i

This is probably more efficient than calling out to an external utility (the input loop is buffered either way).
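For example, run against the 10,000,000-line huge_file generated with seq in the update below:

    >>> file_len('huge_file')
    10000000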
Update
I was wrong, wc -l is much faster!
    $ seq 10000000 > huge_file
    $ time wc -l huge_file
    10000000 huge_file

    real    0m0.267s
    user    0m0.110s
    sys     0m0.010s

    $ time ./p.py
    10000000

    real    0m1.583s
    user    0m1.040s
    sys     0m0.060s

My solution is very similar to lazyr's "native" function:
    import functools

    def file_len2(fname):
        with open(fname, 'rb') as f:
            lines = 0
            datum = '\n'  # so that an empty file returns 0
            reader = functools.partial(f.read, 131072)
            for datum in iter(reader, ''):
                lines += datum.count('\n')
            # a final line without a trailing '\n' still counts as a line
            last_wasnt_nl = datum[-1] != '\n'
            return lines + last_wasnt_nl

This, unlike wc, counts a final line that does not end with '\n' as a separate line. If you need exactly the same behaviour as wc, it can be written (completely unreadably :) as:
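A quick check of that difference (the file name is made up for the example; wc -l only counts '\n' characters, so it reports 1 here):

    >>> open('no_trailing_nl', 'wb').write('a\nb')
    >>> file_len2('no_trailing_nl')  # 'b' is an unterminated final line
    2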
    import functools as ft, itertools as it, operator as op

    def file_len3(fname):
        with open(fname, 'rb') as f:
            reader = ft.partial(f.read, 131072)
            counter = op.methodcaller('count', '\n')
            return sum(it.imap(counter, iter(reader, '')))

with times comparable to wc on all the test files I created.
Note: this applies to Windows and POSIX machines; classic Mac OS used '\r' as the end-of-line character.
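If you do need to count '\r'-terminated lines, one option (a sketch, not part of the answers above; the function name is made up) is Python 2's universal-newline mode, which translates '\r' and '\r\n' to '\n' on read:

    def file_len_universal(fname):
        # 'rU' opens with universal newlines: '\r' and '\r\n' become '\n',
        # at the cost of some translation overhead
        lines = 0
        with open(fname, 'rU') as f:
            for lines, _ in enumerate(f, 1):
                pass
        return lines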