"The string contains NULL bytes" in a CSV reader (Python)

I am trying to write a program that looks at a .csv file (input.csv) and rewrites, to corrected.csv, only the lines that start with a specific element, as listed in a text file (output.txt).

Here's what my program looks like right now:

    import csv

    lines = []
    with open('output.txt', 'r') as f:
        for line in f.readlines():
            lines.append(line[:-1])

    with open('corrected.csv', 'w') as correct:
        writer = csv.writer(correct, dialect='excel')
        with open('input.csv', 'r') as mycsv:
            reader = csv.reader(mycsv)
            for row in reader:
                if row[0] not in lines:
                    writer.writerow(row)

Unfortunately, I keep getting this error, and I don't know what it means.

    Traceback (most recent call last):
      File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
        for row in reader:
    _csv.Error: line contains NULL byte

Thanks to everyone here for even getting me to this point.

+75
python csv
Oct. 25 '11 at 19:39
9 answers

I solved the problem with a simpler approach:

    import csv
    import codecs

    csvReader = csv.reader(codecs.open('file.csv', 'rU', 'utf-16'))

The key is using the codecs module to open the file with the UTF-16 encoding; there are many more encodings available, check the documentation.
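For reference, here is a minimal sketch (my addition, assuming Python 3 and that the input really is UTF-16) of how an explicit encoding plugs into the question's filtering loop, using the built-in open() rather than codecs.open():

    import csv

    # Values whose rows should be dropped, one per line in output.txt
    # (same assumption as the question's code).
    with open('output.txt', 'r') as f:
        lines = [line.rstrip('\n') for line in f]

    # Open the UTF-16 input with an explicit encoding; newline='' is what
    # the csv module recommends for files it reads or writes.
    with open('input.csv', 'r', encoding='utf-16', newline='') as mycsv, \
         open('corrected.csv', 'w', newline='') as correct:
        reader = csv.reader(mycsv)
        writer = csv.writer(correct, dialect='excel')
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)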

+57
Mar. 27 '12 at 1:01

I assume you have a NUL byte in input.csv. You can check for it with

    if '\0' in open('input.csv').read():
        print("you have null bytes in your input file")
    else:
        print("you don't")

If you do, then something like

    reader = csv.reader(x.replace('\0', '') for x in mycsv)

may work around that. Alternatively, it may mean you have UTF-16 or something else interesting in the .csv file.
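A quick way to test the UTF-16 guess (my addition, not part of the original answer) is to look for a byte-order mark at the start of the file:

    import codecs

    with open('input.csv', 'rb') as f:
        head = f.read(2)

    # UTF-16 files usually begin with a byte-order mark (FF FE or FE FF).
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        print("looks like UTF-16; try opening the file with a utf-16 codec")
    else:
        print("no UTF-16 byte-order mark found")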

+61
25 Oct '11 at 19:58

You can simply embed a generator to filter out the null values if you want to pretend they don't exist. Of course, this assumes the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.

See the (line.replace('\0', '') for line in mycsv) generator below; you will probably also want to open that file in rb mode.

    import csv

    lines = []
    with open('output.txt', 'r') as f:
        for line in f.readlines():
            lines.append(line[:-1])

    with open('corrected.csv', 'w') as correct:
        writer = csv.writer(correct, dialect='excel')
        with open('input.csv', 'rb') as mycsv:
            reader = csv.reader((line.replace('\0', '') for line in mycsv))
            for row in reader:
                if row[0] not in lines:
                    writer.writerow(row)
+8
Nov 25 '14 at 7:53

This will tell you what the problem is.

    import csv

    lines = []
    with open('output.txt', 'r') as f:
        for line in f.readlines():
            lines.append(line[:-1])

    with open('corrected.csv', 'w') as correct:
        writer = csv.writer(correct, dialect='excel')
        with open('input.csv', 'r') as mycsv:
            reader = csv.reader(mycsv)
            try:
                for i, row in enumerate(reader):
                    if row[0] not in lines:
                        writer.writerow(row)
            except csv.Error:
                print('csv choked on line %s' % (i + 1))
                raise

Perhaps this from daniweb will be useful:

I get this error when reading from a CSV file: "Runtime error! The string contains NULL bytes." Any idea on the root cause of this error?

...

OK, I figured it out and thought I would post the solution, since it caused me some grief. The file in use had been saved in .xls format instead of .csv. I didn't catch it because the file name itself had the .csv extension while the type was still .xls.
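A rough sketch of how to catch that case up front (my addition, not from the daniweb thread): legacy .xls files are OLE2 compound documents with a fixed signature, and .xlsx files are ZIP archives, so the first few bytes give it away:

    OLE2_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # legacy .xls (OLE2) signature
    ZIP_MAGIC = b'PK\x03\x04'                          # .xlsx / ZIP signature

    with open('input.csv', 'rb') as f:
        head = f.read(8)

    if head.startswith(OLE2_MAGIC):
        print("this 'csv' is actually a legacy .xls file")
    elif head.startswith(ZIP_MAGIC):
        print("this 'csv' is actually a ZIP-based file such as .xlsx")
    else:
        print("no Excel signature found at the start of the file")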

+7
Oct 25 '11 at 20:00

If you want to replace the null bytes with something, you can do this:

    import csv

    def fix_nulls(s):
        for line in s:
            yield line.replace('\0', ' ')

    r = csv.reader(fix_nulls(open(...)))
+6
May 08 '18 at 2:04

The hard way:

If you work on Linux, you can use the full power of sed:

    from subprocess import check_call, CalledProcessError

    PATH_TO_FILE = '/home/user/some/path/to/file.csv'

    try:
        check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
    except CalledProcessError as err:
        print(err)

This is the most efficient solution for huge files.

Tested with Python 3 on Kubuntu.

+2
Aug 11 '17 at 11:10

I recently fixed this problem, and in my case it was a compressed (zipped) file that I was trying to read. Check the file format first, then check whether the contents actually match the extension.
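As a sketch of that check (my addition; the file name is just a placeholder): gzip-compressed files begin with the bytes 1F 8B, and the standard-library gzip module can read them transparently if that is what you have:

    import csv
    import gzip

    path = 'input.csv'  # placeholder; substitute the real file

    with open(path, 'rb') as f:
        is_gzip = f.read(2) == b'\x1f\x8b'  # gzip magic number

    if is_gzip:
        handle = gzip.open(path, 'rt')  # read the compressed data as text
    else:
        handle = open(path, 'r')

    with handle:
        for row in csv.reader(handle):
            print(row)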

+1
Jul 29 '16 at 11:56

Turning my Linux environment into a clean, full UTF-8 environment did the trick for me. At the command prompt, do the following:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    export LANGUAGE=en_US.UTF-8
+1
Mar 09 '17 at 12:23

pandas.read_csv now handles the various UTF encodings when reading and writing, and can therefore deal with null bytes directly:

    import pandas as pd

    data = pd.read_csv(file, encoding='utf-16')

see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html.

0
May 14 '19 at 19:11


