How to read in a CSV file that contains pound characters?

My file has a NUL byte at the beginning, and I'm struggling with the "£" character.

import codecs
import csv

data_initial = codecs.open(filename, "rU", "utf-16")
data = csv.DictReader((line.replace('\x00','') for line in data_initial), delimiter="\t")
for row in data:
    print row

I get an error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 169: ordinal not in range(128)

By the way, it doesn't matter whether I print the row or not: even if I only print "1", the error stays the same. I don't understand why this is reported as an encoding error when it is probably a decoding error.

Anyway, how can I solve the problem?

2 answers

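It looks like you are using Python 2.7. The Python 2.7 csv module does not support Unicode input, so you have to feed it byte strings and decode the fields yourself afterwards.

In Python 3.x the csv module does support Unicode, so there you pass already decoded strings (str) to csv instead.

I use the following code so that the same script works under both Python versions (in my case "ascii" with errors='ignore' was good enough, i.e. non-ASCII characters are simply dropped; use the encoding your data actually needs):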

import csv
import sys

if sys.version_info < (3, 0):
    # Python 2: the csv module does not support unicode, so we must use byte strings.

    def _input_csv(csv_data):
        for line in csv_data:
            assert isinstance(line, bytes)
            yield line

    def _output_csv(csv_line):
        for i, column in enumerate(csv_line):
            csv_line[i] = column.decode("ascii", errors='ignore')
            assert isinstance(csv_line[i], unicode)  # NOQA

else:
    # Python 3: the csv module does support unicode, so we must use str everywhere,
    # not byte strings.

    def _input_csv(unicode_csv_data):
        for line in unicode_csv_data:
            assert isinstance(line, bytes)
            line = line.decode("ascii", errors='ignore')
            assert isinstance(line, str)
            yield line

    def _output_csv(csv_line):
        for column in csv_line:
            assert isinstance(column, str)

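Usage example (here process is presumably a subprocess.Popen object, so process.stdout yields byte strings):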

reader = csv.reader(_input_csv(process.stdout), delimiter="|")
for row in reader:
    _output_csv(row)
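For comparison, under Python 3 the question's file can be read without any byte-string juggling, because csv accepts decoded text directly. A minimal sketch, assuming the file really is UTF-16 (the NUL bytes suggest so) and tab-separated as in the question; read_tsv_utf16 is a hypothetical helper name, not part of any library:

```python
import csv


def read_tsv_utf16(path):
    """Return the rows of a tab-separated UTF-16 file as a list of dicts of str.

    Assumes the file is UTF-16 with a byte-order mark (hypothetical input).
    """
    # newline="" is what the csv documentation recommends for files passed to csv readers.
    with open(path, encoding="utf-16", newline="") as f:
        # Defensive: drop any stray NUL characters, mirroring the question's code.
        cleaned = (line.replace("\x00", "") for line in f)
        # Build a list so the rows remain usable after the file is closed.
        return list(csv.DictReader(cleaned, delimiter="\t"))
```

Call it as rows = read_tsv_utf16(filename); each row is a dict of str, so "£" comes through intact and nothing is implicitly encoded as ASCII.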

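With codecs.open(filename, "rU", "utf-16") you get Unicode strings containing "£", and you then pass those to csv. But the Python 2 csv documentation says: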

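"This version of the csv module doesn't support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples."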

Try "utf-8" instead (assuming your file is actually in that encoding): codecs.open(filename, "rU", "utf-8")
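Before switching codecs, it can help to check what the file actually is. NUL bytes are typical of UTF-16, where every ASCII character takes two bytes and one of them is 0x00. A minimal Python 3 sketch that looks for a byte-order mark; sniff_bom is a hypothetical helper, and a file without a BOM yields None, so treat the result as a hint only:

```python
import codecs


def sniff_bom(path):
    """Guess an encoding name from a byte-order mark, or return None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM when decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None
```

If this returns "utf-16", the original codecs.open(filename, "rU", "utf-16") call was already decoding correctly, and the problem lies in passing Unicode to the Python 2 csv module rather than in the codec choice.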


Source: https://habr.com/ru/post/1623686/

