Python reads a text file incorrectly

I am trying to read in a text file that looks something like this:

    Date, StartTime, EndTime
    6/8/14, 1832, 1903
    6/8/14, 1912, 1918
    6/9/14, 1703, 1708
    6/9/14, 1713, 1750

and this is what I have:

    g = open('Observed_closure_info.txt', 'r')
    closure_date=[]
    closure_starttime=[]
    closure_endtime=[]
    file_data1 = g.readlines()
    for line in file_data1[1:]:
        data1=line.split(', ')
        closure_date.append(str(data1[0]))
        closure_starttime.append(str(data1[1]))
        closure_endtime.append(str(data1[2]))

I did it this way for a previous file that was very similar to this one, and everything worked fine. This file, however, is not read properly. First I get the error "list index out of range" for closure_starttime.append(str(data1[1])), and when I print data1 or closure_date, I get something like

 ['\x006\x00/\x008\x00/\x001\x004\x00,\x00 \x001\x008\x003\x002\x00,\x00 \x001\x009\x000\x003\x00\r\x00\n'] 

I tried to rewrite the text file in case something was damaged in this particular file and it still does the same. I'm not sure why, because the last time it worked fine.

Any suggestions? Thanks!

2 answers

This looks like a comma-delimited file encoded in UTF-16 (hence the \x00 null bytes between the characters). You need to decode the input from UTF-16, for example:

    import codecs

    closure_date=[]
    closure_starttime=[]
    closure_endtime=[]
    with codecs.open('Observed_closure_info.txt', 'r', 'utf-16-le') as g:
        next(g)  # skip header line (next(g) works in Python 2.6+ and 3)
        for line in g:
            date, start, end = line.strip().split(', ')
            closure_date.append(date)
            closure_starttime.append(start)
            closure_endtime.append(end)
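To see why the original split fails, here is a small sketch using one sample line from the question. It assumes the file is little-endian UTF-16, which is what the \x00 pattern in the printed output suggests; the variable names are only illustrative.

```python
# One data line from the question, encoded the way the file appears to be.
line = u'6/8/14, 1832, 1903\r\n'
raw = line.encode('utf-16-le')

# In UTF-16 every ASCII character is followed by a \x00 byte, so the
# two-byte separator ', ' never occurs in the raw data and split()
# returns the whole line as a single element.
parts_raw = raw.split(b', ')

# After decoding, the separator matches again.
parts_decoded = raw.decode('utf-16-le').strip().split(', ')

print(len(parts_raw))
print(parts_decoded)
```

With only one element in the list, data1[1] does not exist, which is where the "list index out of range" error comes from.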

Try this:

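    closure_date=[]
    closure_starttime=[]
    closure_endtime=[]
    # Read the raw bytes and decode the whole file at once: splitting UTF-16
    # bytes into lines before decoding cuts two-byte characters in half.
    with open('Observed_closure_info.txt', 'rb') as g:
        # 'utf-16' picks the byte order from the BOM (native order if there is none)
        text = g.read().decode('utf-16')
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue  # ignore blank lines
        data1=line.split(', ')
        closure_date.append(str(data1[0]))
        closure_starttime.append(str(data1[1]))
        closure_endtime.append(str(data1[2]))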
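If you would rather let Python do the decoding while it reads, something along these lines may also work. This is only a sketch: read_closures is a made-up helper name, and it assumes the file starts with a byte order mark so that 'utf-16' can detect the byte order (pass 'utf-16-le' otherwise). io.open is used because it accepts an encoding argument in both Python 2.6+ and Python 3.

```python
import io


def read_closures(path, encoding='utf-16'):
    """Return (dates, start_times, end_times) from a comma-delimited text file.

    Hypothetical helper for illustration; the first line is treated as a header.
    """
    dates, starts, ends = [], [], []
    with io.open(path, 'r', encoding=encoding) as f:
        next(f, None)  # skip the header line, if there is one
        for line in f:
            fields = [field.strip() for field in line.split(',')]
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            date, start, end = fields
            dates.append(date)
            starts.append(start)
            ends.append(end)
    return dates, starts, ends
```

Usage would be along the lines of closure_date, closure_starttime, closure_endtime = read_closures('Observed_closure_info.txt').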

Source: https://habr.com/ru/post/989689/

