Parsing a text file into a list in python

Question

I am completely new to Python and I am trying to read in a txt file that contains a combination of words and numbers. I can read in a txt file just fine, but I'm struggling to get a string in a format that I can work with.

    import matplotlib.pyplot as plt
    import numpy as np
    from numpy import loadtxt

    f = open("/Users/Jennifer/Desktop/test.txt", "r")
    lines = f.readlines()
    Data = []
    list = lines[3]
    i = 4
    while i < 12:
        list = list.append(line[i])
        i = i + 1
    print list
    f.close()

I need a list containing all the elements in lines 3-12 (counting from 0), which are all numbers. When I print lines[1], I get the data from that line. When I print lines or lines[3:12], every character is preceded by \x00. For example, the word "Plate" becomes: ['\x00P\x00l\x00a\x00t\x00e. Using lines = [line.strip() for line in f] gives the same result. When I try to combine the individual lines in the while loop above, I get the error "AttributeError: 'str' object has no attribute 'append'".

How can I get a selection of lines from a txt file into a list? Thank you very much.

Edit: txt file is as follows:

BLOCKS = 1
Plate: Phosphate noise analysis 2000x 1.3 PlateFormat Endpoint Absorbance Raw FALSE 1 1 650 1 12 96 1 8
Temperature (¡C) 1 2 3 4 5 6 7 8 9 10 11 12
21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227
0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197
0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451
0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235
0.474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191
0.4721 0.4794 0.501 0.467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273
0.4122 0.4454 0.314 0.277 0.4621 0.416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038
0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.191 0.1062

~End
Original Filename: 2013-08-06 Phosphate Noisiness; Date Last Saved: 8/6/2013 7:00:55 PM

Update: I used this code:

    f = open("/Users/Jennifer/Desktop/test.txt", "r")
    file_list = f.readlines()
    first_twelve = file_list[3:11]
    data = [x.replace('\t', ' ') for x in first_twelve]
    data = [x.replace('\x00', '') for x in data]
    data = [x.replace(' \r\n', '') for x in data]
    print data

to get this result:

    ['21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227 ',
     ' 0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197 ',
     ' 0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451 ',
     ' 0.4771 0.4475 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235 ',
     ' 0.4474 0.4271 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.499 0.5269 0.5091 0.5191 ',
     ' 0.4721 0.4794 0.501 0.467 0.4785 0.492 0.4894 0.511 0.4778 0.5223 0.4888 0.5273 ',
     ' 0.4122 0.4454 0.314 0.2747 0.4621 0.416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038 ',
     ' 0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062 ']

Which (correct me if I'm wrong, I'm very new to Python!) is a list of lists that I can work with. Thank you very much to everyone who answered!

python string list lines

Rachel rose Aug 19 '13 at 0:01

3 answers

Peter Foti · Answer 1 · 2013-08-19T00:18:15+0000

When you call lines = f.readlines(), you get back a list of lines. lines[3] is a single element of that list, that is, one string (the fourth line, since indexing starts at 0), not a list. This is why you end up working with individual characters.

All you have to do is say

    files = open("Your File.txt")
    file_list = files.readlines()
    first_twelve = file_list[0:12]  # returns a list with the first 12 lines

After you get the first_twelve list, you can do whatever you want with it.

To print each line you would do:

    for each_line in first_twelve:
        print each_line

This should work for you.

dawg · Answer 2 · 2013-08-19T00:06:45+0000

Your source code contains the line list=lines[3].

Two problems here.

Do not use list as a variable name. Doing so silently overwrites the built-in list constructor.
When you take one element from the list with lines[3], you have only that one object, in this case a string. When you try to append to it, you cannot, because it is not a list.
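Putting both points together, a corrected version of the loop might look like the sketch below. It is written for Python 3 (the thread itself uses Python 2 print statements), and the sample lines and the name selected are hypothetical stand-ins, not taken from the original file. Note also that append changes the list in place and returns None, so assigning its result back to the variable would discard the list.

```python
# Hypothetical stand-in for the result of f.readlines()
lines = ["header 0\n", "header 1\n", "header 2\n",
         "row 3\n", "row 4\n", "row 5\n"]

selected = []                    # start from an empty list, not from a string
i = 3
while i < len(lines):
    selected.append(lines[i])    # append mutates in place and returns None
    i = i + 1

print(selected)
```

The same result can be had without a loop by slicing: lines[3:] here.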

You can easily demonstrate your error in the console:

    >>> li=['1']
    >>> li.append('2')
    >>> li
    ['1', '2']
    >>> st='1'
    >>> st.append('2')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'append'

Other comments, in general, about your code.

Suppose you have a text file named '/tmp/test.txt' that contains this text:

    Line 1
    Line 2
    ...
    Line 19

Reading the contents of this file is simple:

    with open('/tmp/test.txt', 'r') as fin:
        lines=fin.readlines()

If you want a subset of the lines, you can use a slice:

 subset=lines[3:12]
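As a quick check of what that slice selects, here is a small Python 3 sketch using a made-up 19-element list in place of the file's lines. The start index is included and the stop index is excluded, so lines[3:12] covers indices 3 through 11.

```python
# Hypothetical stand-in for the 19 lines read from the file
lines = ["line %d" % n for n in range(19)]

subset = lines[3:12]     # indices 3, 4, ..., 11 (12 is excluded)

print(len(subset))       # 9
print(subset[0])         # line 3
print(subset[-1])        # line 11
```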

If you want to process each line in some way, for example to strip the carriage return, use the file object as an iterator:

    with open('/tmp/test.txt', 'r') as fin:
        lines=[]
        for line in fin:
            lines.append(line.strip())

For your specific problem with the presence of NUL in the data, maybe you are reading a binary file disguised as text? You would need to post an example file.
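One common cause of a \x00 next to every character is a text file saved as UTF-16 and read as if it were single-byte text. Whether that is the case for the file in the question is an assumption; the thread does not confirm it. Under that assumption, a minimal Python 3 sketch is to declare the encoding when opening the file. The sample content and path below are invented so the example is self-contained.

```python
import os
import tempfile

# Build a small UTF-16 sample file (hypothetical content, not the asker's file)
sample = "Plate:\tdemo\r\n0.488\t0.4742\r\n"
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.txt")
with open(path, "w", encoding="utf-16", newline="") as fout:
    fout.write(sample)

# Reading the raw bytes shows the NUL padding around ASCII characters
with open(path, "rb") as fin:
    raw = fin.read()
print(b"\x00" in raw)    # True

# Declaring the encoding decodes the text properly, so no \x00 remains
with open(path, "r", encoding="utf-16") as fin:
    lines = [line.strip() for line in fin]
print(lines)
```

If the file turns out to use a different encoding, the encoding argument would need to change accordingly.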

Edit

Your file contains Unicode characters (right after “Temperature”), which may account for some of the odd characters that you see. If you are only interested in the lines with numbers, you can ignore them.

You do not have a list of lists, but it is easy to get:

    data=[]  # will hold the lines of the file
    with open(ur_file,'rU') as fin:
        for line in fin:         # for each line of the file
            line=line.strip()    # remove CR/LF
            if line:             # skip blank lines
                data.append(line)

    print data  # list of STRINGS separated by spaces
    matrix=[map(float,line.split()) for line in data[3:10]]  # convert the strings..
    print matrix  # NOW you have a list of list of floats...
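The code above targets Python 2, where map returns a list. In Python 3, map returns a lazy iterator, so the same expression would give a list of map objects. A nested list comprehension behaves the same in both versions. The sketch below uses two shortened, hypothetical rows in place of the real data.

```python
# Hypothetical rows, shortened from the kind of data shown in the question
data = [
    "21.4 0.4977 0.5074",
    "0.488 0.4742 0.5011",
]

# One inner list of floats per line
matrix = [[float(tok) for tok in line.split()] for line in data]

print(matrix)   # [[21.4, 0.4977, 0.5074], [0.488, 0.4742, 0.5011]]
```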

chapter3 · Answer 3 · 2013-08-19T00:13:52+0000

The snippet below will help you get rid of the \x00 characters embedded in your data.

    f = open("/Users/Jennifer/Desktop/test.txt", "r")
    lines = f.readlines()
    lines = [x.replace('\x00','') for x in lines]
    l = []  # create the list once, before the loop, so it keeps every line
    for i in range(3,12):
        l.append(lines[i])

I'm not sure if your data has other delimiters (like comma or space) for separating numbers. If so, a simple split will help convert the string to a list:

    line = '123.00,456.00,789.00'
    l = line.split(',')  # list will become ['123.00','456.00','789.00']

Edit

Continuing from Rachel's updated code:

    f= open("/Users/Jennifer/Desktop/test.txt", "r")
    file_list = f.readlines()
    first_twelve = file_list[3:11]
    data = [x.replace('\t',' ') for x in first_twelve]
    data = [x.replace('\x00','') for x in data]
    data = [x.replace(' \r\n','') for x in data]
    items = []
    for dataline in data:
        items += dataline.split(' ')
    items = [float(x) for x in items if len(x) > 0]  # remove dummy items left in the list
    print items
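The "dummy items" come from splitting on a single space: leading, trailing, or doubled spaces each produce an empty string. As an alternative sketch (Python 3, with a made-up dataline), calling split() with no argument splits on any run of whitespace and drops the empty strings, so the length filter is not needed.

```python
# Hypothetical line with leading, trailing, and doubled spaces
dataline = " 0.488  0.4742 0.5011 "

with_dummies = dataline.split(" ")
print(with_dummies)    # ['', '0.488', '', '0.4742', '0.5011', '']

clean = dataline.split()
print(clean)           # ['0.488', '0.4742', '0.5011']

items = [float(x) for x in clean]
print(items)           # [0.488, 0.4742, 0.5011]
```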
