Import data and variable names from a text file in Python

I have a text file containing simulation data (60 columns, 100 thousand lines):

 a b c
 1 11 111
 2 22 222
 3 33 333
 4 44 444

... where the first row holds the names of the variables, and the columns below hold the corresponding data (type float).

I need to use all these variables with my data in Python for further calculations. For example, when I write:

 print(b) 

I need to get the values from the second column.

I know how to import data:

 data = np.genfromtxt("1.txt", unpack=True, skip_header=1)

Assign the variables manually:

 a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

But I'm having trouble getting variable names:

 import csv

 rows = []
 reader = csv.reader(open("1.txt", "rt"))
 for row in reader:
     rows.append(row)
 variables = rows[0]

How can I change this code to get all variable names from the first line and assign them to imported arrays?

4 answers

Instead of trying to assign names, you might consider using an associative array, which is known in Python as a dict, to store your variables and their values. Then the code might look something like this (borrowing liberally from the csv docs):

 import csv

 with open('1.txt', 'rt') as f:
     reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

     lineData = list()

     cols = next(reader)  # The first row holds the variable names.
     print(cols)

     for col in cols:
         # Create a list in lineData for each column of data.
         lineData.append(list())

     for line in reader:
         for i in range(0, len(lineData)):
             # Copy the data from the line into the correct columns.
             lineData[i].append(line[i])

     data = dict()

     for i in range(0, len(cols)):
         # Create each key in the dict with the data in its column.
         data[cols[i]] = lineData[i]

 print(data)

data then contains each of your variables, which can be accessed via data['varname'].

So, for example, you could do data['a'] to get the list ['1', '2', '3', '4'] based on the input provided in your question.

I think trying to create names based on the data in your document is a rather inconvenient way to do this compared to the dict-based method shown above. If you really want to do it, you can take a look at reflection in Python (a topic I don't know anything about).
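The csv approach above leaves every value as a string. As a minimal sketch of the same idea that converts the values to floats while reading: the helper name read_columns and the inlined SAMPLE text are assumptions made for illustration, and the sample stands in for the contents of 1.txt.

```python
import csv
import io

# Stand-in for the contents of "1.txt" (sample values from the question).
SAMPLE = """a b c
1 11 111
2 22 222
3 33 333
4 44 444
"""


def read_columns(f):
    """Return {column name: list of floats} for a space-separated file."""
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    names = next(reader)                     # header row: variable names
    columns = {name: [] for name in names}   # one empty list per column
    for row in reader:
        if not row:                          # skip blank lines
            continue
        for name, value in zip(names, row):
            columns[name].append(float(value))
    return columns


data = read_columns(io.StringIO(SAMPLE))
print(data['b'])
```

With a real file, the same function could be called as read_columns(open('1.txt', newline='')), ideally inside a with block.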


Answer: you do not want to do this.

Dictionaries are designed specifically for this purpose: the data structure that you really want will look something like this:

 data = {
     "a": [1, 2, 3, 4],
     "b": [11, 22, 33, 44],
     "c": [111, 222, 333, 444],
 }

... which can then be easily accessed using, for example, data["a"].
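One possible way to get this structure straight from the file is NumPy's genfromtxt with names=True, which reads the field names from the first line. This is only a sketch: the inlined SAMPLE text is an assumption standing in for 1.txt, and with a real file you would pass the filename instead of the io.StringIO object.

```python
import io

import numpy as np

# Stand-in for the contents of "1.txt" (sample values from the question).
SAMPLE = """a b c
1 11 111
2 22 222
3 33 333
4 44 444
"""

# names=True tells genfromtxt to take the field names from the first line.
table = np.genfromtxt(io.StringIO(SAMPLE), names=True)

# Build a dict of columns keyed by variable name.
data = {name: table[name] for name in table.dtype.names}

print(table.dtype.names)
print(data["b"] * 2)
```

The structured array itself already supports table["b"], so the dict is optional; it is built here only to match the shape shown above.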

It is possible to do what you want, but the usual way is a hack that relies on the fact that Python uses (drumroll) a dict internally to store variables. And since your code won't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.
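To make the point concrete, here is a small sketch of that hack at module level. The columns dict is hypothetical data standing in for values read from the file.

```python
# Hypothetical column data, standing in for values read from the file.
columns = {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44]}

# Module-level variables live in a dict, which globals() returns.
namespace = globals()

# The hack: inject one variable per column into that dict.
namespace.update(columns)

# Code that does not know the names in advance still has to go through
# dictionary access to reach the new variables ...
for name in columns:
    print(name, namespace[name])

# ... which is no more convenient than indexing the original dict,
# because both lookups return the very same object.
assert namespace["b"] is columns["b"]
```

Nothing is gained over columns["b"], and any column that happens to share a name with an existing variable silently overwrites it.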

It is worth pointing out that this is deliberately made difficult in Python: if your code does not know the names of your variables, they are, by definition, data rather than logic, and should be treated as such.

If you are not convinced yet, here is a good article on this topic:

Stupid Python Ideas: Why You Don't Want to Create Variables Dynamically


Thanks to @andyg0808 and @Zero Piraeus, I found another solution. For me, the most suitable option is the Pandas data analysis library.

 import pandas as pd

 # sep=r"\s+" splits on any run of whitespace
 # (equivalent to the older delim_whitespace=True).
 data = pd.read_csv("1.txt", sep=r"\s+")
 result = data["a"] * data["b"] * 3
 print(result)

 0     33
 1    132
 2    297
 3    528

... where 0,1,2,3 is the row index.


Here is an easy way to convert a .txt file of variable names and data into NumPy arrays.

 import numpy as np

 D = np.genfromtxt('1.txt', dtype='str')      # load the data in as strings
 D_data = np.asarray(D[1:, :], dtype=float)   # convert the data to floats
 D_names = D[0, :]                            # save a list of the variable names

 for i in range(len(D_names)):
     key = D_names[i]      # define the key for this variable
     val = D_data[:, i]    # set the value for this variable
     exec(key + '=val')    # build the variable

I like this method because it is easy to follow and easy to maintain. The code can be condensed as follows:

 import numpy as np

 D = np.genfromtxt('1.txt', dtype='str')        # load the data in as strings

 for i in range(D.shape[1]):
     val = np.asarray(D[1:, i], dtype=float)    # set the value for this variable
     exec(D[0, i] + '=val')                     # build the variable

Both versions do the same thing: they create NumPy arrays named a, b and c holding the corresponding data.
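Since exec runs whatever text appears in the header row, a cautious variation is to expose the columns as attributes of a namespace object instead of creating variables. The following is only a sketch of that alternative, not the method used above; the names and rows values are hypothetical stand-ins for the parsed file.

```python
import keyword
from types import SimpleNamespace

# Hypothetical header row and data rows, standing in for the file contents.
names = ["a", "b", "c"]
rows = [[1.0, 11.0, 111.0],
        [2.0, 22.0, 222.0],
        [3.0, 33.0, 333.0],
        [4.0, 44.0, 444.0]]

# Refuse headers that could not be used as attribute names.
for name in names:
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"not a valid Python name: {name!r}")

# zip(*rows) transposes the rows into columns.
columns = {name: list(col) for name, col in zip(names, zip(*rows))}

# Expose the columns as attributes instead of creating variables with exec.
v = SimpleNamespace(**columns)

print(v.b)
```

This keeps the short v.b syntax while leaving the surrounding module's own variables untouched.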


Source: https://habr.com/ru/post/1496264/

