Indexing multiple csv files using pandas from records?

I have a list of csv files ("file1", "file2", ...) that have two columns but no headers. I would like to assign headers to them and load them as a DataFrame that is indexed first by file and then by these column labels. For example, I tried:

    import pandas

    mydict = {}
    labels = ["col1", "col2"]
    for myfile in ["file1", "file2"]:
        my_df = pandas.read_table(myfile, names=labels)
        # build dictionary of dataframe records
        mydict[myfile] = my_df

    test = pandas.DataFrame(mydict)

This creates a DataFrame whose indexes are "myfile1", "myfile2", ... However, I would like each of them to also be indexed by "col1" and "col2". My questions:

  • How can I make the first index the file and the second index the columns that I assigned (in the labels variable)? So that I can write:

    test["myfile1"]["col1"]

Right now, test["myfile1"] gives me only a series of entries.

  • Also, how can I reindex things so that the first index is the column label and the second index is the file name? So that I can write:

    test["col1"]["myfile1"]

or print test["col1"] , and then see the value of "col1" shown for myfile1, myfile2 , etc.

1 answer

If you use pandas >= 0.7.0 (currently only available from the GitHub repository, though I will be making a release very soon!), you can concatenate your dict of DataFrames:

http://pandas.sourceforge.net/merging.html#more-concatenating-with-group-keys

    In [6]: data
    Out[6]:
    {'file1.csv':         A       B
    0  1.0914 -1.3538
    1  0.5775 -0.2392
    2 -0.2157 -0.2253
    3 -2.4924  1.0896
    4  0.6910  0.8992
    5 -1.6196  0.3009
    6 -1.5500  0.1360
    7 -0.2156  0.4530
    8  1.7018  1.1169
    9 -1.7378 -0.3373,
     'file2.csv':         A        B
    0 -0.4948 -0.15551
    1  0.6987  0.85838
    2 -1.3949  0.25995
    3  1.5314  1.25364
    4  1.8582  0.09912
    5 -1.1717 -0.21276
    6 -0.2603 -1.78605
    7 -3.3247  1.26865
    8  0.7741 -2.25362
    9 -0.6956  1.08774}

    In [10]: cdf = concat(data, axis=1)

    In [11]: cdf
    Out[11]:
      file1.csv         file2.csv
              A       B         A        B
    0    1.0914 -1.3538   -0.4948 -0.15551
    1    0.5775 -0.2392    0.6987  0.85838
    2   -0.2157 -0.2253   -1.3949  0.25995
    3   -2.4924  1.0896    1.5314  1.25364
    4    0.6910  0.8992    1.8582  0.09912
    5   -1.6196  0.3009   -1.1717 -0.21276
    6   -1.5500  0.1360   -0.2603 -1.78605
    7   -0.2156  0.4530   -3.3247  1.26865
    8    1.7018  1.1169    0.7741 -2.25362
    9   -1.7378 -0.3373   -0.6956  1.08774
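For readers who want to try this end to end, here is a minimal sketch of the same idea. It assumes a recent pandas imported as `pd`, and it uses small in-memory strings (the hypothetical `raw_files` dict) in place of real header-less CSV files, so the values are made up for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two header-less, two-column CSV files.
raw_files = {
    "file1.csv": "1.0,2.0\n3.0,4.0\n",
    "file2.csv": "5.0,6.0\n7.0,8.0\n",
}
labels = ["A", "B"]

# Read each "file" without a header row and assign the column labels.
data = {
    name: pd.read_csv(io.StringIO(text), header=None, names=labels)
    for name, text in raw_files.items()
}

# The dict keys become the outer level of a two-level column index.
cdf = pd.concat(data, axis=1)

# Index by file first, then by column label.
first = cdf["file1.csv"]["A"]
print(first)
```

With real files, the `io.StringIO(text)` argument would simply be replaced by the file path.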

Then, if you want to switch the order of the column indices, you can do:

    In [14]: cdf.swaplevel(0, 1, axis=1)
    Out[14]:
              A         B         A         B
      file1.csv file1.csv file2.csv file2.csv
    0    1.0914   -1.3538   -0.4948  -0.15551
    1    0.5775   -0.2392    0.6987   0.85838
    2   -0.2157   -0.2253   -1.3949   0.25995
    3   -2.4924    1.0896    1.5314   1.25364
    4    0.6910    0.8992    1.8582   0.09912
    5   -1.6196    0.3009   -1.1717  -0.21276
    6   -1.5500    0.1360   -0.2603  -1.78605
    7   -0.2156    0.4530   -3.3247   1.26865
    8    1.7018    1.1169    0.7741  -2.25362
    9   -1.7378   -0.3373   -0.6956   1.08774
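As a rough, self-contained illustration of the swapped layout, the sketch below rebuilds a small two-file frame (with made-up values), swaps the column levels, and then looks a column up by label first and file second. The extra `sort_index(axis=1)` call is an addition of this sketch, not part of the answer above; it only groups columns with the same label next to each other.

```python
import pandas as pd

# Made-up data standing in for the two files.
data = {
    "file1.csv": pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 4.0]}),
    "file2.csv": pd.DataFrame({"A": [5.0, 7.0], "B": [6.0, 8.0]}),
}
cdf = pd.concat(data, axis=1)

# Swap the two column levels, then sort so equal labels sit together.
swapped = cdf.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Column label first: a DataFrame with one column per file.
col_a = swapped["A"]

# Column label first, then file name: a single Series.
one = swapped["A"]["file1.csv"]
print(col_a)
```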

Alternatively, though it is perhaps a bit heavy-handed, you can use a Panel:

    In [16]: p = Panel(data)

    In [17]: p
    Out[17]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 2 (items) x 10 (major) x 2 (minor)
    Items: file1.csv to file2.csv
    Major axis: 0 to 9
    Minor axis: A to B

    In [18]: p = p.swapaxes(0, 2)

    In [19]: p
    Out[19]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 2 (items) x 10 (major) x 2 (minor)
    Items: A to B
    Major axis: 0 to 9
    Minor axis: file1.csv to file2.csv

    In [20]: p['A']
    Out[20]:
       file1.csv  file2.csv
    0     1.0914    -0.4948
    1     0.5775     0.6987
    2    -0.2157    -1.3949
    3    -2.4924     1.5314
    4     0.6910     1.8582
    5    -1.6196    -1.1717
    6    -1.5500    -0.2603
    7    -0.2156    -3.3247
    8     1.7018     0.7741
    9    -1.7378    -0.6956
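Panel has since been removed from pandas, so on a current install the session above will not run. A similar per-column view across files can be sketched with a cross-section on the concatenated frame instead. This is one possible substitute under that assumption, not the method from the answer, and the data values are made up.

```python
import pandas as pd

# Made-up data standing in for the two files.
data = {
    "file1.csv": pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 4.0]}),
    "file2.csv": pd.DataFrame({"A": [5.0, 7.0], "B": [6.0, 8.0]}),
}
cdf = pd.concat(data, axis=1)

# Take column "A" from every file: a cross-section on the inner
# column level (level=1), which leaves one column per file.
a_by_file = cdf.xs("A", axis=1, level=1)
print(a_by_file)
```

The result plays the role of `p['A']` in the Panel session: rows are the original row index and columns are the file names.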

Source: https://habr.com/ru/post/1391687/
