Pandas selection of discontinuous columns from a data frame

I am using the following to select specific columns from the dataframe comb, which I would like to bring into a new dataframe. The individual selections work fine, e.g. comb.ix[:, 0:1], but when I try to combine them using +, I get a bad result: the first selection ([:, 0:1]) ends up at the end of the dataframe, and the values from the original column 1 are wiped out where they appear at the end of each row. What is the right way to get just the columns I want? (I would include sample data, but, as you can see, there are too many columns, which is why I am trying to do it this way.)

comb.ix[:,0:1]+comb.ix[:,17:342] 
3 answers

If you want to combine a subset of your df columns, use pd.concat :

 pd.concat([comb.ix[:,0:1],comb.ix[:,17:342]], axis=1) 

As long as the indices match, it will align correctly.
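To illustrate why + gives the result described in the question while pd.concat does not, here is a minimal sketch. The six-column frame is a hypothetical stand-in for comb, and the column names are made up for the example; iloc is used in place of the older .ix.

```python
import pandas as pd

# Hypothetical stand-in for `comb`: 6 columns instead of several hundred.
comb = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
                    columns=list("abcdef"))

left = comb.iloc[:, 0:1]    # column "a"
right = comb.iloc[:, 3:5]   # columns "d" and "e"

# `+` is element-wise addition aligned on labels. The two pieces share no
# column label, so the result has the union of the columns, filled with NaN.
added = left + right

# pd.concat with axis=1 places the pieces side by side instead.
subset = pd.concat([left, right], axis=1)
```

Here added contains only NaN, whereas subset holds the original values of the three selected columns.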

As @iHightower points out, you can also select by passing labels:

 pd.concat([df.ix[:,'Col1':'Col5'],df.ix[:,'Col9':'Col15']],axis=1) 

Please note that .ix is deprecated (it was removed entirely in pandas 1.0), so use .loc instead. The following should work:

 In [115]: df = pd.DataFrame(columns=['col' + str(x) for x in range(10)])
           df
 Out[115]:
 Empty DataFrame
 Columns: [col0, col1, col2, col3, col4, col5, col6, col7, col8, col9]
 Index: []

 In [118]: pd.concat([df.loc[:, 'col2':'col4'], df.loc[:, 'col7':'col8']], axis=1)
 Out[118]:
 Empty DataFrame
 Columns: [col2, col3, col4, col7, col8]
 Index: []

Or using iloc :

 In [127]: pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')],
                      df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')]], axis=1)
 Out[127]:
 Empty DataFrame
 Columns: [col2, col3, col7]
 Index: []

Please note that iloc slicing is half-open (closed at the start, open at the end), so the end of the range is not included. To include the last column of interest, add 1 to its position:

 In [128]: pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')+1],
                      df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')+1]], axis=1)
 Out[128]:
 Empty DataFrame
 Columns: [col2, col3, col4, col7, col8]
 Index: []

NumPy has a handy index helper called r_ that concatenates slices into a single index array, so you can solve this with the modern iloc indexer:

 df.iloc[:, np.r_[0:1, 17:342]]

I think this is a more elegant solution.
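As a small runnable sketch of the np.r_ approach: the ten-column frame and its column names below are assumptions made for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 10 columns named col0..col9.
df = pd.DataFrame(np.arange(20).reshape(2, 10),
                  columns=['col' + str(x) for x in range(10)])

# np.r_ expands each slice and joins the results into one integer array.
positions = np.r_[0:1, 7:10]

# A single iloc call then picks all the non-contiguous columns at once.
subset = df.iloc[:, positions]
```

Each slice inside np.r_ is half-open, just like an iloc slice, so 7:10 covers positions 7, 8 and 9.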


I recently solved this by simply concatenating ranges:

 r1 = pd.Series(range(5))
 r2 = pd.Series([10, 15, 20])
 final_range = pd.concat([r1, r2])  # Series.append was removed in pandas 2.0
 df.iloc[:, final_range]

This selects columns 0 through 4 plus columns 10, 15 and 20.
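The same positions can also be assembled as a plain Python list, which avoids the intermediate Series objects. This is a minimal sketch, assuming a hypothetical frame with 25 columns named c0 to c24:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 25 columns named c0..c24, for illustration only.
df = pd.DataFrame(np.zeros((2, 25)),
                  columns=['c' + str(i) for i in range(25)])

# Positions 0-4 plus 10, 15 and 20, built by ordinary list concatenation.
positions = list(range(5)) + [10, 15, 20]
subset = df.iloc[:, positions]
```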


Source: https://habr.com/ru/post/984160/