Selecting the last n columns and excluding the last n columns in the dataframe

How do I:

  • Select the last 3 columns in the data frame and create a new data frame?

I tried:

 y = dataframe.iloc[:,-3:]

  • Exclude the last 3 columns and create a new data frame?

I tried:

 X = dataframe.iloc[:,:-3] 

Is it correct?

I am getting array dimension errors in my code and want to make sure this step is correct.

thanks

2 answers

Just do:

 y = dataframe[dataframe.columns[-3:]] 

This slices the column labels, which you can then use to select from the df.

Example:

 In [221]: df = pd.DataFrame(columns=np.arange(10))
           df[df.columns[-3:]]
 Out[221]:
 Empty DataFrame
 Columns: [7, 8, 9]
 Index: []
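For comparison, here is a minimal sketch suggesting that the label-based form above and the iloc form from the question pick the same columns, along with the complementary "all but the last 3" slice. The 5x10 frame and the names y_labels, y_pos and X are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 5x10 frame with integer column labels 0..9
df = pd.DataFrame(np.arange(50).reshape(5, 10), columns=np.arange(10))

y_labels = df[df.columns[-3:]]   # select by the last three column labels
y_pos = df.iloc[:, -3:]          # select the last three columns by position
X = df.iloc[:, :-3]              # everything except the last three columns

print(y_labels.equals(y_pos))    # do both approaches give the same frame?
print(X.shape, y_pos.shape)      # column counts should add up to the original 10
```

So both slices in the question look right; if there is a dimension error, it probably comes from a later step.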

I think the problem here is that because you took a slice of the df, it returned a view, and depending on what the rest of your code does, this can raise a warning. You can make an explicit copy by calling .copy() to avoid the warning.

So, if we take a copy, the assignment affects only the copy, not the original df:

 In [15]: df = pd.DataFrame(np.random.randn(5,10), columns=np.arange(10))
          df
 Out[15]:
           0         1         2         3         4         5         6  \
 0  0.568284 -1.488447  0.970365 -1.406463 -0.413750 -0.934892 -1.421308
 1  1.186414 -0.417366 -1.007509 -1.620530 -1.322004  0.294540  1.205115
 2 -1.073894 -0.214972  1.516563 -0.705571  0.068666  1.690654 -0.252485
 3  0.923524 -0.856752  0.226294 -0.660085  1.259145  0.400596  0.559028
 4  0.259807  0.135300  1.130347 -0.317305 -1.031875  0.232262  0.709244

           7         8         9
 0  1.741925 -0.475619 -0.525770
 1  2.137546  0.215665  1.908362
 2  1.180281 -0.144652  0.870887
 3 -0.609804 -0.833186 -1.033656
 4  0.480943  1.971933  1.928037

 In [16]: y = df[df.columns[-3:]].copy()
          y
 Out[16]:
           7         8         9
 0  1.741925 -0.475619 -0.525770
 1  2.137546  0.215665  1.908362
 2  1.180281 -0.144652  0.870887
 3 -0.609804 -0.833186 -1.033656
 4  0.480943  1.971933  1.928037

 In [17]: y[y>0] = 0
          print(y)
          df
           7         8         9
 0  0.000000 -0.475619 -0.525770
 1  0.000000  0.000000  0.000000
 2  0.000000 -0.144652  0.000000
 3 -0.609804 -0.833186 -1.033656
 4  0.000000  0.000000  0.000000
 Out[17]:
           0         1         2         3         4         5         6  \
 0  0.568284 -1.488447  0.970365 -1.406463 -0.413750 -0.934892 -1.421308
 1  1.186414 -0.417366 -1.007509 -1.620530 -1.322004  0.294540  1.205115
 2 -1.073894 -0.214972  1.516563 -0.705571  0.068666  1.690654 -0.252485
 3  0.923524 -0.856752  0.226294 -0.660085  1.259145  0.400596  0.559028
 4  0.259807  0.135300  1.130347 -0.317305 -1.031875  0.232262  0.709244

           7         8         9
 0  1.741925 -0.475619 -0.525770
 1  2.137546  0.215665  1.908362
 2  1.180281 -0.144652  0.870887
 3 -0.609804 -0.833186 -1.033656
 4  0.480943  1.971933  1.928037

No warning is raised here, and the original df is not modified.
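Since the question mentions array dimension errors, here is a hedged sketch of the whole split with explicit copies and a shape check. The frame and the variable names are illustrative assumptions, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5 rows, 10 columns, values 0.0 .. 49.0
df = pd.DataFrame(np.arange(50, dtype=float).reshape(5, 10))

X = df.iloc[:, :-3].copy()   # all but the last 3 columns
y = df.iloc[:, -3:].copy()   # the last 3 columns

# Both parts keep every row; only the column counts differ
assert X.shape[0] == y.shape[0] == df.shape[0]
assert X.shape[1] + y.shape[1] == df.shape[1]

# Writing to the copy leaves the original frame untouched
y.iloc[0, 0] = -1.0
print(df.iloc[0, -3])        # value in the original frame
print(y.iloc[0, 0])          # value in the copy
```

If these checks pass on the real data, the slicing step itself is unlikely to be the cause of the dimension error.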


This is because of the integer indices (ix treats -3 as a label rather than as a position, and this is by design: see integer indexing in the pandas "gotchas" *).

* In newer versions of pandas, prefer loc or iloc, which remove the ambiguity of ix between position and label:

 df.iloc[-3:]

See the docs.
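To make the position versus label distinction concrete, here is a small sketch with an integer index that does not start at zero. The frame is invented for the example.

```python
import pandas as pd

# Hypothetical frame whose integer index labels are 10..14
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=[10, 11, 12, 13, 14])

last_three = df.iloc[-3:]    # position-based: the last three rows
by_label = df.loc[12:]       # label-based: rows labelled 12 and onwards

print(last_three.index.tolist())
print(by_label.index.tolist())
```

With iloc, -3 always counts from the end, whatever the index labels happen to be; with loc, the slice bound is read as a label.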

As Wes points out, in this particular case you should just use tail!
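A short sketch of tail on a made-up two-row frame, assuming a reasonably recent pandas (0.14 or later), where an out-of-range iloc slice is tolerated as well:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"a": [1, 2]})

# tail(n) returns the last n rows, or the whole frame if it is shorter
last = df.tail(5)
print(last)

# On pandas 0.14 and later, an out-of-range iloc slice gives the same rows
print(df.iloc[-5:].equals(last))
```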

It should also be noted that in pandas before 0.14, iloc raises an IndexError on an out-of-bounds access, while .head() and .tail() do not:

 >>> pd.__version__
 '0.12.0'
 >>> df = pd.DataFrame([{"a": 1}, {"a": 2}])
 >>> df.iloc[-5:]
 ...
 IndexError: out-of-bounds on slice (end)
 >>> df.tail(5)
    a
 0  1
 1  2

Old answer (deprecated method):

You can use the DataFrame irow method to overcome this ambiguity:

 In [11]: df1.irow(slice(-3, None))
 Out[11]:
     STK_ID  RPT_Date  TClose    sale  discount
 8      568  20080331   38.75  12.668       NaN
 9      568  20080630   30.09  21.102       NaN
 10     568  20080930   26.00  30.769       NaN

Note: Series has a similar iget method.
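irow and iget were later deprecated and, as far as I am aware, are no longer available in current pandas releases; iloc covers both cases. A sketch with placeholder data (this is a stand-in, not the df1 from the session above):

```python
import pandas as pd

# Hypothetical stand-in for df1: 11 rows, one float column
df1 = pd.DataFrame({"TClose": [float(i) for i in range(11)]})

last_rows = df1.iloc[-3:]             # positional equivalent of irow(slice(-3, None))
last_vals = df1["TClose"].iloc[-3:]   # the same positional slice on a Series

print(last_rows.index.tolist())
print(last_vals.tolist())
```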


Source: https://habr.com/ru/post/1233369/

