Python Pandas: remove columns based on maximum column value

Question


I'm just getting started with Pandas as a tool for working with two-dimensional arrays of data. It's super overwhelming, even after reading the docs. You can do so much that I can't figure out how to do anything, if that makes sense.

My dataframe (simplified):

    Date        Stock1  Stock2  Stock3
    2014.10.10  74.75   NaN     NaN
    2014.9.9    NaN     100.95  NaN
    2010.8.8    NaN     NaN     120.45

Thus, each column has only one value.

I want to delete all columns whose maximum value is less than x. For example, if x = 80, I want a new DataFrame:

    Date        Stock2  Stock3
    2014.10.10  NaN     NaN
    2014.9.9    100.95  NaN
    2010.8.8    NaN     120.45

How can this be achieved? I looked at dataframe.max(), which gives me a Series. Can I use this, or somehow use a lambda function in select()?


python numpy pandas

professorDante Nov 12 '14 at 10:05


1 answer

Adam hugs · Accepted Answer · 2014-11-12T22:17:32+0000

Use df.max() to index with.

    In [19]: from pandas import DataFrame

    In [20]: import numpy as np

    In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

    In [36]: df
    Out[36]:
              a         b         c
    0 -0.928912  0.220573  1.948065
    1 -0.310504  0.847638 -0.541496
    2 -0.743000 -1.099226 -1.183567

    In [24]: df.max()
    Out[24]:
    a   -0.310504
    b    0.847638
    c    1.948065
    dtype: float64

Next, we make a boolean expression out of this:

    In [31]: df.max() > 0
    Out[31]:
    a    False
    b     True
    c     True
    dtype: bool

Then you can index df.columns with this (this is called boolean indexing):

    In [34]: df.columns[df.max() > 0]
    Out[34]: Index([u'b', u'c'], dtype='object')

Which you can finally pass to the DataFrame:

    In [35]: df[df.columns[df.max() > 0]]
    Out[35]:
              b         c
    0  0.220573  1.948065
    1  0.847638 -0.541496
    2 -1.099226 -1.183567

Of course, instead of 0, use whatever value you want as the cutoff for dropping columns.
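
Applied to the frame from the question, the same idea might look like the sketch below. It assumes Date is the index rather than a regular column (the question does not say), and it uses `>=` so that only columns whose maximum is strictly less than x are dropped. The names `keep` and `result` are just illustrative.

```python
import numpy as np
import pandas as pd

# The question's simplified frame; treating Date as the index is an assumption.
df = pd.DataFrame(
    {
        "Stock1": [74.75, np.nan, np.nan],
        "Stock2": [np.nan, 100.95, np.nan],
        "Stock3": [np.nan, np.nan, 120.45],
    },
    index=pd.Index(["2014.10.10", "2014.9.9", "2010.8.8"], name="Date"),
)

x = 80  # cutoff from the question

# max() skips NaN by default, so each column's max is its single real value.
keep = df.max() >= x

# .loc with a boolean Series over the columns keeps only the True ones.
result = df.loc[:, keep]
print(result)
```

If Date were an ordinary string column instead, `df.max(numeric_only=True)` would probably be needed so that the comparison with x only involves numeric columns, and Date would then have to be added back explicitly.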
