Isolating adjacent columns based on str.contains

Hi everyone, so my dataframe looks like this:

  A |            B             |  C  |  D  | E
    | 'USD'                    |     |     |
    | 'trading expenses-total' |     |     |
    | 8.10                     | 2.3 | 5.5 |
    | 9.1                      | 1.4 | 6.1 |
    | 5.4                      | 5.1 | 7.8 |

I couldn't find anything like this, so I apologize if this is a duplicate. Essentially, I'm trying to find the column containing the string "total" (column B) together with the two columns to its right (C and D), and turn them into a new DataFrame. I feel like I'm close with the following code:

test.loc[:,test.columns.str.contains('total')]

which isolates the correct column, but I cannot figure out how to capture the adjacent two columns. My desired result:

  B                        |  C  |  D
  'USD'                    |     |
  'trading expenses-total' |     |
  8.10                     | 2.3 | 5.5
  9.1                      | 1.4 | 6.1
  5.4                      | 5.1 | 7.8
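To reproduce this, here is a minimal sketch of the frame, assuming 'USD' and 'trading expenses-total' are ordinary cell values in column B (the same assumption the first answer's update makes) and that the empty cells are NaN. Under that assumption, matching on the column labels finds nothing, because 'total' lives in the data rather than in the header:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame: the two text rows are data,
# not header rows, so column B ends up with object dtype.
nan = np.nan
test = pd.DataFrame({
    'A': [nan] * 5,
    'B': ['USD', 'trading expenses-total', '8.10', '9.1', '5.4'],
    'C': [nan, nan, 2.3, 1.4, 5.1],
    'D': [nan, nan, 5.5, 6.1, 7.8],
    'E': [nan] * 5,
})

# Label-based matching: every entry is False here, since no header says 'total'.
label_mask = test.columns.str.contains('total')
print(label_mask)

# The desired result is just columns B, C and D.
desired = test[['B', 'C', 'D']]
print(desired)
```

If the real header does contain 'total' (the working str.contains call in the question hints at that), the label-based answers apply; if the string only appears in the cells, the value-based update is the one to use.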
2 answers

OLD answer:

Pandas:

In [36]: df = pd.DataFrame(np.random.rand(3,5), columns=['A','total','C','D','E'])

In [37]: df
Out[37]:
          A     total         C         D         E
0  0.789482  0.427260  0.169065  0.112993  0.142648
1  0.303391  0.484157  0.454579  0.410785  0.827571
2  0.984273  0.001532  0.676777  0.026324  0.094534

In [38]: idx = np.argmax(df.columns.str.contains('total'))

In [39]: df.iloc[:, idx:idx+3]
Out[39]:
      total         C         D
0  0.427260  0.169065  0.112993
1  0.484157  0.454579  0.410785
2  0.001532  0.676777  0.026324
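One caveat with np.argmax: when no label matches, the mask is all False and argmax returns 0, so the slice silently starts at the first column. A small sketch that guards against that; the helper name and the width parameter are hypothetical, added for illustration:

```python
import numpy as np
import pandas as pd

def columns_from_label(df, pattern, width=3):
    """Return the first column whose label contains `pattern` plus the
    next width-1 columns to its right (hypothetical helper)."""
    hits = np.flatnonzero(df.columns.str.contains(pattern))
    if hits.size == 0:
        # np.argmax would return 0 here and select the wrong columns
        raise KeyError(f"no column label contains {pattern!r}")
    start = hits[0]
    return df.iloc[:, start:start + width]

df = pd.DataFrame(np.arange(15).reshape(3, 5), columns=['A', 'total', 'C', 'D', 'E'])
print(columns_from_label(df, 'total'))
```

Near the right edge, iloc clips the slice, so fewer than width columns may come back.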

UPDATE:

In [118]: df
Out[118]:
     A                       B    C    D     E
0  NaN                     USD  NaN  NaN   NaN
1  NaN  trading expenses-total  NaN  NaN   NaN
2    A                    8.10  2.3  5.5  10.0
3    B                     9.1  1.4  6.1  11.0
4    C                     5.4  5.1  7.8  12.0

In [119]: col = df.select_dtypes(['object']).apply(lambda x: x.str.contains('total').any()).idxmax()

In [120]: cols = df.columns.to_series().loc[col:].head(3).tolist()

In [121]: col
Out[121]: 'B'

In [122]: cols
Out[122]: ['B', 'C', 'D']

In [123]: df[cols]
Out[123]:
                        B    C    D
0                     USD  NaN  NaN
1  trading expenses-total  NaN  NaN
2                    8.10  2.3  5.5
3                     9.1  1.4  6.1
4                     5.4  5.1  7.8
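The same value-based search written as a loop, as a minimal sketch (the helper name is hypothetical). It converts each column to text so numeric and mixed columns are scanned too, works by position so duplicate labels are not a problem, and raises instead of falling back to the first column, which is what idxmax does when every entry is False:

```python
import numpy as np
import pandas as pd

def columns_from_values(df, pattern, width=3):
    """Find the first column whose *values* contain `pattern` and return it
    together with the next width-1 columns (hypothetical helper)."""
    for pos in range(df.shape[1]):
        # astype(str) turns NaN and numbers into text, so any column can be scanned
        if df.iloc[:, pos].astype(str).str.contains(pattern, regex=False).any():
            return df.iloc[:, pos:pos + width]
    raise KeyError(f"no column contains {pattern!r}")

nan = np.nan
df = pd.DataFrame({
    'A': [nan, nan, 'A', 'B', 'C'],
    'B': ['USD', 'trading expenses-total', '8.10', '9.1', '5.4'],
    'C': [nan, nan, 2.3, 1.4, 5.1],
    'D': [nan, nan, 5.5, 6.1, 7.8],
    'E': [nan, nan, 10.0, 11.0, 12.0],
})
print(columns_from_values(df, 'total'))
```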

Here's one approach, using binary dilation from SciPy:

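import numpy as np
from scipy.ndimage import binary_dilation as bind

# True where the column label contains 'total'
mask = test.columns.str.contains('total')
# Dilate each True one and two positions to the right, then keep those columns
test_out = test.iloc[:,bind(mask,[1,1,1],origin=-1)]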

Alternatively, to avoid the SciPy dependency, use np.convolve:

test_out = test.iloc[:,np.convolve(mask,[1,1,1])[:-2]>0]
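Why the convolution works: in NumPy's default 'full' mode, np.convolve(mask, [1,1,1]) has len(mask) + 2 entries, and entry i is mask[i] + mask[i-1] + mask[i-2] (out-of-range terms count as zero). Dropping the last two entries and testing > 0 therefore marks every match and the two positions after it. A sketch that generalizes the window width (the helper name is hypothetical):

```python
import numpy as np

def expand_right(mask, width=3):
    """Mark each True position in `mask` and the width-1 positions after it."""
    mask = np.asarray(mask, dtype=int)
    # Full convolution has len(mask) + width - 1 entries; entry i counts the
    # True values among mask[i-width+1 .. i]. Keep only the first len(mask).
    counts = np.convolve(mask, np.ones(width, dtype=int))[:len(mask)]
    return counts > 0

print(expand_right([False, True, False, False, False]).tolist())
# [False, True, True, True, False]
```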

Sample run #1:

In [390]: np.random.seed(1234)

In [391]: test = pd.DataFrame(np.random.randint(0,9,(3,5)))

In [392]: test.columns = ['P','total001','g','r','t']

In [393]: test
Out[393]: 
   P  total001  g  r  t
0  3         6  5  4  8
1  1         7  6  8  0
2  5         0  6  2  0

In [394]: mask = test.columns.str.contains('total')

In [395]: test.iloc[:,bind(mask,[1,1,1],origin=-1)]
Out[395]: 
   total001  g  r
0         6  5  4
1         7  6  8
2         0  6  2

Sample run #2:

It also works when several column labels contain 'total':

In [401]: np.random.seed(1234)

In [402]: test = pd.DataFrame(np.random.randint(0,9,(3,7)))

In [403]: test.columns = ['P','total001','g','r','t','total002','k']

In [406]: test
Out[406]: 
   P  total001  g  r  t  total002  k
0  3         6  5  4  8         1  7
1  6         8  0  5  0         6  2
2  0         5  2  6  3         7  0

In [407]: mask = test.columns.str.contains('total')

In [408]: test.iloc[:,bind(mask,[1,1,1],origin=-1)]
Out[408]: 
   total001  g  r  total002  k
0         6  5  4         1  7
1         8  0  5         6  2
2         5  2  6         7  0
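A pandas-only alternative, swapped in here rather than taken from the answer: OR the mask with copies of itself shifted one and two positions to the right using Series.shift. On this example it gives the same selection as the dilation; the helper name is hypothetical:

```python
import numpy as np
import pandas as pd

def select_with_neighbours(df, pattern, width=3):
    """Keep every column whose label contains `pattern`, plus the width-1
    columns to the right of each match (hypothetical helper)."""
    mask = pd.Series(df.columns.str.contains(pattern))
    expanded = mask.copy()
    for k in range(1, width):
        # shift moves the flags k positions to the right; the gap is filled with False
        expanded |= mask.shift(k, fill_value=False)
    return df.iloc[:, expanded.to_numpy(dtype=bool)]

cols = ['P', 'total001', 'g', 'r', 't', 'total002', 'k']
test = pd.DataFrame(np.arange(21).reshape(3, 7), columns=cols)
selected = select_with_neighbours(test, 'total')
print(selected.columns.tolist())
# ['total001', 'g', 'r', 'total002', 'k']
```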

Source: https://habr.com/ru/post/1682168/

