Work in Pandas with variable names with a common suffix

Question

Work in Pandas with variable names with a common suffix

I execute most of my data in SAS, but I need to use python for a specific project (I'm not very competent in python). I have a dataframe like this:

values = ['a_us', 'b_us', 'c_us', 'a_ww','b_ww','c_ww'] df = pd.DataFrame(np.random.rand(1, 6), columns=values[:6])

One thing I need to do is calculate the ratio of the US to WW for each of companies a, b, and c. I know how to do this a long way in python - I would do this for every company:

  df['*company*_ratio'] = df['*company*_us']/df['*company*_ww']

But how can I do this without writing out each equation? I think I could do something like

  for x in [a,b,c]:

or I could define a function. However, I don’t know enough to implement any of these parameters, or even search, to find the answer (since I’m sure it was set before). In SAS, I would just write a macro that populates the company.

Thanks.

+5

python pandas

dataryne Mar 22 '16 at 19:46

source share

2 answers

You should use MultiIndex http://pandas.pydata.org/pandas-docs/stable/advanced.html

you should read the section, but your specific case might be:

 df = pandas.DataFrame(np.random.rand(10, 6), columns=pandas.MultiIndex.from_product([['us', 'ww'], ['a', 'b', 'c']])) ratio = df['us']/ df['ww']

the result is a data frame with three columns a, b, c with three requested relationships

+2

Ophir yoktan Mar 22 '16 at 20:11

source share

jezrael · Accepted Answer · 2016-03-22T19:54:59+0000

First you can find unique values from the first char columns with str :

 print df.columns.str[0].unique() ['a' 'b' 'c']

Or the first substring, if the columns are splited _ (better for real data).

 print df.columns.str.split('_').str[0].unique() ['a' 'b' 'c'] for x in df.columns.str[0].unique(): df[x + '_ratio'] = df[x + '_us']/df[x + '_ww']

Comparison

 import pandas as pd import numpy as np np.random.seed(0) values = ['a_us', 'b_us', 'c_us', 'a_ww','b_ww','c_ww'] df = pd.DataFrame(np.random.rand(1, 6), columns=values[:6]) df['a_ratio'] = df['a_us']/df['a_ww'] df['b_ratio'] = df['b_us']/df['b_ww'] df['c_ratio'] = df['c_us']/df['c_ww'] print df a_us b_us c_us a_ww b_ww c_ww a_ratio \ 0 0.548814 0.715189 0.602763 0.544883 0.423655 0.645894 1.007213 b_ratio c_ratio 0 1.688142 0.933223

matches with:

 import pandas as pd import numpy as np np.random.seed(0) values = ['a_us', 'b_us', 'c_us', 'a_ww','b_ww','c_ww'] df = pd.DataFrame(np.random.rand(1, 6), columns=values[:6]) for x in df.columns.str[0].unique(): df[x + '_ratio'] = df[x+'_us']/df[x+'_ww'] print df a_us b_us c_us a_ww b_ww c_ww a_ratio \ 0 0.548814 0.715189 0.602763 0.544883 0.423655 0.645894 1.007213 b_ratio c_ratio 0 1.688142 0.933223

Work in Pandas with variable names with a common suffix

More articles: