Return multiple values from pandas apply on a DataFrame

I am using a pandas DataFrame to perform a row-wise t-test, as in this example:

    import numpy
    import pandas

    # log2 of negative draws gives NaN; those rows are dropped below
    df = pandas.DataFrame(numpy.log2(numpy.random.randn(1000, 4)),
                          columns=["a", "b", "c", "d"])
    df = df.dropna()

Now, assuming that "a" and "b" form one group and "c" and "d" the other, I run the t-test row by row. This is fairly trivial with pandas, using apply with axis=1. However, apply returns either a DataFrame of the same shape if my function does not aggregate, or a Series if it does.

Usually I just output the p-value (an aggregation), but I would like to generate an additional value based on other calculations (in other words, return two values). Of course, I could do two passes, first aggregating the p-values and then doing the other work, but I was wondering whether there is a more efficient way to do this, since the data is fairly large.

As an example of a calculation, a hypothetical function would be:

    from scipy.stats import ttest_ind

    def t_test_and_mean(series, first, second):
        first_group = series[first]
        second_group = series[second]
        _, pvalue = ttest_ind(first_group, second_group)
        mean_ratio = second_group.mean() / first_group.mean()
        return (pvalue, mean_ratio)

Then called with

 df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1) 

Of course, in this case it returns a single Series whose values are two-element tuples.

Instead, my expected output would be a DataFrame with two columns, one for each result. Is this possible, or do I need to do two passes, one per calculation, and then merge the results?

+42
python pandas
May 25 '12 at 8:35
1 answer

Returning a Series, rather than a tuple, should produce a new multi-column DataFrame. For example,

 return pandas.Series({'pvalue': pvalue, 'mean_ratio': mean_ratio}) 
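Putting this together with the function from the question, a minimal end-to-end sketch might look like the following. The random input data, the seed, and the 20-row size are assumptions chosen only to make the example self-contained; they are not from the question.

```python
import numpy
import pandas
from scipy.stats import ttest_ind


def t_test_and_mean(series, first, second):
    """Return the t-test p-value and the ratio of group means for one row."""
    first_group = series[first]
    second_group = series[second]
    _, pvalue = ttest_ind(first_group, second_group)
    mean_ratio = second_group.mean() / first_group.mean()
    # Returning a labelled Series lets apply build one column per label.
    return pandas.Series({"pvalue": pvalue, "mean_ratio": mean_ratio})


# Illustrative data only: strictly positive values so the mean ratio is defined.
rng = numpy.random.default_rng(0)
df = pandas.DataFrame(rng.uniform(1.0, 2.0, size=(20, 4)),
                      columns=["a", "b", "c", "d"])

result = df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1)
print(result.head())
```

Here result should be a DataFrame with the columns pvalue and mean_ratio and the same index as df, so both values come out of a single pass. In newer pandas versions, passing result_type="expand" to apply is another way to spread a returned tuple across columns, although the columns are then numbered rather than named.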
+57
May 25 '12 at 23:48