Pandas force NaN to the bottom of each column at each index

Question


I have a DataFrame in which several rows share each index label. For example, the rows for the first index label have the following structure:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([["A", "first", 1.0, 1.0, np.nan],
                       [np.nan, np.nan, 2.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan, 3.0, 3.0]],
                      columns=["ID", "Name", "val1", "val2", "val3"],
                      index=[0, 0, 0])

    Out[4]:
        ID   Name  val1  val2  val3
    0    A  first     1     1   NaN
    0  NaN    NaN     2   NaN     2
    0  NaN    NaN   NaN     3     3

I would like to arrange each column so that the NaN values sit at the bottom of that column within a given index label, giving a result that looks like this:

        ID   Name  val1  val2  val3
    0    A  first     1     1     2
    0  NaN    NaN     2     3     3
    0  NaN    NaN   NaN   NaN   NaN

A more explicit example might look like this:

    df = pd.DataFrame([["A", "first", 1.0, 1.0, np.nan],
                       [np.nan, np.nan, 2.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan, 3.0, 3.0],
                       ["B", "second", 4.0, 4.0, np.nan],
                       [np.nan, np.nan, 5.0, np.nan, 5.0],
                       [np.nan, np.nan, np.nan, 6.0, 6.0]],
                      columns=["ID", "Name", "val1", "val2", "val3"],
                      index=[0, 0, 0, 1, 1, 1])

    Out[5]:
        ID    Name  val1  val2  val3
    0    A   first     1     1   NaN
    0  NaN     NaN     2   NaN     2
    0  NaN     NaN   NaN     3     3
    1    B  second     4     4   NaN
    1  NaN     NaN     5   NaN     5
    1  NaN     NaN   NaN     6     6

with the desired result looking like this:

        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    0  NaN     NaN     2     3     3
    0  NaN     NaN   NaN   NaN   NaN
    1    B  second     4     4     5
    1  NaN     NaN     5     6     6
    1  NaN     NaN   NaN   NaN   NaN

I have many thousands of rows in this DataFrame, and each index label covers up to several hundred rows. The desired layout will be very useful when I write the DataFrame out with to_csv.

I tried calling sort_values(['val1','val2','val3']) on the whole DataFrame, but this scrambles the order of the index labels. I tried iterating over each index label and sorting that subset, but that does not push the NaN values to the bottom of each column within the label either. I also tried using fillna to replace NaN with another value, such as 0, but that did not work either.

I am surely using it incorrectly, but the na_position parameter of sort_values does not give the desired result, although it seems like it is probably what I want.
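A short sketch of why na_position alone does not help here, assuming the single-label frame from the first example. sort_values on a DataFrame reorders whole rows, so NaN can only move to the bottom of the columns being sorted on, whereas sorting one column by itself does push that column's NaN down. The names by_rows and val3_alone are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0],
)

# Sorting the frame moves entire rows. val1 is already in order here,
# so the rows stay where they are and val3 still starts with NaN.
by_rows = df.sort_values(["val1", "val2", "val3"], na_position="last")

# Sorting a single column on its own does put its NaN last.
val3_alone = df["val3"].sort_values(na_position="last").reset_index(drop=True)

print(by_rows)
print(val3_alone)
```

This is why the answers sort each column separately instead of sorting the frame.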

Edit:

The index of the final df does not need to be in numerical order as it is in my second example.

By changing ignore_index to False in the one-line statement from the third code block of @Leb's answer, that is, from

 pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)

to

 pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=False)

and by creating a temporary df for the rows of each index label, I was able to make this work. It is not very pretty, but it orders things the way I need. If anyone has a better way, please let me know.

    # .loc selects all rows with the given index label (.ix no longer exists in pandas)
    new_df = df.loc[0]
    new_df = pd.concat([new_df[col].sort_values().reset_index(drop=True) for col in new_df], axis=1, ignore_index=False)
    max_index = df.index[-1]
    for i in range(1, max_index + 1):
        tmp = df.loc[i]
        tmp = pd.concat([tmp[col].sort_values().reset_index(drop=True) for col in tmp], axis=1, ignore_index=False)
        new_df = pd.concat([new_df, tmp])

    In [10]: new_df
    Out[10]:
        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    1  NaN     NaN     2     3     3
    2  NaN     NaN   NaN   NaN   NaN
    0    B  second     4     4     5
    1  NaN     NaN     5     6     6
    2  NaN     NaN   NaN   NaN   NaN
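The loop above assumes the index labels are the consecutive integers from 0 to max_index, and it replaces them with 0, 1, 2 inside each block. One possible variant, sketched here with a hypothetical helper named sort_each_column, walks the groups with groupby(level=0) and puts the original labels back afterwards. Like the one-liner it is built on, it sorts the non-null values in ascending order rather than keeping their original order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0],
     ["B", "second", 4.0, 4.0, np.nan],
     [np.nan, np.nan, 5.0, np.nan, 5.0],
     [np.nan, np.nan, np.nan, 6.0, 6.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0, 1, 1, 1],
)


def sort_each_column(group):
    """Sort every column of one group independently; sort_values puts NaN last."""
    cols = [group[col].sort_values().reset_index(drop=True) for col in group]
    out = pd.concat(cols, axis=1)
    out.index = group.index  # restore the repeated index label
    return out


# sort=False keeps the groups in the order in which their labels first appear
new_df = pd.concat(
    [sort_each_column(group) for _, group in df.groupby(level=0, sort=False)]
)
print(new_df)
```

Because the groups are taken from the index itself, this does not depend on the labels being integers or being contiguous.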

+5

python pandas

AGS Nov 04 '15 at 19:41


3 answers

Given df:

    df = pd.DataFrame([["A", "first", 1.0, 1.0, np.nan],
                       [np.nan, np.nan, 2.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan, 3.0, 3.0]],
                      columns=["ID", "Name", "val1", "val2", "val3"],
                      index=[0, 1, 2])

I changed the index to make sure that the order remains.

    df
    Out[127]:
        ID   Name  val1  val2  val3
    0    A  first     1     1   NaN
    1  NaN    NaN     2   NaN     2
    2  NaN    NaN   NaN     3     3

Using:

 pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)

It gives:

    Out[130]:
         0      1    2    3    4
    0    A  first    1    1    2
    1  NaN    NaN    2    3    3
    2  NaN    NaN  NaN  NaN  NaN

The same goes for:

    df = pd.DataFrame([["A", "first", 1.0, 1.0, np.nan],
                       [np.nan, np.nan, 2.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan, 3.0, 3.0],
                       ["B", "second", 4.0, 4.0, np.nan],
                       [np.nan, np.nan, 5.0, np.nan, 5.0],
                       [np.nan, np.nan, np.nan, 6.0, 6.0]],
                      columns=["ID", "Name", "val1", "val2", "val3"],
                      index=[0, 0, 0, 1, 1, 1])

    df
    Out[132]:
        ID    Name  val1  val2  val3
    0    A   first     1     1   NaN
    0  NaN     NaN     2   NaN     2
    0  NaN     NaN   NaN     3     3
    1    B  second     4     4   NaN
    1  NaN     NaN     5   NaN     5
    1  NaN     NaN   NaN     6     6

    pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
    Out[133]:
         0       1    2    3    4
    0    A   first    1    1    2
    1    B  second    2    3    3
    2  NaN     NaN    4    4    5
    3  NaN     NaN    5    6    6
    4  NaN     NaN  NaN  NaN  NaN
    5  NaN     NaN  NaN  NaN  NaN

After additional comments:

    new = pd.concat([df[col].sort_values().reset_index(drop=True) for col in df.iloc[:,2:]], axis=1, ignore_index=True)
    new.index = df.index
    cols = df.iloc[:,2:].columns
    new.columns = cols
    df.drop(cols, inplace=True, axis=1)
    df = pd.concat([df, new], axis=1)

    df
    Out[37]:
        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    0  NaN     NaN     2     3     3
    0  NaN     NaN     4     4     5
    1    B  second     5     6     6
    1  NaN     NaN   NaN   NaN   NaN
    1  NaN     NaN   NaN   NaN   NaN

+2

Leb Nov 04 '15 at 20:46


    In [219]: df.groupby(level=0).transform(lambda x: x.sort_values(na_position='last'))
    Out[219]:
        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    0  NaN     NaN     2     3     3
    0  NaN     NaN   NaN   NaN   NaN
    1    B  second     4     4     5
    1  NaN     NaN     5     6     6
    1  NaN     NaN   NaN   NaN   NaN

+1

Nader hisham Nov 05 '15 at 8:12


DSM · Accepted Answer · 2015-11-05T03:06:02+0000

I know that the question of how to push NaNs to the edge has been discussed on GitHub. For your specific frame, I would probably do it manually at the Python level and not worry much about performance. Something like

    >>> df.groupby(level=0, sort=False).transform(lambda x: sorted(x, key=pd.isnull))
        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    0  NaN     NaN     2     3     3
    0  NaN     NaN   NaN   NaN   NaN
    1    B  second     4     4     5
    1  NaN     NaN     5     6     6
    1  NaN     NaN   NaN   NaN   NaN

should work. Note that since sorted is stable and we use pd.isnull as the key (where False < True), the NaN values are pushed to the end while the order of the remaining objects is preserved. Also note that here I am only grouping on the index; we could alternatively group on anything we wanted.
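To make the stability argument concrete, here is a minimal sketch rather than the answer's own code. It first applies sorted with pd.isnull as the key to a plain list, then does the same thing per group with an explicit loop and NumPy's stable argsort. The helper name push_nan_down and its keys parameter are assumptions made up for this example; keys stands in for "anything we wanted" to group on and is expected to be an array with one entry per row.

```python
import numpy as np
import pandas as pd

# sorted() is stable: with pd.isnull as the key, non-null values (False)
# keep their original order and nulls (True) move to the end.
pushed = sorted([np.nan, 3.0, np.nan, 1.0, 2.0], key=pd.isnull)


def push_nan_down(frame, keys=None):
    """Per group, move NaN to the bottom of each column (hypothetical helper)."""
    if keys is None:
        grouped = frame.groupby(level=0, sort=False)  # group on the index
    else:
        grouped = frame.groupby(keys, sort=False)     # group on a row-aligned array
    pieces = []
    for _, group in grouped:
        data = {}
        for name in group.columns:
            values = group[name].to_numpy()
            # A stable argsort on the null mask puts non-nulls first, in order.
            order = np.argsort(pd.isnull(values), kind="stable")
            data[name] = values[order]
        pieces.append(pd.DataFrame(data, index=group.index))
    return pd.concat(pieces)


df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0],
     ["B", "second", 4.0, 4.0, np.nan],
     [np.nan, np.nan, 5.0, np.nan, 5.0],
     [np.nan, np.nan, np.nan, 6.0, 6.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0, 1, 1, 1],
)

result = push_nan_down(df)
# Grouping on the forward-filled ID column instead of the index.
by_id = push_nan_down(df, keys=df["ID"].ffill().to_numpy())
print(result)
```

The groups are concatenated in the order in which their keys first appear, so if the rows of a group are not contiguous in the input, the row order of the output will differ from the input.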
