I have a DataFrame in which each index value spans several rows. For example, the first index value has the following structure:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame([["A", "first", 1.0, 1.0, np.nan],
                       [np.nan, np.nan, 2.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan, 3.0, 3.0]],
                      columns=["ID", "Name", "val1", "val2", "val3"],
                      index=[0, 0, 0])

    Out[4]:
        ID   Name  val1  val2  val3
    0    A  first     1     1   NaN
    0  NaN    NaN     2   NaN     2
    0  NaN    NaN   NaN     3     3
I would like to rearrange each column so that its NaN values move to the bottom of that column within each index value. The result should look like this:
        ID   Name  val1  val2  val3
    0    A  first     1     1     2
    0  NaN    NaN     2     3     3
    0  NaN    NaN   NaN   NaN   NaN
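For a single index value, one way to get this layout is to treat each column on its own. Drop the column's NaNs, renumber the remaining values from 0, and reindex to the full length so that the gaps land at the bottom. This is a minimal sketch that assumes the three-row frame above. `packed` is an illustrative name, and the approach keeps the non-null values in their original order instead of sorting them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0],
)

n = len(df)
# For each column: drop the NaNs, renumber the survivors 0..k-1, then
# reindex to 0..n-1 so the positions left over at the bottom become NaN.
packed = pd.DataFrame(
    {col: df[col].dropna().reset_index(drop=True).reindex(range(n))
     for col in df.columns}
)
# Restore the original (duplicated) index label.
packed.index = df.index
print(packed)
```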
A more explicit example might look like this:
    df = pd.DataFrame([["A", "first", 1.0, 1.0, np.nan],
                       [np.nan, np.nan, 2.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan, 3.0, 3.0],
                       ["B", "second", 4.0, 4.0, np.nan],
                       [np.nan, np.nan, 5.0, np.nan, 5.0],
                       [np.nan, np.nan, np.nan, 6.0, 6.0]],
                      columns=["ID", "Name", "val1", "val2", "val3"],
                      index=[0, 0, 0, 1, 1, 1])

    Out[5]:
        ID    Name  val1  val2  val3
    0    A   first     1     1   NaN
    0  NaN     NaN     2   NaN     2
    0  NaN     NaN   NaN     3     3
    1    B  second     4     4   NaN
    1  NaN     NaN     5   NaN     5
    1  NaN     NaN   NaN     6     6
The desired result looks like this:
        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    0  NaN     NaN     2     3     3
    0  NaN     NaN   NaN   NaN   NaN
    1    B  second     4     4     5
    1  NaN     NaN     5     6     6
    1  NaN     NaN   NaN   NaN   NaN
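The same per-column idea can be applied to every index value. This is a minimal sketch, not the only possible approach. It assumes the index labels are what group the rows and that the non-null values should keep their original order within each column. `push_nans_down` and `push_nans_down_per_index` are hypothetical helper names. Iterating over `df.groupby(level=0, sort=False)` visits the index values in the order they first appear.

```python
import numpy as np
import pandas as pd


def push_nans_down(group: pd.DataFrame) -> pd.DataFrame:
    """Move each column's non-null values to the top of the group (original
    order kept) and pad the rest of that column with NaN."""
    columns = {}
    for col in group.columns:
        non_null = group[col].dropna().tolist()
        columns[col] = non_null + [np.nan] * (len(group) - len(non_null))
    # Reuse the group's own index so labels such as 0, 0, 0 are kept.
    return pd.DataFrame(columns, index=group.index)


def push_nans_down_per_index(df: pd.DataFrame) -> pd.DataFrame:
    # sort=False keeps the index values in order of first appearance.
    parts = [push_nans_down(g) for _, g in df.groupby(level=0, sort=False)]
    return pd.concat(parts)


df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0],
     ["B", "second", 4.0, 4.0, np.nan],
     [np.nan, np.nan, 5.0, np.nan, 5.0],
     [np.nan, np.nan, np.nan, 6.0, 6.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0, 1, 1, 1],
)

result = push_nans_down_per_index(df)
print(result)
```

The result can then be written out with `result.to_csv(...)`. By default, `to_csv` writes NaN cells as empty fields (`na_rep=""`).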
The real DataFrame has many thousands of rows, and each index value covers up to several hundred of them. Having the data in this layout will be very useful when I export the DataFrame with to_csv.
I tried calling sort_values(['val1','val2','val3']) on the whole DataFrame, but this leaves the index out of order. I also tried iterating over each index value and sorting its rows, but that doesn't confine the NaNs to the bottom of each column within the index either. Finally, I tried fillna with another value, such as 0, without success.
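Sorting the whole frame cannot produce this layout because `DataFrame.sort_values` reorders complete rows. Each row keeps its own mix of values and NaNs, and rows from different index values get interleaved. Here is a quick check on the second example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0],
     ["B", "second", 4.0, 4.0, np.nan],
     [np.nan, np.nan, 5.0, np.nan, 5.0],
     [np.nan, np.nan, np.nan, 6.0, 6.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0, 1, 1, 1],
)

by_rows = df.sort_values(["val1", "val2", "val3"])
print(by_rows)

# Each row of by_rows is an unchanged row of df: only the row order differs,
# so a NaN travels with the rest of its row. The rows whose val1 is NaN go
# last, which interleaves the two index values.
print(by_rows.index.tolist())  # [0, 0, 1, 1, 0, 1]
```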
I am surely using it incorrectly, but the na_position parameter of sort_values does not give the desired result either, even though it seems like it should be what I want.
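`na_position` only controls where NaNs go within whatever is being sorted. Applied to a single column, it moves that column's NaNs to the end, but it also sorts the remaining values by value. If the non-null values should instead keep their original order, one option is to sort on the column's `isna()` mask with a stable sort. This relies on the `key` argument of `sort_values`, which is available in pandas 1.1 or later. The Series below is made-up illustration data.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, 3.0, np.nan, 1.0], index=[0, 0, 0, 0, 0])

# Sorting by value: NaNs go last, and the other values are reordered too.
by_value = s.sort_values(na_position="last")

# Sorting on the boolean mask instead (False < True) sends NaNs to the bottom;
# kind="stable" keeps the non-null values in their original order.
nan_last = s.sort_values(key=lambda x: x.isna(), kind="stable")

print(by_value.tolist())  # [1.0, 2.0, 3.0, nan, nan]
print(nan_last.tolist())  # [2.0, 3.0, 1.0, nan, nan]
```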
Edit:
The index of the final DataFrame does not need to match the numbering in my second example.
Changing ignore_index to False in one line of the third code block of @Leb's answer,
    pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
to
    pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=False)
and building a temporary DataFrame from the rows of each index value made this work. It is not very pretty, but it orders things the way I need. If anyone has a better way, please let me know.
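That one-word change matters because of how `pd.concat` treats `ignore_index`. With `axis=1`, it places the Series side by side. `ignore_index=True` replaces the resulting column labels with 0, 1, ..., while `ignore_index=False` keeps each Series' name, which here is the original column name. A small check with a made-up two-column group:

```python
import numpy as np
import pandas as pd

g = pd.DataFrame({"val1": [1.0, 2.0, np.nan], "val3": [np.nan, 2.0, 3.0]},
                 index=[0, 0, 0])
# Same per-column expression as in the answer: sort, then renumber rows 0..n-1.
cols = [g[col].sort_values().reset_index(drop=True) for col in g]

renumbered = pd.concat(cols, axis=1, ignore_index=True)  # labels become 0, 1
named = pd.concat(cols, axis=1, ignore_index=False)      # labels stay val1, val3

print(list(renumbered.columns))  # [0, 1]
print(list(named.columns))       # ['val1', 'val3']
```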
    # .ix was removed in pandas 1.0; .loc[[i]] always returns a DataFrame
    new_df = df.loc[[0]]
    new_df = pd.concat([new_df[col].sort_values().reset_index(drop=True) for col in new_df], axis=1, ignore_index=False)
    max_index = df.index[-1]
    for i in range(1, max_index + 1):
        tmp = df.loc[[i]]
        tmp = pd.concat([tmp[col].sort_values().reset_index(drop=True) for col in tmp], axis=1, ignore_index=False)
        new_df = pd.concat([new_df, tmp])

    In [10]: new_df
    Out[10]:
        ID    Name  val1  val2  val3
    0    A   first     1     1     2
    1  NaN     NaN     2     3     3
    2  NaN     NaN   NaN   NaN   NaN
    0    B  second     4     4     5
    1  NaN     NaN     5     6     6
    2  NaN     NaN   NaN   NaN   NaN
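For many thousands of rows, the per-index loop above can be replaced by a version that does not need `range(1, max_index + 1)`, which assumes the index values are consecutive integers starting at 0. The sketch below is one possible approach under that framing, not a definitive one. It numbers the rows of each index group with `groupby(level=0).cumcount()` and numbers each column's non-null values the same way. It then reindexes on the (index value, position) pairs so that unfilled slots become NaN. Unlike the `sort_values` loop, it keeps non-null values in their original order and keeps the original index labels. `push_nans_down_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def push_nans_down_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    # Position of each row inside its index group: 0, 1, 2, 0, 1, 2, ...
    row_pos = df.groupby(level=0, sort=False).cumcount().to_numpy()
    target = pd.MultiIndex.from_arrays([df.index, row_pos])
    out = {}
    for col in df.columns:
        s = df[col].dropna()
        # Renumber the surviving values 0, 1, ... within each group, in order.
        slot = s.groupby(level=0, sort=False).cumcount().to_numpy()
        keyed = pd.Series(s.to_numpy(),
                          index=pd.MultiIndex.from_arrays([s.index, slot]))
        # (index value, position) pairs that received no value become NaN.
        out[col] = keyed.reindex(target).to_numpy()
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(
    [["A", "first", 1.0, 1.0, np.nan],
     [np.nan, np.nan, 2.0, np.nan, 2.0],
     [np.nan, np.nan, np.nan, 3.0, 3.0],
     ["B", "second", 4.0, 4.0, np.nan],
     [np.nan, np.nan, 5.0, np.nan, 5.0],
     [np.nan, np.nan, np.nan, 6.0, 6.0]],
    columns=["ID", "Name", "val1", "val2", "val3"],
    index=[0, 0, 0, 1, 1, 1],
)

result = push_nans_down_vectorized(df)
print(result)
```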