Pandas Column Concatenation

I am trying to combine several columns that contain mostly NaN. Here is an example with only two:

    2013-06-18 21:46:33.422096-05:00    A  NaN
    2013-06-18 21:46:35.715770-05:00    A  NaN
    2013-06-18 21:46:42.669825-05:00  NaN    B
    2013-06-18 21:46:45.409733-05:00    A  NaN
    2013-06-18 21:46:47.130747-05:00  NaN    B
    2013-06-18 21:46:47.131314-05:00  NaN    B

There may be 3, 4, or 10 such columns. Each row always has exactly one value for which pd.notnull() is true, and the rest are NaN.

I want to combine them into one column as quickly as possible. How can I do this?

2 answers

Since each row holds exactly one value and the remaining cells are NaN, you can simply ask for the row-wise maximum:

  df.max(axis=1) 

As noted in a comment, if this does not work in Python 3 (for example, with string data), convert the NaN values to empty strings first:

 df.fillna('').max(axis=1) 
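
As a minimal sketch of the string case, the snippet below rebuilds a small frame shaped like the one in the question. The column names `x` and `y` and the shortened index are assumptions made for illustration; the original column names are not shown. Replacing NaN with `''` leaves only strings in each row, and since the empty string sorts before any other string, the row-wise maximum is the one real value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame modelled on the question: one string per row, the rest NaN.
# The column names "x" and "y" are assumptions, not taken from the original data.
df = pd.DataFrame(
    {
        "x": ["A", "A", np.nan, "A", np.nan, np.nan],
        "y": [np.nan, np.nan, "B", np.nan, "B", "B"],
    }
)

# Replace NaN with the empty string so every cell is a str, then take the
# row-wise maximum; '' compares lower than any non-empty string.
combined = df.fillna("").max(axis=1)

print(combined.tolist())
```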

You could do

    In [278]: df = pd.DataFrame([[1, np.nan], [2, np.nan], [np.nan, 3]])

    In [279]: df
    Out[279]:
        0   1
    0   1 NaN
    1   2 NaN
    2 NaN   3

    In [280]: df.sum(1)
    Out[280]:
    0    1
    1    2
    2    3
    dtype: float64

Since NaN values are treated as 0 during summation, they do not affect the result.

A few caveats: you must be sure that only one column in each row holds a non-NaN value for this to work. It also works only with numeric data.
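
The first caveat can be checked before summing. The sketch below is one possible check, not part of the original answer: it counts the non-null cells per row with `notna()` and confirms the count is exactly one everywhere. The second frame, `bad`, is a made-up counter-example showing that a row with two values is silently added together.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [2, np.nan], [np.nan, 3]])

# Precondition check: exactly one non-null value in every row.
one_per_row = df.notna().sum(axis=1).eq(1).all()

# NaN is skipped during summation, so each row sums to its single value.
combined = df.sum(axis=1)

# Hypothetical counter-example: row 0 holds two values, so the sum merges them.
bad = pd.DataFrame([[1, 10], [2, np.nan]])
bad_one_per_row = bad.notna().sum(axis=1).eq(1).all()
bad_sum = bad.sum(axis=1)

print(one_per_row, combined.tolist())
print(bad_one_per_row, bad_sum.tolist())
```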

You can also use

 df.ffill(axis=1).iloc[:, -1] 

The last column will now contain all the valid observations, since the valid values were filled forward along each row. See the pandas documentation on filling missing data for details. The second method should be more flexible, but slower. The iloc[:, -1] call selects every row of the last column.
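
To illustrate the flexibility claim, here is a small sketch of the forward-fill approach applied to string data, where summation would not work. It uses `DataFrame.ffill`, the method form of forward filling; the column names `x` and `y` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical string data: one value per row, the rest NaN.
df = pd.DataFrame({"x": ["A", "A", np.nan], "y": [np.nan, np.nan, "B"]})

# Forward-fill along each row (axis=1), so the single value in a row
# is carried rightwards into every later column.
filled = df.ffill(axis=1)

# After filling, the last column holds the value for every row.
combined = filled.iloc[:, -1]

print(combined.tolist())
```

Unlike the row-wise sum, this does not depend on the values being numeric. If a row held more than one value, the rightmost one would be kept.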


Source: https://habr.com/ru/post/1487325/

