NaN values in the pivot_table index cause data loss

Here is a simple DataFrame:

> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                     'b': ['optional1', None, 'optional3'],
                     'c': ['c1', 'c2', 'c3'],
                     'd': [1, 2, 3]})
> df

    a          b   c  d
0  a1  optional1  c1  1
1  a2       None  c2  2
2  a3  optional3  c3  3

Pivot method 1

The data can be pivoted like this:

> df.pivot_table(index=['a','b'], columns='c')
                d     
c              c1   c3
a  b                  
a1 optional1  1.0  NaN
a3 optional3  NaN  3.0

Downside: the second row is lost because df['b'][1] is None.
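
The lost row is consistent with how pandas groups data: pivot_table aggregates by grouping on the index keys, and groupby drops rows whose key is NaN or None by default. A minimal sketch of that behaviour, assuming pandas 1.1 or later for the dropna argument of groupby (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Default grouping silently drops the row whose 'b' key is missing.
default_groups = df.groupby(['a', 'b'])['d'].sum()

# dropna=False keeps the missing key as its own group (pandas >= 1.1 assumed).
all_groups = df.groupby(['a', 'b'], dropna=False)['d'].sum()

print(len(default_groups), len(all_groups))
```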

Pivot method 2

> df.pivot_table(index=['a'], columns='c')
      d          
c    c1   c2   c3
a                
a1  1.0  NaN  NaN
a2  NaN  2.0  NaN
a3  NaN  NaN  3.0

Downside: column b is lost.

How can the two methods be combined so that both column b and the second row are kept, as follows:

                d     
c              c1   c2   c3
a  b                  
a1 optional1  1.0  NaN  NaN
a2      None  NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0

More generally: how can the information from a row be preserved during a pivot if its key contains NaN?

+4
2 answers

Use set_index and unstack to perform the pivot:

df = df.set_index(['a', 'b', 'c']).unstack('c')

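This is essentially what pandas does internally for pivot. The stack and unstack methods are closely related to pivot and can be used for pivot-like operations that the built-in pivot functions do not handle.

The resulting output: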

                d          
c              c1   c2   c3
a  b                       
a1 optional1  1.0  NaN  NaN
a2 NaN        NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0
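
As a self-contained check of the approach above, the following sketch rebuilds the question's DataFrame and confirms that the a2 row survives the reshape. The names wide and kept_a are illustrative only, and the behaviour is assumed for a reasonably recent pandas version.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Move the three key columns into a MultiIndex, then pivot level 'c' into columns.
# No grouping or aggregation happens here, so the missing 'b' key is not dropped.
wide = df.set_index(['a', 'b', 'c']).unstack('c')

# Collect the 'a' keys that are present in the reshaped frame.
kept_a = sorted(wide.index.get_level_values('a'))
print(kept_a)
```

Note that unstack does not aggregate, so it raises an error if the combination of a, b and c is not unique; pivot_table would average such duplicates instead.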
+2

Replace None with a placeholder value using fillna before pivoting:

df['b'] = df['b'].fillna('foo')
df.pivot_table(index=['a','b'], columns=['c'])
----
                d          
c              c1   c2   c3
a  b                       
a1 optional1  1.0  NaN  NaN
a2 foo        NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0
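
A slightly more defensive variant of the same idea is sketched below. It leaves the original DataFrame untouched and refuses to run if the placeholder already occurs in column b. The PLACEHOLDER value and the variable names are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

PLACEHOLDER = 'missing'  # hypothetical sentinel; must not be a real value of 'b'

# Guard against the sentinel colliding with genuine data.
if df['b'].eq(PLACEHOLDER).any():
    raise ValueError("placeholder already present in column 'b'")

# Fill on a copy so the original frame keeps its missing values.
filled = df.assign(b=df['b'].fillna(PLACEHOLDER))

# With no missing keys left, pivot_table keeps every row.
wide = filled.pivot_table(index=['a', 'b'], columns='c', values='d')
print(wide)
```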
0

Source: https://habr.com/ru/post/1667692/

