Pandas row shift (axis=1) leaves NaNs

Say we have a DataFrame:

    import numpy as np
    import pandas as pd

    x = pd.DataFrame(np.random.randint(1, 10, 30).reshape(5, 6),
                     columns=[f'col{i}' for i in range(6)])
    x['col6'] = np.nan
    x['col7'] = np.nan

which gives (the values are random):

       col0  col1  col2  col3  col4  col5  col6  col7
    0     6     5     1     5     2     4   NaN   NaN
    1     8     8     9     6     7     2   NaN   NaN
    2     8     3     9     6     6     6   NaN   NaN
    3     8     4     4     4     8     9   NaN   NaN
    4     5     3     4     3     8     7   NaN   NaN

When I run x.shift(2, axis=1), the values in col0 through col3 move correctly into col2 through col5, but col6 and col7 stay NaN, as shown below. I expected col4 and col5 to move into col6 and col7, overwriting the NaN values there. How can I get that result? Is this a bug or intended behavior?

       col0  col1  col2  col3  col4  col5  col6  col7
    0   NaN   NaN   6.0   5.0   1.0   5.0   NaN   NaN
    1   NaN   NaN   8.0   8.0   9.0   6.0   NaN   NaN
    2   NaN   NaN   8.0   3.0   9.0   6.0   NaN   NaN
    3   NaN   NaN   8.0   4.0   4.0   4.0   NaN   NaN
    4   NaN   NaN   5.0   3.0   4.0   3.0   NaN   NaN
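A quick check that may help explain the result is to look at the column dtypes: the columns built from random integers and the two columns assigned np.nan end up with different dtypes, which matters if shift(axis=1) treats each dtype group separately. A minimal sketch; the seeded np.random.default_rng generator is an assumption used only to make the frame reproducible, and is not part of the original question:

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (seeded, so it is reproducible)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.integers(1, 10, 30).reshape(5, 6),
                 columns=[f'col{i}' for i in range(6)])
x['col6'] = np.nan
x['col7'] = np.nan

# col0..col5 are integer columns; col6 and col7 are float64 because np.nan is a float
print(x.dtypes)
```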
Answer

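This is probably a bug. The likely cause is that, in the pandas version used here, shift(axis=1) is applied to each dtype block separately: col0 through col5 are integers and col6 and col7 are floats (NaN), so each group is shifted only among its own columns. As a workaround, you can use np.roll. Note that np.roll wraps values around instead of discarding them; it reproduces a shift here only because col6 and col7 are all NaN, so those NaNs are what end up in col0 and col1: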

    In[11]: # result_type='broadcast' keeps the original index and column labels
            x.apply(lambda row: np.roll(row, 2), axis=1, result_type='broadcast')
    Out[11]:
       col0  col1  col2  col3  col4  col5  col6  col7
    0   NaN   NaN   6.0   5.0   1.0   5.0   2.0   4.0
    1   NaN   NaN   8.0   8.0   9.0   6.0   7.0   2.0
    2   NaN   NaN   8.0   3.0   9.0   6.0   6.0   6.0
    3   NaN   NaN   8.0   4.0   4.0   4.0   8.0   9.0
    4   NaN   NaN   5.0   3.0   4.0   3.0   8.0   7.0
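To see why np.roll can stand in for shift in this particular case, here is a minimal sketch on a single row taken from the example (row 0), plus a row without trailing NaNs to show the wrap-around:

```python
import numpy as np

row = np.array([6.0, 5.0, 1.0, 5.0, 2.0, 4.0, np.nan, np.nan])

# np.roll moves every element 2 places to the right and wraps the last two
# elements (here the NaNs) around to the front, so the row looks shifted
rolled = np.roll(row, 2)
print(rolled)  # first two entries NaN, then 6, 5, 1, 5, 2, 4

# Without trailing NaNs, the wrapped values reappear at the front,
# so roll is not a general replacement for shift
print(np.roll(np.array([1, 2, 3, 4, 5]), 2))  # [4 5 1 2 3]
```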

Speed-wise, it is probably faster to call np.roll on the whole frame at once and pass the result to the DataFrame constructor, reusing the existing column labels:

    In[12]: x = pd.DataFrame(np.roll(x, 2, axis=1), columns=x.columns)
            x
    Out[12]:
       col0  col1  col2  col3  col4  col5  col6  col7
    0   NaN   NaN   6.0   5.0   1.0   5.0   2.0   4.0
    1   NaN   NaN   8.0   8.0   9.0   6.0   7.0   2.0
    2   NaN   NaN   8.0   3.0   9.0   6.0   6.0   6.0
    3   NaN   NaN   8.0   4.0   4.0   4.0   8.0   9.0
    4   NaN   NaN   5.0   3.0   4.0   3.0   8.0   7.0
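Because np.roll wraps around, it only matches a shift when the columns that fall off the end are all NaN. If those columns can hold data, the same idea (build a new DataFrame from a NumPy array, reusing the labels) can be adapted to pad with NaN instead of wrapping. A minimal sketch; shift_columns is a hypothetical helper name, not a pandas API, and it assumes converting every column to float is acceptable:

```python
import numpy as np
import pandas as pd

def shift_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: shift each row n columns right (n > 0) or left (n < 0),
    filling vacated positions with NaN instead of wrapping values around."""
    values = df.to_numpy(dtype=float)      # single float array, so NaN fits everywhere
    out = np.full_like(values, np.nan)     # start from an all-NaN array of the same shape
    if n > 0:
        out[:, n:] = values[:, :-n]        # copy columns rightward, drop the last n
    elif n < 0:
        out[:, :n] = values[:, -n:]        # copy columns leftward, drop the first |n|
    else:
        out[:] = values
    return pd.DataFrame(out, index=df.index, columns=df.columns)

# Usage on two rows from the question
x = pd.DataFrame({'col0': [6, 8], 'col1': [5, 8], 'col2': [1, 9],
                  'col3': [5, 6], 'col4': [2, 7], 'col5': [4, 2]})
x['col6'] = np.nan
x['col7'] = np.nan
print(shift_columns(x, 2))
```

Unlike np.roll, nothing wraps around, so the result does not depend on the trailing columns being empty.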

Timings

    In[13]: %timeit pd.DataFrame(np.roll(x, 2, axis=1), columns=x.columns)
            %timeit x.fillna(0).astype(int).shift(2, axis=1)
    10000 loops, best of 3: 117 µs per loop
    1000 loops, best of 3: 418 µs per loop

So on this small frame, building a new DataFrame from the np.roll result (about 117 µs) is faster than filling the NaN values, casting to int, and then shifting (about 418 µs).
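The fillna(0).astype(int) variant in the timings presumably works because it leaves the frame with a single dtype, so the shift runs across all columns together. Casting to float gets the same single dtype without replacing NaN with 0, since NaN is already a float. A minimal sketch, assuming a NaN-padded shift is all that is needed; this variant was not part of the timings above:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'col0': [6, 8], 'col1': [5, 8], 'col2': [1, 9],
                  'col3': [5, 6], 'col4': [2, 7], 'col5': [4, 2]})
x['col6'] = np.nan
x['col7'] = np.nan

# Cast everything to float64 first so all eight columns share one dtype,
# then shift two columns to the right; col0 and col1 become NaN
shifted = x.astype(float).shift(2, axis=1)
print(shifted)
```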


Source: https://habr.com/ru/post/1275255/

