Randomly insert NA values into a pandas DataFrame

How can I randomly insert np.nan into a DataFrame? Let's say I want 10% of the values in my DataFrame to be null.

My data is as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])

        one       two     three
a  0.695132  1.044791 -1.059536
b -1.075105  0.825776  1.899795
c -0.678980  0.051959 -0.691405
d -0.182928  1.455268 -1.032353
e  0.205094  0.714192 -0.938242

Is there an easy way to insert null values?

1 answer

Here's a way to clear exactly 10% of the cells (or rather, as close to 10% as the existing data frame's size allows).

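import random

# All (row, col) positions in the frame
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# Sample 10% of the positions without replacement and set them to NaN
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan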
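
The same exact-count idea can also be written without a Python loop. This is only a sketch, not part of the original answer: the seed, the names `rng`, `n_nan`, `values` and `df_nan`, and the choice to build a new frame instead of modifying `df` in place are all assumptions made for illustration. It also assumes every column is float, as in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((5, 3)),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])

# Number of cells to clear: 10% of all cells, rounded as in the loop version
n_nan = int(round(0.1 * df.size))

# Draw distinct flat positions, then convert them to (row, col) pairs
flat = rng.choice(df.size, size=n_nan, replace=False)
rows, cols = np.unravel_index(flat, df.shape)

# Work on a copy of the underlying array so the original frame is untouched
values = df.to_numpy(copy=True)
values[rows, cols] = np.nan
df_nan = pd.DataFrame(values, index=df.index, columns=df.columns)

print(df_nan)
```

Because the positions are drawn without replacement, `df_nan` contains exactly `n_nan` missing cells.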

Here's a way to clear cells independently, with a 10% probability for each cell.

df = df.mask(np.random.random(df.shape) < .1)
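
With the independent approach, the share of cleared cells is only 10% on average and varies from run to run. The sketch below illustrates this on a larger frame. The 1000-row size, the seed, and the names `rng`, `mask`, `df_nan` and `frac` are assumptions for illustration and do not come from the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((1000, 3)),
                  columns=['one', 'two', 'three'])

# Boolean array of the same shape: True marks a cell to clear
mask = rng.random(df.shape) < 0.1

# DataFrame.mask replaces values with NaN wherever the condition is True
df_nan = df.mask(mask)

# Realized fraction of missing cells: close to 0.1, but not exactly 0.1
frac = df_nan.isna().to_numpy().mean()
print(frac)
```

On a small frame such as the 5x3 example, a single run can easily clear zero cells or several, so the exact-count method is the safer choice when a fixed number of missing values is required.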

Source: https://habr.com/ru/post/1651887/

