Three thousand columns is not that many. How many rows do you have? You can always create a random data frame of the same size and do a logical replacement (the size of your file system will determine whether this is possible).
If you know the size of your data frame:
    import pandas as pd
    import numpy as np
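The rest of this snippet did not survive in the post, so here is a minimal sketch of what the known-size case might look like. The dimensions, the seed, and the stand-in `data` frame are assumptions for illustration only; the names `data` and `dfrand` and the final assignment are taken from the explanation further down.

```python
import numpy as np
import pandas as pd

# Assumed dimensions, known in advance (hypothetical; use your own).
n_rows, n_cols = 6, 4
rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data, with two missing values.
data = pd.DataFrame(rng.normal(size=(n_rows, n_cols)))
data.iloc[1, 2] = np.nan
data.iloc[4, 0] = np.nan
original = data.copy()

# Random frame of the same size, then the logical replacement.
dfrand = pd.DataFrame(rng.normal(size=(n_rows, n_cols)))
mask = data.isna()            # True exactly where data is NaN
data[mask] = dfrand[mask]     # only the NaN cells are overwritten
```

Only the cells flagged by `mask` change; everything else in `data` stays as it was.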
If you don't know the size of your data frame, just move things around:
    import pandas as pd
    import numpy as np
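The remainder of this snippet is also missing from the post. One possible reading, sketched here as an assumption, is to load the data first and derive the random frame's shape and labels from it. The helper name `fill_nan_with_random` and the `seed` parameter are hypothetical, and the sketch assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

def fill_nan_with_random(data, seed=None):
    """Return a copy of `data` with NaNs replaced by random normal draws.

    Hypothetical helper: the shape, index and columns of the random
    frame are read from `data` itself, so no size is hard-coded.
    """
    rng = np.random.default_rng(seed)
    dfrand = pd.DataFrame(
        rng.normal(size=data.shape),
        index=data.index,
        columns=data.columns,
    )
    # Keep existing values where present; take dfrand where data is NaN.
    return data.where(data.notna(), dfrand)

# Small usage example with non-default labels.
df = pd.DataFrame(
    {"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 6.0]},
    index=["a", "b", "c"],
)
filled = fill_nan_with_random(df, seed=1)
```

Copying the index and columns onto `dfrand` matters because pandas aligns on labels, not positions, when combining two frames.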
EDIT: Regarding the user's last comment: "dfrand[np.isnan(data)] returns only NaN".
Right! And that is exactly what you wanted. In my solution, I have: data[np.isnan(data)] = dfrand[np.isnan(data)]. In words: take the randomly generated values from dfrand at the positions where "data" is NaN, and insert them into "data" at those same positions. An example will help:
    a = pd.DataFrame(data=np.random.randint(0,100,(10,3)))
    a[0][5] = np.nan

    In [32]: a
    Out[33]:
        0   1   2
    0   2  26  28
    1  14  79  82
    2  89  32  59
    3  65  47  31
    4  29  59  15
    5 NaN  58  90
    6  15  66  60
    7  10  19  96
    8  90  26  92
    9   0  19  23

    # define randomly-generated dataframe, much like what you are doing, and replace NaN's
    b = pd.DataFrame(data=np.random.randint(0,100,(10,3)))

    In [39]: b
    Out[39]:
        0   1   2
    0  92  21  55
    1  65  53  89
    2  54  98  97
    3  48  87  79
    4  98  38  62
    5  46  16  30
    6  95  39  70
    7  90  59   9
    8  14  85  37
    9  48  29  46

    a[np.isnan(a)] = b[np.isnan(a)]

    In [38]: a
    Out[38]:
        0   1   2
    0   2  26  28
    1  14  79  82
    2  89  32  59
    3  65  47  31
    4  29  59  15
    5  46  58  90
    6  15  66  60
    7  10  19  96
    8  90  26  92
    9   0  19  23
As you can see, every NaN in a was replaced by the randomly generated value at the same index in b (here, row 5 of column 0 became 46).