What is a simple and efficient way to shuffle a DataFrame in pandas, by rows or by columns? That is, how do I write a function shuffle(df, n, axis=0) that takes a DataFrame, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns), and returns a copy of the DataFrame that has been shuffled n times?
Edit: The key is to do this without destroying the row/column labels of the DataFrame. If you just shuffle df.index, all of that information is lost. I want the resulting df to be the same as the original, except that the order of the rows or columns is different.
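To make the label requirement concrete, here is a minimal sketch of reordering whole rows (or whole columns) by position, so that every label travels with its data. The example frame, its index labels, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with explicit row labels.
df = pd.DataFrame({"A": range(5), "B": list("vwxyz")}, index=list("abcde"))

# Reorder whole rows by position: each row keeps its label and its A/B pairing.
rows_shuffled = df.iloc[np.random.permutation(len(df))]

# Same idea for columns: each column keeps its name and its values.
cols_shuffled = df.iloc[:, np.random.permutation(df.shape[1])]
```

Selecting by a permuted position array returns a new object and leaves df itself unchanged.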
Edit2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you no longer have the same associations between a and b as you would if you simply reordered each row as a whole. Something like:
    for 1...n:
        for each col in df:
            shuffle column
    return new_df
But hopefully more efficient than a naive loop. This does not work for me:
    import numpy as np
    import pandas

    def shuffle(df, n, axis=0):
        shuffled_df = df.copy()
        for k in range(n):
            shuffled_df.apply(np.random.shuffle(shuffled_df.values), axis=axis)
        return shuffled_df

    df = pandas.DataFrame({'A': range(10), 'B': range(10)})
    shuffle(df, 5)
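For comparison, here is a minimal sketch of the behaviour I am after, not necessarily the most efficient way to get it. It assumes NumPy 1.20 or later for Generator.permuted, and the seed parameter is a hypothetical addition for reproducibility that is not part of the signature above. With axis=0 the values within each column are shuffled independently; with axis=1 the values within each row are shuffled independently. The labels are reattached unchanged.

```python
import numpy as np
import pandas as pd


def shuffle(df, n=1, axis=0, seed=None):
    """Return a copy of df with values shuffled independently along `axis`.

    axis=0 shuffles the values within each column (down the rows);
    axis=1 shuffles the values within each row (across the columns).
    Row and column labels stay where they are.
    """
    # `seed` is a hypothetical convenience parameter, not part of the question.
    rng = np.random.default_rng(seed)
    values = df.to_numpy(copy=True)
    for _ in range(n):
        # permuted() shuffles every 1-D slice along `axis` independently
        # and returns a new array.
        values = rng.permuted(values, axis=axis)
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({'A': range(10), 'B': range(10)})
shuffled = shuffle(df, 5)
```

Two caveats: repeating a uniform shuffle n times gives the same distribution as shuffling once, so n is kept only to match the signature; and for a frame with mixed column types, to_numpy() yields an object array, so the resulting columns may lose their original dtypes.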
python numpy pandas
user248237dfsf Apr 02 '13 at 18:50