Beginning with:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((1000, 100)))
Adding the columns individually:
def cols_via_apply(df):
    for i in range(100, 150):
        df[i] = df[i-100].apply(lambda x: x * i)
    return df

%timeit cols_via_apply(df)
10 loops, best of 3: 29.6 ms per loop

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Columns: 150 entries, 0 to 149
dtypes: float64(150)
memory usage: 1.2 MB
None
This seems a bit more efficient than using pd.concat, presumably because the loop here runs over the 50 new columns, while the concat version applies its function row by row (axis=1) and so loops over every row of the DataFrame. This difference should therefore worsen as the length of the DataFrame increases:
def cols_via_concat(df):
    new_cols = df.apply(lambda row: pd.Series({i: i * row[i-100] for i in range(100, 150)}), axis=1)
    df = pd.concat([df, new_cols], axis=1)  # axis=1 so the new Series are appended as columns, not rows
    return df

%timeit cols_via_concat(df)
1 loops, best of 3: 450 ms per loop

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Columns: 150 entries, 0 to 149
dtypes: float64(150)
memory usage: 1.2 MB
None
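To check the scaling claim, here is a minimal sketch that times both functions on DataFrames of increasing length. It assumes the two functions above are already defined; make_df and the chosen row counts are my own illustrative additions, not part of the original benchmark, and it uses the standard timeit module instead of the IPython %timeit magic so it runs as a plain script:

import timeit

import numpy as np
import pandas as pd

def make_df(n_rows):
    # Same shape as the original setup, but with a configurable row count
    return pd.DataFrame(np.random.random((n_rows, 100)))

for n_rows in (1000, 10000):
    base = make_df(n_rows)
    # Time each approach on a fresh copy so the input always has exactly 100 columns
    t_apply = timeit.timeit(lambda: cols_via_apply(base.copy()), number=3)
    t_concat = timeit.timeit(lambda: cols_via_concat(base.copy()), number=3)
    print("%d rows: apply %.3fs, concat %.3fs" % (n_rows, t_apply, t_concat))

If the reasoning above is right, the gap between the two timings should widen as n_rows grows, since the row-wise apply in cols_via_concat does work proportional to the number of rows.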