Column matching and adding to a data frame, Python 3.6

Question

I have about 50 Excel files that I want to import into a DataFrame and merge into a single data file. However, some files have 3 columns and some have 4, and each file has different columns in a different order.

Total columns across all files: 5, i.e. col1, col2, col3, col4, col5

I know how to import the files, but I run into problems as soon as I try to combine them.

Script:

dfAll = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5'])
df = pd.read_excel('FilePath', sheetname='data1')  # contains 3 columns, i.e. col1, col2, col5
columnsOFdf = df.columns
dfAll[columnsOFdf] = dfAll.append(df)

But it throws the error "ValueError: Columns should be the same length as the key".

I want to add the data in df[['col1', 'col2', 'col5']] to dfAll[['col1', 'col2', 'col5']].

Please help with this issue.

python python-3.x pandas append dataframe

faithon.gvr.py Sep 06 '17 at 14:00

3 answers

One solution is to add empty columns to the DataFrames that you load from the Excel files:

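import pandas as pd

columns = ['col1', 'col2', 'col3', 'col4', 'col5']
dfAll = pd.DataFrame(columns=columns)
# newer pandas versions spell this argument sheet_name
df = pd.read_excel('FilePath', sheetname='data1')  # contains 3 columns, i.e. col1, col2, col5
# add every missing column as an empty column
for column in columns:
    if column not in df.columns:
        df[column] = ""
# append returns a new DataFrame, so the result has to be assigned back
dfAll = dfAll.append(df)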
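
As a variation on the loop above, `DataFrame.reindex` can add the missing columns in a single call. This is only a minimal sketch, not the answerer's code: the small `df` built here is a made-up stand-in for one Excel file, and the missing columns end up filled with NaN rather than empty strings.

```python
import pandas as pd

columns = ['col1', 'col2', 'col3', 'col4', 'col5']

# Hypothetical stand-in for one Excel file that only has col5, col1 and col2
df = pd.DataFrame({'col5': [50, 51], 'col1': [10, 11], 'col2': [20, 21]})

# reindex returns a new frame with exactly these columns, in this order;
# columns that df does not have are created and filled with NaN
aligned = df.reindex(columns=columns)

print(aligned.columns.tolist())
```

Because `reindex` returns a new frame, the original `df` is left unchanged.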

eqperes Sep 06 '17 at 14:10

[dfAll.append(i) for i in df]

Jorge Alberto Rueda Flores Sep 06 '17 at 14:11

Alexander · Accepted Answer · 2017-09-06T14:14:10+0000

Concatenation will match your columns by name:

import pandas as pd

dfs = []
files = [...]
# read every file into its own DataFrame, then concatenate once
for file_name in files:
    dfs.append(pd.read_excel(file_name, sheetname='data1'))
df = pd.concat(dfs)

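For example, columns that are missing from one frame are filled with NaN:

>>> import numpy as np
>>> df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
>>> df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('BCD'))
>>> pd.concat([df1, df2])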
          A         B         C         D
0 -2.329280  0.644155 -0.835137       NaN
1  0.666496 -1.299048  0.111579       NaN
2  1.855494 -0.085850 -0.541890       NaN
0       NaN -1.131514  1.023610 -0.514384
1       NaN  0.670063  1.403143 -0.978611
2       NaN -0.314741 -0.727200 -0.620511

In addition, each time you append a DataFrame to an existing one, a copy is returned. Doing this in a loop seriously degrades performance and is known as quadratic copying. It is better to build a list of all the DataFrames and then concatenate them in one step.
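
The list-then-concatenate pattern can be sketched with the column names from the question. The three small frames below are made-up stand-ins for the Excel files; `ignore_index=True` and the final `reindex` are optional additions, assumed here to give a clean row index and a fixed column order.

```python
import pandas as pd

all_columns = ['col1', 'col2', 'col3', 'col4', 'col5']

# Hypothetical stand-ins for the frames read from the Excel files
frames = [
    pd.DataFrame({'col1': [1], 'col2': [2], 'col5': [5]}),
    pd.DataFrame({'col4': [40], 'col3': [30], 'col1': [10], 'col2': [20]}),
    pd.DataFrame({'col5': [500], 'col3': [300], 'col2': [200]}),
]

# A single concat call: columns are matched by name, gaps become NaN,
# and ignore_index=True renumbers the rows 0..n-1
merged = pd.concat(frames, ignore_index=True)

# Force the full column set in a fixed order
merged = merged.reindex(columns=all_columns)

print(merged)
```

The `reindex` step also guarantees that all five columns exist even if no file in the batch happens to contain one of them.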
