Reading excel sheet with multiple headers using Pandas

Question


I have an Excel sheet with several header rows, for example:

 _________________________________________________________________________
 ____|_____|      Header1       |      Header2       |      Header3       |
 ColX|ColY |ColA|ColB|ColC|ColD||ColD|ColE|ColF|ColG||ColH|ColI|ColJ|ColDK|
 1   | ds  | 5  | 6  | 9  | 10 | .......................................
 2   | dh  | ..........................................................
 3   | ge  | ..........................................................
 4   | ew  | ..........................................................
 5   | er  | ..........................................................

As you can see, the first two columns have no top-level header (those cells are empty), while the remaining columns are grouped under Header1, Header2 and Header3. I want to read this sheet and combine it with another sheet that has a similar structure.

I want to merge the sheets on the first column, "ColX". This is what I am doing now:

 import pandas as pd

 totalMergedSheet = pd.DataFrame([1,2,3,4,5], columns=['ColX'])
 file = pd.ExcelFile('ExcelFile.xlsx')
 for i in range (1, len(file.sheet_names)):
     df1 = file.parse(file.sheet_names[i-1])
     df2 = file.parse(file.sheet_names[i])
     newMergedSheet = pd.merge(df1, df2, on='ColX')
     totalMergedSheet = pd.merge(totalMergedSheet, newMergedSheet, on='ColX')

But I don't think it is reading the columns correctly, and I don't think it will return the results the way I want. I want the resulting frame to look like this:

 ________________________________________________________________________________________________________
 ____|_____|      Header1       |      Header2       |      Header3      |       Header4       |      Header5      |
 ColX|ColY |ColA|ColB|ColC|ColD||ColD|ColE|ColF|ColG||ColH|ColI|ColJ|ColK| ColL|ColM|ColN|ColO||ColP|ColQ|ColR|ColS|
 1   | ds  | 5  | 6  | 9  | 10 | ..................................................................................
 2   | dh  | ...................................................................................
 3   | ge  | ....................................................................................
 4   | ew  | ...................................................................................
 5   | er  | ......................................................................................

Any suggestions please. Thanks.


python pandas excel dataframe

muazfaiz Nov 11 '16 at 18:33


1 answer

beeftendon · Accepted Answer · 2016-11-11T20:49:07+0000

Pandas already has a function that will read an entire Excel spreadsheet for you, so you do not need to manually parse and merge each sheet. Take a look at pandas.read_excel(). It not only lets you read an Excel file in a single line, but also provides options that help with the problem you are facing.

Since your columns have subcolumns, what you are looking for is MultiIndexing. By default, pandas reads the top row as a single header row. You can pass the header argument to pandas.read_excel() to specify which rows should be used as headers. In your particular case you need header=[0, 1], meaning the first two rows. You also have multiple sheets, so you can pass sheet_name=None, which reads all of them (the argument was spelled sheetname in pandas versions before 0.21). The command looks like this:

 import pandas
 df_dict = pandas.read_excel('ExcelFile.xlsx', header=[0, 1], sheet_name=None)
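To get a feel for what two header rows turn into, here is a minimal sketch that builds a small frame by hand instead of reading a file. The column names come from the question, but the values, the reduced set of columns, and the choice of ColX as the index are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one sheet read with two header rows.
# Values are invented; only the two-level column layout mirrors the question.
columns = pd.MultiIndex.from_tuples([
    ("Header1", "ColA"), ("Header1", "ColB"),
    ("Header2", "ColE"), ("Header2", "ColF"),
])
sheet = pd.DataFrame(
    [[5, 6, 9, 10], [1, 2, 3, 4]],
    index=pd.Index([1, 2], name="ColX"),
    columns=columns,
)

# Selecting a top-level header returns all of its subcolumns.
header1 = sheet["Header1"]

# A single subcolumn is addressed with a (top, sub) tuple.
col_e = sheet[("Header2", "ColE")]
```

Because the subcolumn names sit under a top-level header, a name such as ColD that appears under both Header1 and Header2 in the question stays unambiguous when addressed by its tuple.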

This returns a dictionary in which the keys are sheet names and the values are DataFrames for each sheet. If you want to collapse all this into one DataFrame, you can simply use pandas.concat:

 df = pandas.concat(df_dict.values(), axis=0)
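Note that axis=0 stacks the sheets on top of each other. The desired output in the question places Header4 and Header5 next to Header1 to Header3, which suggests side-by-side alignment may be what is wanted. The following is only a sketch under that assumption: the helper make_sheet, the sheet names, and all values are hypothetical, and ColX is assumed to be the index of every sheet so that rows line up on it.

```python
import pandas as pd

def make_sheet(top, subs, rows, keys):
    """Build a small frame with two-level columns, indexed by ColX (hypothetical helper)."""
    cols = pd.MultiIndex.from_product([[top], subs])
    return pd.DataFrame(rows, index=pd.Index(keys, name="ColX"), columns=cols)

# Stand-in for the dictionary of sheets; names and values are invented.
df_dict = {
    "Sheet1": make_sheet("Header1", ["ColA", "ColB"], [[5, 6], [7, 8]], [1, 2]),
    "Sheet2": make_sheet("Header4", ["ColL", "ColM"], [[9, 10], [11, 12]], [1, 2]),
}

# axis=0 appends rows; columns missing from a sheet are filled with NaN.
stacked = pd.concat(list(df_dict.values()), axis=0)

# axis=1 aligns rows on the shared ColX index and places the header groups side by side.
side_by_side = pd.concat(list(df_dict.values()), axis=1)
```

Which of the two fits depends on whether the sheets hold more rows for the same columns (axis=0) or more columns for the same ColX values (axis=1).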
