Using pandas in Python to combine CSV files into one

I have n files in a directory that I need to merge into one. They all have the same number of columns. For example, the contents of test1.csv:

test1,test1,test1
test1,test1,test1
test1,test1,test1

Similarly, the contents of test2.csv :

test2,test2,test2
test2,test2,test2
test2,test2,test2

I want final.csv to look like this:

test1,test1,test1
test1,test1,test1
test1,test1,test1
test2,test2,test2
test2,test2,test2
test2,test2,test2

But instead it turns out like this:

test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2
,,,test file 2,test file 2,test file 2
,,,test file 2,test file 2,test file 2
test file 1,test file 1,test file 1,,,
test file 1,test file 1,test file 1,,,

Can someone help me figure out what's going on here? I pasted my code below:

import csv
import glob
import pandas as pd
import numpy as np

all_data = pd.DataFrame()  # initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"):  # for all csv files in pwd
    df = pd.read_csv(f)  # create dataframe for reading current csv
    all_data = all_data.append(df)  # appends current csv to final DF
all_data.to_csv("final.csv", index=None)
3 answers

I think there are several problems:

  • I removed import csv and import numpy as np because they are not used in this demo (though they may be needed elsewhere in your full script).
  • I created a list dfs and appended each DataFrame to it with dfs.append(df), then used pd.concat to concatenate the list into the final DataFrame.
  • I added the header=None parameter to read_csv, because the main problem was that read_csv was reading the first row of each file as a header.
  • I added the header=None parameter to to_csv to omit the header row when writing.
  • I wrote the output to a test subfolder, because if you keep glob.glob("*.csv") and write final.csv into the same directory, the next run will read the output file as an input file.

Solution:

import glob
import pandas as pd

# list of all DataFrames
dfs = []
for f in glob.glob("*.csv"):  # for all csv files in pwd
    # header=None so the first row is read as data, not as a header
    df = pd.read_csv(f, header=None)  # create dataframe for the current csv
    dfs.append(df)  # append the current csv's DataFrame to the list

# concatenate the list into the final DataFrame
all_data = pd.concat(dfs, ignore_index=True)
print(all_data)
#        0      1      2
# 0  test1  test1  test1
# 1  test1  test1  test1
# 2  test1  test1  test1
# 3  test2  test2  test2
# 4  test2  test2  test2
# 5  test2  test2  test2

# header=None here omits the header row in the output file
all_data.to_csv("test/final.csv", index=None, header=None)

The following solution is similar: I add the header=None parameter to read_csv and to_csv, and the ignore_index=True parameter to append.

import glob
import pandas as pd

all_data = pd.DataFrame()  # initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"):  # for all csv files in pwd
    df = pd.read_csv(f, header=None)  # create dataframe for the current csv
    # note: DataFrame.append was removed in pandas 2.0; prefer pd.concat as above
    all_data = all_data.append(df, ignore_index=True)  # appends current csv to final DF

print(all_data)
#        0      1      2
# 0  test1  test1  test1
# 1  test1  test1  test1
# 2  test1  test1  test1
# 3  test2  test2  test2
# 4  test2  test2  test2
# 5  test2  test2  test2

all_data.to_csv("test/final.csv", index=None, header=None)

You can use concat. Let df1 be your first DataFrame and df2 the second; then:

df = pd.concat([df1, df2], ignore_index=True)

ignore_index is optional; set it to True if you don't need to preserve the original indices of the individual DataFrames.
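For example, a minimal sketch applied to the two files from the question (assuming test1.csv and test2.csv have no header row, as in the example data):

import pandas as pd

# read both files; header=None keeps the first row as data
df1 = pd.read_csv("test1.csv", header=None)
df2 = pd.read_csv("test2.csv", header=None)

# stack them vertically and renumber the rows
final = pd.concat([df1, df2], ignore_index=True)
final.to_csv("final.csv", index=False, header=False)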


pandas is not the right tool when all you need to do is create one csv file; you can simply write each csv into the output file as you go:

import glob

with open("out.csv", "w") as out:
    for fle in glob.glob("*.csv"):
        with open(fle) as f:
            out.writelines(f)

Or, using the csv lib if you prefer:

import glob
import csv

# newline="" avoids extra blank lines from csv.writer on Windows
with open("out.csv", "w", newline="") as out:
    wr = csv.writer(out)
    for fle in glob.glob("*.csv"):
        with open(fle) as f:
            wr.writerows(csv.reader(f))

Building a large DataFrame only to write it straight back to disk does not make much sense, and if you had many large files it might not even fit in memory.
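One caveat, echoing the earlier point about glob.glob("*.csv") picking up the output file: if out.csv sits in the same directory as the inputs, a later run will read the output file back in as an input. A small variation (my own sketch, not part of the original answer) that skips the output file explicitly:

import glob
import os

out_path = "out.csv"
with open(out_path, "w") as out:
    for fle in glob.glob("*.csv"):
        # skip the output file itself if it lives in the input directory
        if os.path.abspath(fle) == os.path.abspath(out_path):
            continue
        with open(fle) as f:
            out.writelines(f)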


Source: https://habr.com/ru/post/1237993/

