Python pandas: add a column with the CSV file name

The Python code below works correctly: it combines a directory of CSV files and matches their headers. I want to take it one step further: how can I add a column that holds the name of the CSV file each row came from?

import glob

import pandas as pd

globbed_files = glob.glob("*.csv")  # creates a list of all CSV files

data = []  # pd.concat takes a list of DataFrames as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)

# ignore_index=True: don't want pandas to try to align row indexes
bigframe = pd.concat(data, ignore_index=True)
bigframe.to_csv("Pandas_output2.csv")
2 answers

This should work:

import os

for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)
    data.append(frame)

frame['filename'] creates a new column named filename, and os.path.basename() reduces a path such as /a/d/c.txt to just the file name, c.txt.
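For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The helper names combine_with_filename and demo, and the sample files a.csv and b.csv, are made up for illustration and are not part of the original question; the demo writes two small CSV files to a temporary directory so nothing in the working directory is touched.

```python
import os
import tempfile

import pandas as pd


def combine_with_filename(paths):
    """Read each CSV in `paths` and tag its rows with the source file name."""
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Every row of this frame gets the same value: the bare file name.
        frame["filename"] = os.path.basename(path)
        frames.append(frame)
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


def demo():
    """Hypothetical demo: build two tiny CSV files and combine them."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.csv")
        b = os.path.join(tmp, "b.csv")
        pd.DataFrame({"x": [1, 2]}).to_csv(a, index=False)
        pd.DataFrame({"x": [3]}).to_csv(b, index=False)
        return combine_with_filename([a, b])


if __name__ == "__main__":
    print(demo())
```

With the question's setup, calling combine_with_filename(glob.glob("*.csv")) should give the same bigframe plus the extra filename column.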


Mike's answer above works fine. In case anyone arriving from a search runs into the following error:

 >>> TypeError: cannot concatenate object of type "<type 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid 

In my case this happened because the delimiter was wrong. I was using a custom CSV file whose delimiter was ^, so I had to specify that delimiter in the pd.read_csv call.

import os

for csv in globbed_files:
    frame = pd.read_csv(csv, sep='^')
    frame['filename'] = os.path.basename(csv)
    data.append(frame)
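To see what the sep argument changes, here is a small sketch that parses the same caret-delimited text twice. The sample data and the variable names raw, wrong, and right are made up for illustration; it only shows how the columns come out, and it does not try to reproduce the TypeError above.

```python
import io

import pandas as pd

# Hypothetical caret-delimited content standing in for a custom CSV file.
raw = "name^score\nann^1\nbob^2\n"

# Default sep=',': no commas are found, so each line becomes a single field.
wrong = pd.read_csv(io.StringIO(raw))

# sep='^': the caret is treated as a literal one-character delimiter.
right = pd.read_csv(io.StringIO(raw), sep="^")

print(list(wrong.columns))
print(list(right.columns))
```

If the combined output has a single column whose header contains the delimiter character, a wrong sep is a likely cause.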

Source: https://habr.com/ru/post/1263343/

