Situation
I need to create a pandas DataFrame from a CSV file that has the following characteristics:
- The separator used in the file can be either a comma or a space, and I don't know in advance which one a given file will use.
- At the top of the file, there can be one or more comment lines starting with #.
Problem
I tried to solve this with pd.read_csv, passing the arguments sep=None and comment='#'. As I understand it, sep=None tells pandas to detect the delimiter character automatically, and comment='#' tells pandas that all lines starting with # are comment lines that should be ignored.
Each argument works fine when used on its own. However, when I use them together, I get the error TypeError: expected string or bytes-like object. The following code example demonstrates this:
from io import StringIO
import pandas as pd
tabular_data = (
'# Data generated on 04 May 2017\n'
'col1,col2,col3\n'
'5.9,7.8,3.2\n'
'7.1,0.4,8.1\n'
'9.4,5.4,1.9\n'
)
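# sep=None on its own: runs without an error
df1 = pd.read_csv(StringIO(tabular_data), sep=None)
print(df1)
# comment='#' on its own: runs without an error
df2 = pd.read_csv(StringIO(tabular_data), comment='#')
print(df2)
# Both together: raises TypeError: expected string or bytes-like object
df3 = pd.read_csv(StringIO(tabular_data), sep=None, comment='#')
print(df3)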
Unfortunately, I really don't understand what causes the error. Can anyone here help me solve this problem?
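One workaround I am considering, in case it helps frame an answer: strip the comment lines myself, let the standard library's csv.Sniffer guess the delimiter, and only then call pd.read_csv with an explicit sep. This is just a sketch under my own assumptions. The helper name read_commented_csv and its parameters are made up for illustration, and it sidesteps the combination of sep=None and comment='#' rather than explaining the error:

```python
import csv
from io import StringIO

import pandas as pd


def read_commented_csv(text, comment="#", candidates=", "):
    """Hypothetical helper: drop comment lines, sniff the delimiter, parse."""
    # Keep only non-blank lines that do not start with the comment character.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith(comment)
    ]
    body = "\n".join(lines) + "\n"
    # Let csv.Sniffer choose between the candidate delimiters (comma or space).
    dialect = csv.Sniffer().sniff(body, delimiters=candidates)
    # Pass the detected delimiter explicitly instead of using sep=None.
    return pd.read_csv(StringIO(body), sep=dialect.delimiter)


tabular_data = (
    '# Data generated on 04 May 2017\n'
    'col1,col2,col3\n'
    '5.9,7.8,3.2\n'
    '7.1,0.4,8.1\n'
    '9.4,5.4,1.9\n'
)
df = read_commented_csv(tabular_data)
print(df)
```

This only removes lines that start with the comment character, so it would not handle trailing comments inside data rows. I would still like to understand why the two arguments fail together.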