If you can read the whole file into memory, you can use the str accessor
for vectorized string operations:
>>> df = pd.read_csv("toolong.csv")
>>> df
a b c
0 1 1256378916212378918293 2
[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
a b c
0 1 1256378916 2
[1 rows x 3 columns]
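The slicing above relies on column b holding strings. As a minimal sketch, assuming a comma-separated file with a header row (the inline text below is a hypothetical stand-in for toolong.csv), you can request that explicitly with the dtype argument of read_csv rather than depending on how pandas infers the oversized number:

```python
import io

import pandas as pd

# Hypothetical stand-in for toolong.csv; the real file layout is assumed.
csv_text = "a,b,c\n1,1256378916212378918293,2\n"

# dtype={"b": str} asks pandas to keep column b as text,
# so the .str accessor is available whatever the digit count.
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": str})

# Keep only the first 10 characters of every value in b.
df["b"] = df["b"].str[:10]
print(df)
```

The other columns are still inferred as usual; only b is forced to text.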
Also note that you can get a Series of the string lengths using
>>> df["b"].str.len()
0 10
Name: b, dtype: int64
I was wondering if
>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
a b c
0 1 12563 2
[1 rows x 3 columns]
would be better, but I really don’t know whether the converters are called row by row or after the fact on the whole column.
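One way to check this yourself is to record every call the converter receives. The sketch below is only an experiment, not a statement about pandas internals: the two-row inline CSV and the helper name truncate are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for toolong.csv.
csv_text = "a,b,c\n1,1256378916212378918293,2\n3,99999999999999999999999,4\n"

calls = []  # every argument the converter is called with


def truncate(value):
    # Record the argument so we can inspect what pandas passes in.
    calls.append(value)
    return value[:5]


df = pd.read_csv(io.StringIO(csv_text), converters={"b": truncate})

print(len(calls))  # number of times the converter ran
print(calls)       # the raw values it received
print(df)
```

If the number of calls matches the number of data rows and each argument is a single cell's text, the converter is being applied value by value rather than once on the whole column.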