My CSV file contains 6 million records, and I am trying to split it into several smaller files using skiprows. My Pandas version is 0.12.0 and the code is:
pd.read_csv(TRAIN_FILE, chunksize=50000, header=None, skiprows=999999, nrows=100000)
It works as long as skiprows is less than 900,000. Is this expected? If I do not use skiprows, nrows can go up to 5 million records. I have not tried beyond that yet, but I will try that as well.
I also tried a CSV splitter, but it does not work properly for the first record, maybe because each cell contains several lines of code, etc.
EDIT: I can split it by reading the entire 7 GB file with Pandas read_csv and writing parts of it out to several CSV files.
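For reference, a minimal sketch of a chunked variant of that workaround, which avoids holding the whole file in memory. It assumes a recent Pandas on Python 3 rather than 0.12.0, and the function name split_csv, the output directory, and the part_NNNN.csv naming are hypothetical choices made for illustration:

```python
import os

import pandas as pd


def split_csv(src_path, out_dir, rows_per_file=100000):
    """Write src_path out as numbered CSV parts of at most rows_per_file rows.

    Hypothetical helper: the name and the part_NNNN.csv naming are
    illustrative assumptions, not part of Pandas.
    """
    os.makedirs(out_dir, exist_ok=True)
    part_paths = []
    # With chunksize set, read_csv returns an iterator of DataFrames,
    # so only one chunk is held in memory at a time.
    reader = pd.read_csv(src_path, header=None, chunksize=rows_per_file)
    for i, chunk in enumerate(reader):
        part_path = os.path.join(out_dir, "part_%04d.csv" % i)
        # Fields that contain line breaks are quoted again on output.
        chunk.to_csv(part_path, header=False, index=False)
        part_paths.append(part_path)
    return part_paths
```

Something like split_csv(TRAIN_FILE, "parts", rows_per_file=100000) would then produce the smaller files without using skiprows at all. Because the splitting is done by the CSV parser rather than by counting physical lines, cells that span several lines should stay in one record.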