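I think it's better to use the read_csv function with the quoting=csv.QUOTE_NONE and error_bad_lines=False parameters. (In pandas 1.3 and later, error_bad_lines is deprecated in favor of on_bad_lines="skip", and it was removed in pandas 2.0.)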
    import csv

    import pandas as pd

    test = pd.read_csv("output/Emails.csv", quoting=csv.QUOTE_NONE, error_bad_lines=False)
    print(test.shape)
Note that problematic rows will be skipped, so some data is lost.
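To see how much would be lost before skipping anything, you can count the fields on each line yourself. This is only a rough sketch using the standard csv module, not what pandas does internally: the helper name find_overlong_rows and the sample data are made up for illustration, and it assumes that a "bad" line is one with more fields than the first (header) row once quotes are ignored.

```python
import csv
import io


def find_overlong_rows(lines, expected_fields=None):
    """Return (line_number, field_count) for rows with more fields than expected.

    Hypothetical helper: if expected_fields is None, the width of the
    first row (the header) is used as the expected field count.
    """
    # QUOTE_NONE mirrors the read_csv call: quote characters are not special,
    # so every comma splits a field.
    reader = csv.reader(lines, quoting=csv.QUOTE_NONE)
    overlong = []
    for row in reader:
        if expected_fields is None:
            expected_fields = len(row)
            continue
        if len(row) > expected_fields:
            overlong.append((reader.line_num, len(row)))
    return overlong


# Made-up sample: the quoted body on line 3 contains commas.
sample = io.StringIO(
    'Id,Subject,Body\n'
    '1,Hello,"fine"\n'
    '2,Re: trip,"one, two, three"\n'
    '3,Bye,ok\n'
)
print(find_overlong_rows(sample))  # [(3, 5)]
```

For the real file you would pass an open file object instead of the StringIO, for example one opened with open("output/Emails.csv", newline=""); the right encoding for that file is something you would have to check.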
If you want to skip email data tags, you can use:
    import csv

    import pandas as pd

    test = pd.read_csv(
        "output/Emails.csv",
        quoting=csv.QUOTE_NONE,
        sep=',',
        error_bad_lines=False,
        header=None,
        names=[
            "Id", "DocNumber", "MetadataSubject", "MetadataTo", "MetadataFrom",
            "SenderPersonId", "MetadataDateSent", "MetadataDateReleased",
            "MetadataPdfLink", "MetadataCaseNumber", "MetadataDocumentClass",
            "ExtractedSubject", "ExtractedTo", "ExtractedFrom", "ExtractedCc",
            "ExtractedDateSent", "ExtractedCaseNumber", "ExtractedDocNumber",
            "ExtractedDateReleased", "ExtractedReleaseInPartOrFull",
            "ExtractedBodyText", "RawText",
        ],
    )
    print(test.shape)