How to check if .xls and .csv files are empty

Question

Question 1: How to verify that the entire .xls or .csv file is empty. This is the code I'm using:

    import os

    try:
        if os.stat(fullpath).st_size > 0:
            readfile(fullpath)
        else:
            print("empty file")
    except OSError:
        print("No file")

Even an empty .xls file is larger than 5.6 KB, so the file size alone does not show whether it has any content. How can I check that an .xls or .csv file is empty?

Question 2: I need to check the file header. How can I tell python that files containing only one line of headers are empty?

    import xlrd

    def readfile(fullpath)
        xls=xlrd.open_workbook(fullpath)
        for sheet in xls.sheets():
            number_of_rows = sheet.nrows
            number_of_columns = sheet.ncols
            sheetname = sheet.name
            header = sheet.row_values(0)
            # Then if it contains only headers, treat it as empty.

This is my attempt. How should I continue this code?

Please provide a solution for both questions. Thanks in advance.

+6

python python-2.7 csv xls xlrd

bob marti Mar 01 '17 at 16:37

6 answers

Question 1: How to check that the entire .xls file is empty.

    import xlrd

    def readfile(fullpath):
        xls = xlrd.open_workbook(fullpath)
        is_empty = True  # assume empty until a sheet with data rows is found
        for sheet in xls.sheets():
            # a sheet with no rows, or with only a header row, is treated as empty
            if sheet.nrows > 1:
                is_empty = False
                break
        if is_empty:
            print('xls file is empty')

Question 2: How do I check the file header? If the file contains only a header (that is, a single line), it should be treated as empty.

    import csv

    with open('test/empty.csv', 'r') as csvfile:
        # DictReader consumes the first line as the header, so only data rows remain
        csv_dict = [row for row in csv.DictReader(csvfile)]
        if len(csv_dict) == 0:
            print('csv file is empty')

Tested with Python: 3.4.2
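
As a complement to the DictReader check, here is a minimal sketch (not part of the original answer) that tells a zero-byte CSV apart from a header-only one while reading at most two rows. It assumes Python 3, and the function name classify_csv is hypothetical. Note that csv.reader yields an empty list for a blank line, so a header followed only by a blank line is reported as having data here.

```python
import csv

def classify_csv(path):
    """Return 'empty', 'header only', or 'has data' for the CSV at path."""
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        # next() with a default avoids StopIteration on a file with no rows
        header = next(reader, None)
        if header is None:
            return 'empty'
        first_data_row = next(reader, None)
        if first_data_row is None:
            return 'header only'
        return 'has data'
```

Because only the first two rows are pulled from the reader, this should stay cheap even for very large files.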

+3

stovfl Mar 13 '17 at 21:58

I don't think Stack Overflow is meant for asking two questions at once, but here is my answer for the Excel part:

    import xlrd
    from pprint import pprint

    wb = xlrd.open_workbook("temp.xlsx")

    empty_sheets = [sheet for sheet in wb.sheets() if sheet.ncols == 0]
    non_empty_sheets = [sheet for sheet in wb.sheets() if sheet.ncols > 0]

    # printing names of empty sheets
    pprint([sheet.name for sheet in empty_sheets])

    # writing non empty sheets to database
    pass  # write code yourself or ask another question

About the header: as a hint, check sheet.nrows == 1.
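
To make that hint concrete, here is a rough sketch of how the row count could drive the decision. The helper names sheet_status and workbook_is_empty are hypothetical; they rely only on each sheet object having an nrows attribute, which xlrd sheets do, so the sketch itself does not import xlrd.

```python
def sheet_status(sheet):
    """Classify a sheet by row count; `sheet` only needs an `nrows` attribute."""
    if sheet.nrows == 0:
        return 'blank'
    if sheet.nrows == 1:
        return 'header only'
    return 'has data'

def workbook_is_empty(sheets):
    """True when no sheet has rows beyond a single header row."""
    return all(sheet_status(sheet) != 'has data' for sheet in sheets)
```

With xlrd this would presumably be called as workbook_is_empty(xlrd.open_workbook(path).sheets()).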

+1

Elmex80s Mar 01 '17 at 16:54

For your Excel code, I like the pandas solution someone came up with, but if you are at work and cannot install it, I think you were almost there with the approach you took. You already have a loop running through each sheet, so you can test the row count on each sheet and take the appropriate action if it is empty:

    import xlrd

    xlFile = "MostlyEmptyBook.xlsx"

    def readfile(xlFile):
        xls = xlrd.open_workbook(xlFile)
        for sheet in xls.sheets():
            number_of_rows = sheet.nrows
            number_of_columns = sheet.ncols
            sheetname = sheet.name
            # row_values(0) raises IndexError on a sheet with no rows, so guard it
            header = sheet.row_values(0) if number_of_rows else []
            # if it contains only headers, treat it as empty
            if number_of_rows <= 1:
                # sheet is empty or has just a header
                # do what you want here
                print(xlFile + " is empty.")

Note: I added a variable for the file name so that it can be changed in one place. I also added the colon (:) that was missing from your function definition. If you want the test to match only sheets that contain just a header (mine also matches a completely blank sheet), change <= to == .

Regarding the related CSV problem: a CSV is just a text file. We can be reasonably sure that a file is empty apart from the header by using a coding approach like the one that follows. I would try this code on sample files and adjust my math logic as needed. For example, it may be sufficient to compare using + 1 in the if statement instead of * 1.5, as I did. My thinking is that if whitespace or a few stray characters were mistakenly included, the multiplier gives a good cushion, in addition to the character-count test on the second line that is coded in the logic.

This was written under the assumption that you want to know whether a file is empty before loading a giant file. If that assumption is not correct, you can use my test logic and then keep the file open, or read it further in other code, to make sure there is no blank line followed by additional content after the header (as in a poorly formatted input file).

    import os

    def convert_bytes(num):
        """Convert a byte count to a readable string (bytes, KB, MB, GB, TB)."""
        for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
            if num < 1024.0:
                return "%3.1f %s" % (num, x)
            num /= 1024.0

    def file_size(file_path):
        """Return the file size as a readable string."""
        if os.path.isfile(file_path):
            file_info = os.stat(file_path)
            return convert_bytes(file_info.st_size)

    # testing if a csv file is empty in Python (header has bytes so not zero)
    fileToTest = "almostEmptyCSV.csv"

    def hasContentBeyondHeader(fileToTest):
        answer = [True, 0, 0, 0]
        with open(fileToTest) as f:
            lis = [f.readline(), f.readline()]
            answer[1] = len(lis[0])            # length of header row
            answer[2] = len(lis[1])            # length of next row
            answer[3] = file_size(fileToTest)  # size of file

            # these conditions should be high confidence file is empty or nearly so
            sizeMult = 1.5  # test w/ your files and adjust as appropriate (but should work)
            charLimit = 5

            if answer[1] * sizeMult > answer[2] and answer[2] == 0:
                answer[0] = False
            elif answer[1] * sizeMult > answer[2] and answer[2] < charLimit:
                # separate condition in case you want to remove it
                # returns False if only a small number of chars (charLimit) on 2nd row
                answer[0] = False
            else:
                answer[0] = True  # added for readability (or delete else and keep default)
        # no explicit close needed: the with block closes the file
        return answer

    hasContentBeyondHeader(fileToTest)  # element [0] is False if believed to be empty except for header

During testing, the readline calls extracted this content from the file:

 ['year,sex,births\n', '']

sample output:

 [False, 16, 0, '17.0 bytes']

With this approach, the True / False test result is available in element [0] of the returned list. The remaining elements expose the inputs to the decision, in case you want to tune the logic later.

This code starts with a custom file size function. If you are looking for shorter code, you can replace the first two small functions with this:

    import os

    os.path.getsize(fullpathhere)
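
For the poorly formatted case mentioned earlier in this answer (a blank line after the header followed by more content), a simpler scan may be enough. This is only a sketch under the assumption that the file is plain text and that any non-whitespace line after the first counts as data; the function name has_data_after_header is hypothetical.

```python
def has_data_after_header(path):
    """Return True if any non-blank line follows the first (header) line."""
    with open(path) as f:
        f.readline()  # skip the header line
        for line in f:
            # stop at the first line that holds something other than whitespace
            if line.strip():
                return True
    return False
```

The loop stops at the first non-blank line, so a large file with real data is not read in full.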

+1

Tmwp Mar 15 '17 at 21:22

How about this:

    with open(path, "r") as file:
        file_content = file.read()

    if file_content == "":
        print("File '{}' is empty".format(path))
    else:
        rows = file_content.split("\n", 1)
        # rows has a single element when the file contains no newline at all
        if len(rows) < 2 or rows[1] == "":
            print("File '{}' contains headers only.".format(path))

where path is the path to your csv file. (An .xls file is a binary format, so this text-based check is unlikely to work for it.)

+1

Purplejo Mar 16 '17 at 18:41

According to your question:

Question 2: I need to check the file header. How can I tell python that files that are only one line of headers are empty?

You can simply check whether the file has a second line.

    with open('empty_csv_with_header.csv') as f:
        f.readline()  # skip header
        line = f.readline()
        if not line:  # readline returns an empty string at end of file
            print('Empty csv')

0

tsh Dec 6 '17 at 12:28

Someone · Accepted Answer · 2017-03-01T16:42:45+0000

This is simple in pandas using the .empty attribute:

    import pandas as pd

    df = pd.read_csv(filename)  # or pd.read_excel(filename) for xls file
    df.empty  # True if the dataframe is empty, False if not

This will also return True for a file with only headers, as in

    >>> df = pd.DataFrame(columns=['A', 'B'])
    >>> df.empty
    True
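
One caveat worth sketching: for a CSV with no content at all (zero bytes), pd.read_csv raises pandas.errors.EmptyDataError instead of returning an empty dataframe. The wrapper below is a minimal illustration of handling both cases, not part of the accepted answer; the function name csv_is_empty is hypothetical, and it assumes a pandas version that provides pandas.errors.

```python
import pandas as pd
from pandas.errors import EmptyDataError

def csv_is_empty(path):
    """Return True for a zero-byte CSV or one that holds only a header row."""
    try:
        df = pd.read_csv(path)
    except EmptyDataError:
        # raised when there are no columns to parse, e.g. a zero-byte file
        return True
    return df.empty
```

Note that this still parses the whole file, so for very large inputs a line-based check may be preferable.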
