EDIT: I am slightly modifying this answer. I will leave the original answer below.
In my other answer, I commented that it would be best to find a Python built-in module that would do the unpacking. I could not think of one, but perhaps I should have searched Google for one. @John Machin provided an answer that showed how to do it: use the Python struct module. Since struct is written in C, it should be faster than my pure Python solution. (I have not actually measured anything, so this is an assumption.)
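As a minimal sketch of the idea (this is not @John Machin's actual code), here is how struct can split a fixed-width record. The field widths and the sample record are made up for illustration, and the sketch assumes Python 3, where struct works on bytes rather than str.

```python
import struct

# Hypothetical layout: 3-byte code, 2 bytes to skip, 5-byte signed amount.
# "Ns" keeps N bytes as one field; "Nx" skips N pad bytes.
record_format = struct.Struct("3s2x5s")

line = b"ABC--+0042"  # made-up 10-byte record
code, amount = record_format.unpack(line)

print(code)                # b'ABC'
print(amount)              # b'+0042'
print(record_format.size)  # 10
```

The format string is compiled once with struct.Struct and then reused for every record, which is the same pattern the code in this answer follows.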
I agree that the logic in the original code is "un-Pythonic." Returning a sentinel value is not ideal; it is better to either return a valid value or raise an exception. Another way to do this is to return a list of valid values plus another list of invalid values. Since @John Machin offered code that yields valid values, I thought I would write a version here that returns two lists.
NOTE: Perhaps the best possible answer would be to take @John Machin's answer and modify it to save the invalid values to a file for possible later review. His answer yields results one at a time, so there is no need to build a large list of parsed records; and keeping the bad lines on disk means that you do not have to build a possibly large list of bad lines.
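A rough sketch of that variant follows. It is not @John Machin's code: the function name iter_records, its parameters, and the tab-separated reject format are all assumptions made for illustration, and it targets Python 3 with records as bytes.

```python
import io
import struct

def iter_records(lines, unpack_fmt, sign_checks, reject_file):
    """Yield parsed records one at a time; write bad lines to reject_file.

    Hypothetical helper: lines is an iterable of bytes records,
    sign_checks lists the offsets that must hold b'+' or b'-', and
    reject_file is any writable text file object.
    """
    record = struct.Struct(unpack_fmt)
    for line_num, line in enumerate(lines, 1):
        if len(line) != record.size:
            err = "bad length"
        elif not all(line[i:i + 1] in (b"+", b"-") for i in sign_checks):
            err = "sign check failed"
        else:
            yield record.unpack(line)
            continue
        # Keep rejected lines on disk instead of in an in-memory list.
        reject_file.write("%d\t%r\t%s\n" % (line_num, line, err))

# Usage with made-up data; io.StringIO stands in for a real reject file.
rejects = io.StringIO()
lines = [b"ABC+0042", b"short", b"XYZ?0001"]
good = list(iter_records(lines, "3s5s", [3], rejects))
print(good)  # [(b'ABC', b'+0042')]
print(rejects.getvalue())
```

Because this is a generator, the caller decides whether to collect the good records into a list or to process them one by one.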
    import struct

    def parse_records(self):
        """
        returns a tuple: (good, bad)
        good is a list of valid records (as tuples)
        bad is a list of tuples: (line_num, line, err)
        """
        # Note: as written, this assumes Python 2 style byte strings;
        # on Python 3, struct unpacks bytes, not str.
        cols = self.Columns()
        unpack_fmt = ""
        sign_checks = []
        start = 0
        # Build the struct format once: "Ns" keeps a column, "Nx" skips it.
        for colx, info in enumerate(cols, 1):
            clen = info.columnLength
            if clen < 1:
                raise ValueError("Column %d: Bad columnLength %r" % (colx, clen))
            if info.skipColumn:
                unpack_fmt += str(clen) + "x"
            else:
                unpack_fmt += str(clen) + "s"
                if info.hasSignage:
                    # Remember where the sign character must appear.
                    sign_checks.append(start)
            start += clen
        expected_len = start
        unpack = struct.Struct(unpack_fmt).unpack

        good = []
        bad = []
        for line_num, line in enumerate(self.whatever_the_list_of_lines_is, 1):
            if len(line) != expected_len:
                bad.append((line_num, line, "bad length"))
                continue
            if not all(line[i] in '+-' for i in sign_checks):
                bad.append((line_num, line, "sign check failed"))
                continue
            good.append(unpack(line))

        return good, bad
ORIGINAL ANSWER TEXT: This answer should be much faster if the self.Columns() information is identical for all records. We process the self.Columns() information once and build a couple of lists that contain only what we need to process a record.
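To illustrate the precompute-once idea on its own, here is a small self-contained sketch. The ColumnInfo namedtuple and the build_slices helper are hypothetical stand-ins for whatever self.Columns() returns; only the attribute names columnLength, skipColumn, and hasSignage are taken from the code in this answer.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects returned by self.Columns().
ColumnInfo = namedtuple("ColumnInfo", "columnLength skipColumn hasSignage")

def build_slices(cols):
    """Walk the column info once; return (slices, sign_checks, expected_len)."""
    slices = []
    sign_checks = []
    start = 0
    for info in cols:
        end = start + info.columnLength
        if not info.skipColumn:
            slices.append((start, end))
            if info.hasSignage:
                sign_checks.append(start)
        start = end  # the next column begins where this one ended
    return slices, sign_checks, start

cols = [
    ColumnInfo(3, False, False),  # kept field
    ColumnInfo(2, True, False),   # skipped filler
    ColumnInfo(5, False, True),   # kept field with a leading sign
]
slices, sign_checks, expected_len = build_slices(cols)

line = "ABC--+0042"  # made-up record
fields = [line[a:b] for a, b in slices]
print(slices)       # [(0, 3), (5, 10)]
print(sign_checks)  # [5]
print(fields)       # ['ABC', '+0042']
```

Once the slices exist, handling each record is just a length check, a sign check, and a list comprehension.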
The code below shows how to compute a parsedList, but it does not actually yield it, return it, or do anything else with it. Obviously, you will need to change that.
    def parse_records(self):
        cols = self.Columns()
        slices = []
        sign_checks = []
        start = 0
        for info in cols:
            if info.columnLength < 1:
                raise ValueError("bad columnLength")
            end = start + info.columnLength
            if not info.skipColumn:
                tup = (start, end)
                slices.append(tup)
                if info.hasSignage:
                    sign_checks.append(start)
            # Advance to the next column; without this, every slice starts at 0.
            start = end
        expected_len = end