Python: Iterating over objects, executing code both at certain points and at the end

Question

Here is some sample code to explain:

    outputText = ""
    counter = 0
    for obj in specialObjects:
        if (obj.id < 400) or (obj.name.startswith("he")) or (obj.deliberateBreak == True):
            print("The object %s is causing a section break." % obj.details)
            outputText = outputText.rjust(80)
            open("file%d.txt" % counter, "w").write(outputText)
            counter += 1  # move on to the next file
            outputText = ""
        outputText += obj.shortValue()
    # THIS CODE IS DUPLICATED
    outputText = outputText.rjust(80)
    open("file%d.txt" % counter, "w").write(outputText)

What I need to do is iterate over the list of these special objects and check each time for several different conditions. If any of the conditions is met (as shown here), I need to take the current output buffer, write it to a file, and then start a new output buffer and continue processing.

The problem here is code duplication. Notice that two lines are duplicated: the outputText = ... assignment and the open(...) call. If I leave out the second copy, the last set of objects will be processed, but their output will never be written.

I can imagine two possible solutions to prevent code duplication. Both of them seem a little inelegant, so I was wondering if there is any better way.

1) Wrap the code that would be repeated in a function.

    def writeData(outputText, counter):
        outputText = outputText.rjust(80)
        open("file%d.txt" % counter, "w").write(outputText)
        return counter + 1

    outputText = ""
    counter = 0
    for obj in specialObjects:
        if (obj.id < 400) or (obj.name.startswith("he")) or (obj.deliberateBreak == True):
            print("The object %s is causing a section break." % obj.details)
            counter = writeData(outputText, counter)
            outputText = ""
        outputText += obj.shortValue()
    writeData(outputText, counter)

2) Loop over numeric indices instead, running one index past the end of the list of objects; use that extra index as a flag meaning "write, and now exit":

    outputText = ""
    counter = 0
    # Run one index past the end; the extra index only triggers the final write.
    for obj in range(len(specialObjects) + 1):
        if (obj == len(specialObjects)) or (specialObjects[obj].id < 400) or (specialObjects[obj].name.startswith("he")) or (specialObjects[obj].deliberateBreak == True):
            outputText = outputText.rjust(80)
            open("file%d.txt" % counter, "w").write(outputText)
            counter += 1
            outputText = ""
            if obj == len(specialObjects):
                break
            print("The object %s is causing a section break." % specialObjects[obj].details)
        outputText += specialObjects[obj].shortValue()

If I had to choose one, I would choose #2, but it could lead to some strange edge cases in the "if" statement if more complex boolean logic is ever needed.

Is there an even cleaner or more Pythonic way to do this without duplicating code?

Thanks!


python list code-duplication

fdmillion Dec 16 '14 at 18:15


4 answers

jme · Answer 1 · 2014-12-16T19:26:28+0000

When I find myself writing code like this, where I iterate over a collection and then repeat code after the loop ends, I usually take it as a sign that I am not iterating over the right thing.

In this case, you are iterating over a list of objects. But what you really want to iterate over, I think, is a list of groups of objects. That is what itertools.groupby is useful for.

There is a lot going on in your code, so I'm going to use a simplified example to illustrate how you can get rid of the duplicated code. Let's say, for a (very contrived) example, that I have a list of things like this:

 things = ["apples", "oranges", "pears", None, "potatoes", "tomatoes", None, "oatmeal", "eggs"]

This is a list of objects. On closer inspection, there are several groups of objects separated by None (note that you would normally represent things as a nested list, but let's ignore that for the purposes of the example). My goal is to print each group on a separate line:

    apples, oranges, pears
    potatoes, tomatoes
    oatmeal, eggs

Here is an ugly way to do this:

    current_things = []
    for thing in things:
        if thing is None:
            print(", ".join(current_things))
            current_things = []
        else:
            current_things.append(thing)
    print(", ".join(current_things))

As you can see, we have to duplicate the print after the loop. Nasty!

Here is a solution using groupby :

    from itertools import groupby

    for key, group in groupby(things, key=lambda x: x is not None):
        if key:
            print(", ".join(group))

groupby takes an iterable (things) and a key function. It looks at each element of the iterable and applies the key function to it. Whenever the key changes value, a new group is formed. The result is an iterator that yields (key, group) pairs.

In this case, we use the check for None as our key function. That is why we need if key: since there will also be groups of size one corresponding to the None elements of our list. We just skip them.
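To make the (key, group) pairs concrete, here is a small sketch that materialises what groupby produces for the things list above. The name pairs is introduced here only for illustration; each group is converted to a list because groupby itself is lazy.

```python
from itertools import groupby

things = ["apples", "oranges", "pears", None,
          "potatoes", "tomatoes", None,
          "oatmeal", "eggs"]

# Each group is a lazy iterator, so turn it into a list to inspect it.
pairs = [(key, list(group))
         for key, group in groupby(things, key=lambda x: x is not None)]

for key, group in pairs:
    # Real items come out with key True, each None separator with key False.
    print(key, group)
```

The groups whose key is False are the None separators, which is exactly what the if key: test filters out.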

As you can see, groupby allows us to iterate over the things that we really want to iterate over: groups of objects. This is more natural for our problem, and as a result, the code is simplified. It looks like your code is very similar to the example above, except that your key function checks the various properties of the object ( obj.id < 400 ... ). I will leave the implementation details to you ...
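One wrinkle when carrying this over to the question: there, the object that triggers a break belongs to the new section rather than acting as a discarded separator. A possible sketch, not necessarily what the answer had in mind, is to label every object with a running section number and group on that label. SpecialObject, is_break, labelled and sections are hypothetical names, and a plain text field stands in for obj.shortValue().

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the asker's objects; only the fields used here.
SpecialObject = namedtuple("SpecialObject", "id name deliberateBreak text")

def is_break(obj):
    """The section-break conditions from the question."""
    return obj.id < 400 or obj.name.startswith("he") or obj.deliberateBreak

def labelled(objects):
    """Pair each object with a section number that increases at every break."""
    section = 0
    for obj in objects:
        if is_break(obj):
            section += 1
        yield section, obj

def sections(objects):
    """Yield one concatenated string per section."""
    for _, group in groupby(labelled(objects), key=lambda pair: pair[0]):
        yield "".join(obj.text for _, obj in group)

objs = [
    SpecialObject(500, "alpha", False, "a"),
    SpecialObject(100, "beta", False, "b"),   # id < 400 starts a new section
    SpecialObject(600, "gamma", False, "c"),
    SpecialObject(700, "hello", False, "d"),  # name starts with "he"
    SpecialObject(800, "delta", True, "e"),   # deliberateBreak is set
]
result = list(sections(objs))
```

Unlike the loop in the question, this sketch never produces an empty section when the very first object triggers a break, so the two are not drop-in equivalents in that corner case.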

Mark Ransom · Answer 2 · 2014-12-16T20:11:25+0000

Here is a way to do this using a sentinel object. It is similar to your second option, but cleaner, I think.

    import itertools

    outputText = ""
    counter = 0
    for obj in itertools.chain(specialObjects, [None]):
        if (obj is None) or (obj.id < 400) or (obj.name.startswith("he")) or (obj.deliberateBreak == True):
            outputText = outputText.rjust(80)
            open("file%d.txt" % counter, "w").write(outputText)
            counter += 1
            if obj is None:
                break
            print("The object %s is causing a section break." % obj.details)
            outputText = ""
        outputText += obj.shortValue()

tdelaney · Answer 3 · 2014-12-16T18:32:15+0000

You can move the code that splits the objects into sections into a generator, so that the subsequent processing step does not need to be duplicated.

    def yield_sections(specialObjects):
        outputText = ''
        for obj in specialObjects:
            if (obj.id < 400) or (obj.name.startswith("he")) or (obj.deliberateBreak == True):
                yield outputText
                outputText = ''
            outputText += obj.shortValue()
        if outputText:
            yield outputText

    for counter, outputText in enumerate(yield_sections(specialObjects)):
        outputText = outputText.rjust(80)
        open("file%d.txt" % counter, "w").write(outputText)
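For anyone who wants to try the generator approach end to end, the following is a self-contained sketch. The Obj class, its text attribute, the sample data and the temporary output directory are all assumptions made for illustration; only yield_sections follows the answer. It also uses a with block so each file is closed promptly, which is a small departure from the original one-liner.

```python
import os
import tempfile

class Obj:
    """Hypothetical stand-in exposing only what the loop touches."""
    def __init__(self, id, name, text, deliberateBreak=False):
        self.id = id
        self.name = name
        self.text = text
        self.deliberateBreak = deliberateBreak

    def shortValue(self):
        return self.text

def yield_sections(specialObjects):
    outputText = ''
    for obj in specialObjects:
        if obj.id < 400 or obj.name.startswith("he") or obj.deliberateBreak:
            yield outputText      # finish the section collected so far
            outputText = ''
        outputText += obj.shortValue()
    if outputText:
        yield outputText          # the final section, emitted exactly once

specialObjects = [Obj(500, "alpha", "a"), Obj(600, "beta", "b"),
                  Obj(100, "gamma", "c"), Obj(700, "hello", "d")]

out_dir = tempfile.mkdtemp()      # assumption: write into a scratch directory
written = []
for counter, outputText in enumerate(yield_sections(specialObjects)):
    path = os.path.join(out_dir, "file%d.txt" % counter)
    with open(path, "w") as f:
        f.write(outputText.rjust(80))
    written.append(path)
```

The write logic now appears in one place only, and enumerate takes over the job of the manual counter.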

Kiwi · Answer 4 · 2014-12-16T20:00:59+0000

There is another solution: if you work with the iterator directly, next can return a special default value once the iterator is exhausted. That way, you can use a sentinel to check whether the current object is a real one or whether the iteration has finished.

Try something like this:

    outputText = ""
    counter = 0
    ending = object()
    it = iter(specialObjects)
    while True:
        obj = next(it, ending)
        if obj is ending or obj.id < 400 or obj.name.startswith("he") or obj.deliberateBreak:
            outputText = outputText.rjust(80)
            open("file%d.txt" % counter, "w").write(outputText)
            counter += 1
            outputText = ""
        if obj is ending:
            break
        outputText += obj.shortValue()
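The two-argument form of next is what makes this work. As a minimal illustration, with made-up sample data, the sketch below shows that next(it, default) returns the default instead of raising StopIteration, and why a fresh object() is a safer sentinel than None when None may itself be a legitimate item.

```python
ending = object()              # unique sentinel; cannot collide with real data
it = iter(["a", None, "b"])    # note that None is an ordinary item here

seen = []
while True:
    item = next(it, ending)    # returns `ending` once the iterator is exhausted
    if item is ending:         # identity test, so the None item is not mistaken for the end
        break
    seen.append(item)
```

After the loop, seen holds all three items, including the None.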
