Pythonic way to determine if non-null list entries are "continuous",

I am looking for a way to easily determine if all elements are not from a list in a single continuous slice. I will use integers as examples of not all elements.

For example, the list [None, None, 1, 2, 3, None, None] meets my requirements for continuous whole records. In contrast, [1, 2, None, None, 3, None] is not continuous because there are no entries between the integers.

A few more examples to make this as clear as possible.

Continuous :
[1, 2, 3, None, None]
[None, None, 1, 2, 3]
[None, 1, 2, 3, None]

Continuous :
[None, 1, None, 2, None, 3]
[None, None, 1, None, 2, 3]
[1, 2, None, 3, None, None]

My first approach was to use variables to keep track of whether we met None or not, and whether we got to int - it ends up being very nested and it is very difficult to follow a series of if / else statements built into the for loop. (On top of the ugliness, I confess that I did not get it to work in every case).

Does anyone know an easier way to find out if there are any items in a list in one continuous slice?

+45
python list null slice
Feb 06 '13 at 4:13
source share
11 answers
 def contiguous(seq): seq = iter(seq) all(x is None for x in seq) # Burn through any Nones at the beginning any(x is None for x in seq) # and the first group return all(x is None for x in seq) # everthing else (if any) should be None. 

Here are some examples. You can use next(seq) to get the next element from the iterator. I will put a sign pointing to the next element after each

example1:

 seq = iter([None, 1, 2, 3, None]) # [None, 1, 2, 3, None] # next^ all(x is None for x in seq) # next^ any(x is None for x in seq) # next^ (off the end) return all(x is None for x in seq) # all returns True for the empty sequence 

example2:

 seq = iter([1, 2, None, 3, None, None]) # [1, 2, None, 3, None, None] # next^ all(x is None for x in seq) # next^ any(x is None for x in seq) # next^ return all(x is None for x in seq) # all returns False when 3 is encountered 
+44
Feb 06 '13 at 4:41
source share

Good 'ol itertools.groupby to the rescue:

 from itertools import groupby def contiguous(seq): return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1 

gives

 >>> contiguous([1,2,3,None,None]) True >>> contiguous([None, 1,2,3,None]) True >>> contiguous([None, None, 1,2,3]) True >>> contiguous([None, 1, None, 2,3]) False >>> contiguous([None, None, 1, None, 2,3]) False >>> contiguous([None, 1, None, 2, None, 3]) False >>> contiguous([1, 2, None, 3, None, None]) False 

[edit]

Since there are some discussions in the comments, I will explain why I like this approach better than some others.

We are trying to find out if there is one continuous group of objects other than None, and

 sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) 

counts the number of adjacent objects other than None using the function in stdlib, which is designed to collect adjacent groups. As soon as we see groupby , we are considered "adjacent groups" and vice versa. In this sense, it is self-documenting. This is basically the definition of my goal.

IMHO The only weakness is that it does not close, and this can be fixed, but having thought about it, I still prefer it, because it uses a primitive that I like, "count the number of adjacent non-" No groups ", which I prefer to just "tell me if you have more than one continuous group different from the group" as soon as you can. "

Many of the approaches to implementing the latter depend on smart observation of the problem, for example, โ€œif there is only one continuous group of no-No objects,โ€ then if we scan until we find the first no-No object, then scan the objects until until we find the first group other than None, if it exists, is there anything left that None gives us our answer. "(Or something like that, which is part of my problem: I have to think about it .) It seems to me that I am using the "implementation details" about the problem to solve it and the tricks uetsya problems properties that we can use to solve it rather than just give a problem in Python and let Python do the job.

I am a bear with a very small brain, as they say, and I like to avoid being smart, because, in my experience, the route is dotted with FAIL.

As always, each run can vary, of course, and is probably proportional to their skill.

+25
Feb 06 '13 at 4:20
source share

You can use something like itertools.groupby :

 from itertools import groupby def are_continuous(items): saw_group = False for group, values in groupby(items, lambda i: i is not None): if group: if saw_group: return False else: saw_group = True return True 

This will be repeated only until he sees the group twice. I'm not sure if you consider [None, None] , so customize it to your needs.

+12
Feb 06 '13 at 4:21
source share

This may not be the best way to do this, but you can find the first record other than None and the last non-None record, and then check the slice for None . eg:.

 def is_continuous(seq): try: first_none_pos = next(i for i,x in enumerate(seq) if x is not None) #need the or None on the next line to handle the case where the last index is `None`. last_none_pos = -next(i for i,x in enumerate(reversed(seq)) if x is not None) or None except StopIteration: #list entirely of `Nones` return False return None not in seq[first_none_pos:last_none_pos] assert is_continuous([1,2,3,None,None]) == True assert is_continuous([None, 1,2,3,None]) == True assert is_continuous([None, None, 1,2,3]) == True assert is_continuous([None, 1, None, 2,3]) == False assert is_continuous([None, None, 1, None, 2,3]) == False assert is_continuous([None, 1, None, 2, None, 3]) == False assert is_continuous([1, 2, None, 3, None, None]) == False 

This will work for any type of sequence.

+7
Feb 06 '13 at 4:21
source share

A natural way to use sequence elements is to use dropwhile :

 from itertools import dropwhile def continuous(seq): return all(x is None for x in dropwhile(lambda x: x is not None, dropwhile(lambda x: x is None, seq))) 

We can express this without nested function calls:

 from itertools import dropwhile def continuous(seq): core = dropwhile(lambda x: x is None, seq) remainder = dropwhile(lambda x: x is not None, core) return all(x is None for x in remainder) 
+7
Feb 06 '13 at 10:44
source share

One liner:

 contiguous = lambda l: ' ' not in ''.join('x '[x is None] for x in l).strip() 

The real work is done using the strip function. If there are spaces in the split line, then they are not kept / terminated. The rest of the function converts the list to a string that has a space for each None .

+5
Feb 06 '13 at 12:11
source share

I did some profiling to compare the @gnibbler approach with the groupby approach. The @gnibber approach is consistently faster, especially. for longer lists. For example, I see about a 50% increase in productivity for random inputs with a length of 3-100, with a 50% probability of containing one sequential sequence (randomly selected) and otherwise with random values. Test code below. I have summarized two methods (the random choice that comes first) to make sure that the caching effects are canceled. Based on this, I would say that although the groupby approach is more intuitive, the @gnibber approach may be appropriate if profiling indicates that this is an important part of the general code for optimization - in this case, appropriate comments should be used to indicate what happens to using all / any values โ€‹โ€‹of the consumer iterator.

 from itertools import groupby import random, time def contiguous1(seq): # gnibber approach seq = iter(seq) all(x is None for x in seq) # Burn through any Nones at the beginning any(x is None for x in seq) # and the first group return all(x is None for x in seq) # everthing else (if any) should be None. def contiguous2(seq): return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1 times = {'contiguous1':0,'contiguous2':0} for i in range(400000): n = random.randint(3,100) items = [None] * n if random.randint(0,1): s = random.randint(0,n-1) e = random.randint(0,ns) for i in range(s,e): items[i] = 3 else: for i in range(n): if not random.randint(0,2): items[i] = 3 if random.randint(0,1): funcs = [contiguous1, contiguous2] else: funcs = [contiguous2, contiguous1] for func in funcs: t0 = time.time() func(items) times[func.__name__] += (time.time()-t0) print for f,t in times.items(): print '%10.7f %s' % (t, f) 
+3
Feb 06 '13 at 18:10
source share

Here's a numpy based solution. Get the indexes of the array of all nonzero elements. Then compare each index with the one that follows it. If the difference is greater than one, there are zeros between nonzero values. If there are no indices where the next index is greater than one, then there are no spaces.

 def is_continuous(seq): non_null_indices = [i for i, obj in enumerate(seq) if obj is not None] for i, index in enumerate(non_null_indices[:-1]): if non_null_indices[i+1] - index > 1: return False return True 
+2
Feb 06 '13 at 17:57
source share

This algorithm does work with several flaws (removes items from the list). But that is the solution.

Basically, if you remove all continuous None from the beginning and the end. And if you find a number in the None list, then integers are not in continuous form.

 def is_continuous(seq): while seq and seq[0] is None: del seq[0] while seq and seq[-1] is None: del seq[-1] return None not in seq assert is_continuous([1,2,3,None,None]) == True assert is_continuous([None, 1,2,3,None]) == True assert is_continuous([None, None, 1,2,3]) == True assert is_continuous([None, 1, None, 2,3]) == False assert is_continuous([None, None, 1, None, 2,3]) == False assert is_continuous([None, 1, None, 2, None, 3]) == False assert is_continuous([1, 2, None, 3, None, None]) == False 

However, another example of how small code can become evil.

I would like strip() be available for list .

+1
Feb 06 '13 at 5:50
source share

My first approach was to use variables to track ...

... this ends with a very nested and very difficult sequence of if / else sequences embedded in the for loop ...

No! In fact, you only need one variable. Understanding this problem from the point of view of a finite state machine (FSM) with your approach will lead to a rather pleasant solution.

We call the state p . First p is 0. Then we begin to walk between states.

FSM

When all the items in the list are checked and still not fail, then the answer is True .

One version that encodes a translation table in a dict

 def contiguous(s, _D={(0,0):0, (0,1):1, (1,0):2, (1,1):1, (2,0):2, (2,1):3}): p = 0 for x in s: p = _D[p, int(x is not None)] if p >= 3: return False return True 

Another version using the if statement:

 def contiguous(s): p = 0 for x in s: if x is None and p == 1 or x is not None and (p == 0 or p == 2): p += 1 if p >= 3: return False return True 

So I want to say that using if and for is still python.

Update

I found another way to encode FSM. We can pack the translation table into a 12-bit integer.

 def contiguous(s): p = 0 for x in s: p = (3684 >> (4*p + 2*(x!=None))) & 3 if p >= 3: return False return True 

Here is 3684, the magic number, you can get:

  _D[p,a] 3 2 1 2 1 0 p 2 2 1 1 0 0 a 1 0 1 0 1 0 bin(3684) = 0b 11 10 01 10 01 00 

The readability is not as good as the other, but faster, since it avoids dictionary searches. The second version is as fast as this, but this coding idea can be generalized to solve more problems.

+1
Feb 16 '13 at 16:06
source share

Here is a way to use numpy:

 a = np.array([1, 2, 3, np.nan, 4, 5, np.nan, 6, 7]) # This returns indices of nans # eg. [[3], [6]] # use .squeeze() to convert to [3, 6] aa = np.argwhere(a != a).squeeze() # use a diff on your array , if the nans # are continuous, the diff will always be 1 # if not, diff will be > 1 , and using any() will return True any(np.diff(aa) > 1) 
0
Jul 29 '19 at 12:32
source share



All Articles