In Swift, an efficient function that splits an array into 2 arrays based on a predicate

Note. I am currently using Swift 2.2, but also opening Swift 3 solutions

I am looking to create a function that works very closely with filter , except that it also saves the results of the mismatch and maintains the sort order. For example, let's say you want to filter out numbers divisible by 3 in an array and save a list of numbers that are not divisible by 3.


Option 1: Using filter

With filter you only get a list of numbers divisible by 3, and the original list remains unchanged. Then you can filter the source list using the opposite predicate, but this is an extra second pass. The code is as follows:

 let numbers = [1,2,3,4,5,6,7,8,9,10] let divisibleBy3 = numbers.filter { $0 % 3 == 0 } // [3,6,9] let theRest = numbers.filter { $0 % 3 != 0 } // [1,2,4,5,7,8,10] 

It is true that it is quite readable, but the fact that it makes 2 passes seems inefficient to me, especially if the predicate was more complex. This is twice as many checks that are really necessary.

Option 2: Using the user-defined separate function in Collection extension

My next attempt was to extend Collection and create a function that I called separate . This function will take a collection and go through the elements one at a time and add them to the correspondence list or non-conformance list. The code is as follows:

 extension Collection { func separate(predicate: (Generator.Element) -> Bool) -> (matching: [Generator.Element], notMatching: [Generator.Element]) { var groups: ([Generator.Element],[Generator.Element]) = ([],[]) for element in self { if predicate(element) { groups.0.append(element) } else { groups.1.append(element) } } return groups } } 

Then I can use a function like this:

 let numbers = [1,2,3,4,5,6,7,8,9,10] let groups = numbers.separate { $0 % 3 == 0 } let matching = groups.matching // [3,6,9] let notMatching = groups.notMatching // [1,2,4,5,7,8,10] 

This is also pretty clean, but the only thing I don't like is that I use the tuple as the return type. Others may not agree, but I would prefer to return the same type as self for the chain. But technically, you can simply capture either .matching or .notMatching , which is the same type as self , and you can associate them with any of them.

Option 3: using the custom removeIf function in the Array extension

My problem with separate returning a tuple made me try to create a function that would change self by deleting matches as they were searched and adding them to a new list and returning the list of matches at the end, The returned list is your matches and the array is truncated from these values. The order is stored in both arrays. The code is as follows:

 extension Array { mutating func removeIf(predicate: (Element) -> Bool) -> [Element] { var removedCount: Int = 0 var removed: [Element] = [] for (index,element) in self.enumerated() { if predicate(element) { removed.append(self.remove(at: index-removedCount)) removedCount += 1 } } return removed } } 

And it is used as follows:

 var numbers = [1,2,3,4,5,6,7,8,9,10] let divisibleBy3 = numbers.removeIf { $0 % 3 == 0 } // divisibleBy3: [3,6,9] // numbers: [1,2,4,5,7,8,10] 

This function should have been implemented in the Array extension, since the concept of deleting an element at a specific index does not apply to regular Collections (a Array is defined as public struct Array<Element> : RandomAccessCollection, MutableCollection , and it directly defines the remove(at:) function, and does not receive it from inheritance or protocol).

Option 4: Combination of Options 2 and 3

I am a big fan of code reuse, and after I came up with option 3, I realized that I could probably reuse the separate function from Option 2. I came up with the following:

 extension Array { mutating func removeIf(predicate: (Element) -> Bool) -> [Element] { let groups = self.separate(predicate: predicate) self = groups.notMatching return groups.matching } } 

And it is used in the same way as in Option 3.

I was interested in performance, so I ran each option through the XCTest measure with 1000 iterations. These were the results:

 Option 1: 9 ms Option 2: 7 ms Option 3: 10 ms Option 4: 8 ms 

Option 5: based on negaipro answer

I knew about partition , but I was not going to consider it, because it did not preserve the sort order. The answer is negaipro essentially a partition , but it got me thinking. The idea with partition is to replace elements that match with a pivot point, thereby ensuring that everything on one side of the end pivot point matches everything with the predicate and the other side does not. I accepted this idea and changed the action to "go to the end." Thus, matches are removed from their place and added to the end.

 extension Array { mutating func swapIfModified(predicate: (Element) -> Bool) -> Int { var matchCount = 0 var index = 0 while index < (count-matchCount) { if predicate(self[index]) { append(remove(at: index)) matchCount += 1 } else { index += 1 } } return count-matchCount } } 

In my initial tests using an array with 10 numbers, it was comparable to other parameters. But I was worried about the performance of the append(remove(at: index)) line append(remove(at: index)) . So I tried again all the options with arrays going from 1 to 1000, and this option was definitely the slowest.


Output:

There is no big difference in performance between these parameters. And since Option 4 was faster than Option 3 and reuses the code from Option 2, I tend to throw away Option 3. Therefore, I tend to use a simple old filter when I'm not interested in unfiltered results (and also when I don't care about filtered results, since it just uses the opposite predicate), and then using either separate or removeIf when I need to save both filtered and unfiltered results.

Question:

So, I missed something built into Swift that does this? Is there a better way to do this? Is my extension syntax the absence of anything (anything that could make it apply this concept to more areas, for example)?

+5
source share
3 answers
 let objects: [Int] = Array(1..<11) let split = objects.reduce(([Int](), [Int]())) { (value, object) -> ([Int], [Int]) in var value = value if object % 2 == 0 { value.1.append(object) } else { value.0.append(object) } return value } 
+2
source
 // swap and return pivot extension Array { // return pivot mutating func swapIf(predicate: (Element) -> Bool) -> Int { var pivot = 0 for i in 0..<self.count { if predicate( self[i] ) { if i > 0 { swap(&self[i], &self[pivot]) } pivot += 1 } } return pivot } } 

This is my code, and the concept is to reduce memory usage.

I checked that "swapIf" is 4 times faster than "removeIf".

+1
source

Solution A

For fewer items, this may be the fastest.

 extension Array { func stablePartition(by condition: (Element) -> Bool) -> ([Element], [Element]) { var matching = [Element]() var nonMatching = [Element]() for element in self { if condition(element) { matching.append(element) } else { nonMatching.append(element) } } return (matching, nonMatching) } } 

Using

 let numbers = [1,2,3,4,5,6,7,8,9,10] let (divisibleBy3, theRest) = numbers.stablePartition { $0 % 3 == 0 } print("divisible by 3: \(divisibleBy3), the rest: \(theRest)") // divisible by 3: [3, 6, 9], the rest: [1, 2, 4, 5, 7, 8, 10] 

Solution B

For many elements, this can be faster due to fewer distributions. I did not rate the performance.

 extension Array { public func stablePartition(by condition: (Element) throws -> Bool) rethrows -> ([Element], [Element]) { var indexes = Set<Int>() for (index, element) in self.enumerated() { if try condition(element) { indexes.insert(index) } } var matching = [Element]() matching.reserveCapacity(indexes.count) var nonMatching = [Element]() nonMatching.reserveCapacity(self.count - indexes.count) for (index, element) in self.enumerated() { if indexes.contains(index) { matching.append(element) } else { nonMatching.append(element) } } return (matching, nonMatching) } } 
+1
source

Source: https://habr.com/ru/post/1258110/


All Articles