Objective-C implementation of Ruby's "chunk"

Question

I have an Objective-C application where I am trying to sort an NSArray while grouping array elements that have equal sort values. Ideally, I would generate a new array of sets, where each set contains one or more of the original array's elements, and all elements in a set have the same sort value. It would work much like Ruby's "chunk" method.

To give an example, imagine I had an NSArray containing elements whose sort values are equivalent to the following:

[1, 3, 5, 7, 9, 8, 5, 3, 2, 4, 3, 6]

I would like the new array to contain 9 sets with sort values that look like this:

 [ (1), (2), (3, 3, 3), (4), (5, 5), (6), (7), (8), (9) ]

In Ruby, I could sort the array first and then chunk it to get what I want. I am trying to come up with a reasonably efficient way to do this in Objective-C.

I could create a dictionary with every possible sort value as a key and an NSSet as the value for each key. Then I could walk the starting array, calculating the sort value for each element, finding the key for that sort value, and updating its set as I go. Finally, I could sort the contents of this dictionary to get a sorted list of sets.
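For reference, the dictionary approach described above might be sketched roughly like this. The helper name `GroupBySortValue` and the `sortValueFor` block are made up for illustration, the sort value is assumed to be an NSNumber, and groups are created lazily rather than pre-populated for every possible value. It collects each group in an array instead of an NSSet so that equal elements are not collapsed into one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): groups elements by the
// NSNumber returned from sortValueFor, then orders the groups by that value.
static NSArray *GroupBySortValue(NSArray *elements,
                                 NSNumber *(^sortValueFor)(id element)) {
    NSMutableDictionary *groups = [NSMutableDictionary dictionary];
    for (id element in elements) {
        NSNumber *key = sortValueFor(element);
        NSMutableArray *group = groups[key];
        if (group == nil) {
            // First element seen with this sort value: start a new group.
            group = [NSMutableArray array];
            groups[key] = group;
        }
        [group addObject:element];
    }

    // Emit the groups in ascending order of their sort value.
    NSArray *sortedKeys = [[groups allKeys] sortedArrayUsingSelector:@selector(compare:)];
    NSMutableArray *result = [NSMutableArray arrayWithCapacity:sortedKeys.count];
    for (NSNumber *key in sortedKeys) {
        [result addObject:groups[key]];
    }
    return result;
}
```

Called with the example array and a block that returns each element unchanged, this should yield nine groups, one per distinct value.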

I could do all this, but it seems like there should be a better way that I am missing. Also, the values I am sorting on can be floating point values, so using them as dictionary keys is likely to be of limited use.

Can anyone think of a smarter way of doing this? Did I miss something obvious here?

sorting objective-c nsarray nsdictionary nsset

Tim dean Sep 08 '13 at 0:41

2 answers

Why not use one NSCountedSet to store all the keys and count each of them?

    NSArray *sourceArray = @[ @1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6 ];
    NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:sourceArray];
    NSArray *sortedKeys = [[countedSet allObjects] sortedArrayUsingSelector:@selector(compare:)];
    for (NSNumber *key in sortedKeys) {
        NSUInteger count = [countedSet countForObject:key];
        // %lu matches the unsigned long cast
        NSLog(@"Key: %@ count: %lu", key, (unsigned long)count);
    }

Kurt revis Sep 08 '13 at 1:09

Itai ferber · Accepted Answer · 2013-09-08T01:24:51+0000

If you just need the number of times each object occurs, Kurt's answer is pretty good. If you really need the chunks, this should work:

    NSArray *original = @[@1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6];
    NSMutableArray *chunked = [NSMutableArray array];
    NSNumber *current = nil;
    for (NSNumber *number in [original sortedArrayUsingSelector:@selector(compare:)]) {
        if (![number isEqual:current]) {
            [chunked addObject:[NSMutableArray arrayWithObject:number]];
            current = number;
        } else {
            [[chunked lastObject] addObject:number];
        }
    }
    NSLog(@"%@", chunked);

Unless I am missing something, this is not computationally expensive, and it should be a little more efficient than Tim's original method (no dictionaries, sets, or hashing needed). There is one sort involved (with fast enumeration, the container, the part after in, is evaluated only once), and you iterate over the sorted array once. NSMutableArray has O(1) insertion at both ends, so the chunking pass itself should be O(n) in the worst case; the sort adds the usual O(n log n) on top of that.
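Since the question mentions floating point sort values, the same sort-then-scan idea can be sketched for arbitrary objects. This is only an illustration under stated assumptions: `ChunkBySortValue`, the `sortValueFor` block, and the `tolerance` parameter are hypothetical names, and treating values within `tolerance` of a chunk's first value as equal is one possible policy, not something the question specifies.

```objc
#import <Foundation/Foundation.h>
#include <math.h>

// Hypothetical helper: sorts elements by a double sort value, then starts a
// new chunk whenever the value moves more than `tolerance` away from the
// value that opened the current chunk.
static NSArray *ChunkBySortValue(NSArray *elements,
                                 double (^sortValueFor)(id element),
                                 double tolerance) {
    NSArray *sorted = [elements sortedArrayUsingComparator:^NSComparisonResult(id a, id b) {
        double va = sortValueFor(a);
        double vb = sortValueFor(b);
        if (va < vb) return NSOrderedAscending;
        if (va > vb) return NSOrderedDescending;
        return NSOrderedSame;
    }];

    NSMutableArray *chunked = [NSMutableArray array];
    double chunkValue = 0.0; // sort value of the first element in the current chunk
    for (id element in sorted) {
        double value = sortValueFor(element);
        if (chunked.count == 0 || fabs(value - chunkValue) > tolerance) {
            [chunked addObject:[NSMutableArray arrayWithObject:element]];
            chunkValue = value;
        } else {
            [[chunked lastObject] addObject:element];
        }
    }
    return chunked;
}
```

Passing a tolerance of 0 gives exact-equality chunking, which matches the behavior of the NSNumber version above for numeric input.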

Edit: on further review, the following code runs much faster for large sets of numbers. It is a bit more convoluted, but more efficient.

    NSArray *original = @[@1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6];
    NSMutableArray *chunked = [NSMutableArray array];
    NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:original];
    for (NSNumber *number in countedSet) {
        // Rebuild each chunk by repeating the stored object `count` times.
        NSMutableArray *chunk = [NSMutableArray array];
        NSUInteger count = [countedSet countForObject:number];
        for (NSUInteger i = 0; i < count; i++) {
            [chunk addObject:number];
        }
        [chunked addObject:chunk];
    }
    // Order the chunks by their first (representative) element.
    [chunked sortUsingComparator:^(NSArray *a1, NSArray *a2) {
        return [a1[0] compare:a2[0]];
    }];
    NSLog(@"%@", chunked);

With 10,000,000 random numbers, the first implementation takes about 12.27 seconds, and the second takes 0.92 seconds. Your mileage may vary.

The second method has the disadvantage that each chunk it creates holds repeated references to one and the same object. If that is a problem for you (in the general case it may matter for memory management, or when your objects can compare as "equal" even though not all of their properties match), use the first method. Otherwise, the second method will serve you better.
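To make that caveat concrete, here is a small sketch of the counted-set rebuild applied to two distinct objects that compare as equal. The function name `ChunksFromCountedSet` is made up for this example, the sort step is left out for brevity, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper mirroring the counted-set method without the final
// sort: each chunk repeats the single instance the counted set kept.
static NSArray *ChunksFromCountedSet(NSArray *original) {
    NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:original];
    NSMutableArray *chunked = [NSMutableArray array];
    for (id object in countedSet) {
        NSMutableArray *chunk = [NSMutableArray array];
        NSUInteger count = [countedSet countForObject:object];
        for (NSUInteger i = 0; i < count; i++) {
            // The same pointer is added every time; other instances that
            // compared equal to it are not recoverable from the set.
            [chunk addObject:object];
        }
        [chunked addObject:chunk];
    }
    return chunked;
}
```

If two separately allocated NSMutableString instances with the same contents go in, a single chunk of two entries comes out, and both entries are the same pointer; the other original instance is not in the result.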

Additional clarification: on further thought, I suspected that something was off about the time difference between the two methods, and I was right. If your data set has a lot of variation (very few duplicate numbers), Method 2 runs much, much slower; the amount of variation does not greatly affect Method 1. With many repeated numbers, Method 2 will be pretty fast, but if your data set is completely random, you are better off using Method 1.

Here is the code I use to test these two: http://pastebin.com/9syEyiyM
