Combining two arrays using an inner join

I have two datasets, each stored as an array:

    arr1 = [
        ['2011-10-10', 1, 1],
        ['2007-08-09', 5, 3],
        ...
    ]
    arr2 = [
        ['2011-10-10', 3, 4],
        ['2007-09-05', 1, 1],
        ...
    ]

I want to combine them into one array as follows:

 arr3 = [ ['2011-10-10', 1, 1, 3, 4], ... ] 

That is, I just want to combine the rows that have the same date in the first column.

EDIT

Thanks, everyone. To clarify: I do not need the rows whose date does not appear in both arrays, so just drop them.

+6
6 answers

Organize your data differently (you can easily convert what you already have into two dicts):

    d1 = {
        '2011-10-10': [1, 1],
        '2007-08-09': [5, 3]
    }
    d2 = {
        '2011-10-10': [3, 4],
        '2007-09-05': [1, 1]
    }

Then:

 d3 = { k : d1[k] + d2[k] for k in d1 if k in d2 } 
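
As a minimal end-to-end sketch of this answer, the snippet below converts the question's sample arrays into dicts, applies the same comprehension, and flattens the result back into the list-of-lists shape the question asks for. It assumes each date occurs at most once per array; the variable name `arr3` is taken from the question, and the conversion steps are an illustration rather than part of the original answer.

```python
# Sample data taken from the question.
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

# Key each row by its date (assumes dates are unique within each array).
d1 = {row[0]: row[1:] for row in arr1}
d2 = {row[0]: row[1:] for row in arr2}

# Inner join: keep only the dates present in both dicts.
d3 = {k: d1[k] + d2[k] for k in d1 if k in d2}

# Flatten back to the list-of-lists shape from the question.
arr3 = [[k] + v for k, v in d3.items()]
print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```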
+5

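You can convert the arrays to dicts and back.

    d1 = dict((x[0], x[1:]) for x in arr1)
    d2 = dict((x[0], x[1:]) for x in arr2)
    # The union keeps every date (a full outer join);
    # use set(d1) & set(d2) instead to keep only dates present in both.
    keys = set(d1).union(d2)
    n = []
    result = dict((k, d1.get(k, n) + d2.get(k, n)) for k in keys)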
+2

It might be worth mentioning the set data type, as its methods match this type of problem. The set operators let you combine sets easily and flexibly with full, inner, outer, left, and right joins. As with dictionaries, sets do not preserve order, but if you convert the set back to a list, you can apply an ordering to the joined result. Alternatively, you can use an ordered dictionary.

    set1 = set(x[0] for x in arr1)
    set2 = set(x[0] for x in arr2)
    resultset = (set1 & set2)

This gives the dates common to both original lists. To reconstruct arr3 you would need to append the [1:] data from arr1 and arr2 for the dates in the result set. This reconstruction is not as neat as the dictionary solutions above, but sets are worth considering for this kind of problem.
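
The reconstruction step described above might look like the following sketch. It uses the set intersection to pick the shared dates and then rebuilds the rows in the order of `arr1`. The helper dict `lookup2` and the extra set names (`either`, `only_in_arr1`) are hypothetical, introduced only for illustration, and the sketch assumes dates are unique within each array.

```python
# Sample data taken from the question.
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

set1 = set(x[0] for x in arr1)
set2 = set(x[0] for x in arr2)

common = set1 & set2         # inner join keys
either = set1 | set2         # full outer join keys
only_in_arr1 = set1 - set2   # keys a left join would add on top of the inner join

# Hypothetical helper: look up arr2's values by date.
lookup2 = {row[0]: row[1:] for row in arr2}

# Walk arr1 so the result keeps arr1's row order, which a set alone would lose.
arr3 = [row + lookup2[row[0]] for row in arr1 if row[0] in common]
print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```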

+2

Using a single dictionary:

    tmp = {}
    # Add as many arrays as you like to the outermost list.
    for outer in [arr1, arr2]:
        for inner in outer:
            start, rest = inner[0], inner[1:]
            # Use the existing list if the key exists, else start a new one,
            # then append this row's values to it.
            tmp[start] = tmp.get(start, []) + rest

    output = []
    for k, v in tmp.items():
        output.append([k] + v)

This is equivalent to a full outer join (it returns data from both sides, even if one side has no matching row). If you want an inner join, change it to the following:

    tmp = {}
    keys_with_dupes = []
    for outer in [arr1, arr2]:
        for inner in outer:
            start, rest = inner[0], inner[1:]
            original = tmp.get(start, [])
            tmp[start] = original + rest
            # A non-empty original means this key was already seen.
            if original:
                keys_with_dupes.append(start)

    output = []
    for k in keys_with_dupes:
        v = tmp[k]
        output.append([k] + v)
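
With more than two arrays, the inner-join variant above would also keep a key that appears in only some of them. One possible adjustment, sketched here under the assumption that each date occurs at most once per array, is to count how many arrays contain each key. The function name `inner_join` and the third sample array `arr_extra` are hypothetical.

```python
from collections import Counter


def inner_join(*arrays):
    """Join rows on their first column, keeping keys found in every array."""
    merged = {}
    seen = Counter()
    for array in arrays:
        for row in array:
            key, rest = row[0], row[1:]
            merged[key] = merged.get(key, []) + rest
            seen[key] += 1  # assumes a key occurs at most once per array
    # Keep only the keys that were seen once in each input array.
    return [[key] + values
            for key, values in merged.items()
            if seen[key] == len(arrays)]


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
arr_extra = [['2011-10-10', 9], ['2007-08-09', 7]]  # made-up third dataset

print(inner_join(arr1, arr2))             # [['2011-10-10', 1, 1, 3, 4]]
print(inner_join(arr1, arr2, arr_extra))  # [['2011-10-10', 1, 1, 3, 4, 9]]
```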
+1

A generator function approach that pairs the rows by position and skips the pairs whose dates do not match:

    def gen(a1, a2):
        # zip pairs the i-th row of a1 with the i-th row of a2.
        for x, y in zip(a1, a2):
            if x[0] == y[0]:
                ret = list(x)
                ret.extend(y[1:])
                yield ret
            else:
                continue

    >>> print(list(gen(arr1, arr2)))
    [['2011-10-10', 1, 1, 3, 4]]

But yes, if possible, organize your data differently.

+1

Unless both of them are very large, I would use a dictionary:

    arr1 = [
        ['2011-10-10', 1, 1],
        ['2007-08-09', 5, 3]
    ]
    arr2 = [
        ['2011-10-10', 3, 4],
        ['2007-09-05', 1, 1]
    ]

    table_1 = dict((tup[0], tup[1:]) for tup in arr1)
    table_2 = dict((tup[0], tup[1:]) for tup in arr2)

    merged = {}
    for key, value in table_1.items():
        other = table_2.get(key)
        if other:
            merged[key] = value + other

Otherwise, it would be more efficient to sort each array and then merge them. But I assume that for most purposes the dictionary version will be fast enough.
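
The sort-then-merge alternative mentioned above is not spelled out in the answer. A minimal sketch of one way to do it follows, assuming dates are unique within each array and stored as strings that compare consistently (ISO `YYYY-MM-DD` strings do). The function name `sort_merge_join` is hypothetical.

```python
def sort_merge_join(a1, a2):
    """Inner-join two row lists on their first column by sorting and merging."""
    left = sorted(a1, key=lambda row: row[0])
    right = sorted(a2, key=lambda row: row[0])
    i = j = 0
    out = []
    # Advance whichever side has the smaller key; emit a row when keys match.
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            out.append(left[i] + right[j][1:])
            i += 1
            j += 1
    return out


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
print(sort_merge_join(arr1, arr2))  # [['2011-10-10', 1, 1, 3, 4]]
```

The result comes out sorted by date, and neither input needs to be turned into a dictionary first.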

0

Source: https://habr.com/ru/post/949577/

