Replicating SQL's 'Join' in Python

Question

I am trying to switch from R to Python (mostly for reasons of general flexibility). With NumPy, matplotlib, and IPython, I can cover all of my use cases except merging "datasets". I would like to simulate SQL join by clause (inner, outer, full) purely in Python. R handles this with the "merge" function.

I tried join_by from numpy.lib.recfunctions, but it has critical problems with duplicates along the "key":

join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2',
        defaults=None, usemask=True, asrecarray=False)

Join arrays r1 and r2 on key key.

The key should be either a string or a sequence of strings corresponding to the fields used to join the arrays. An exception is raised if the key field cannot be found in the two input arrays.

Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm.

source: http://presbrey.mit.edu:1234/numpy.lib.recfunctions.html

Any pointers or help would be most appreciated!

+3

python numpy

danielbmathews Jun 06 '10 at 5:58

2 answers

pandas now covers this use case: its merge function joins "datasets" in the SQL sense.

+2

danielbmathews 10 . '13 21:52

Alex Martelli · Accepted Answer · 2010-06-06T06:26:14+0000

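Suppose you represent the equivalent of a SQL table in Python as a list of dicts, all dicts having the same (string) keys (other representations, including those enabled by numpy, can logically be reduced to an equivalent form). An inner join is then (again, logically) a filtered projection of the Cartesian product of the two tables. In the general case, taking a predicate argument on (which receives two arguments, one "record" [[dict]] from each table, and returns a true value if the two records should be joined), a simple approach would be (using per-table prefixes to disambiguate keys that may be identical in the two tables):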

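def inner_join(tab1, tab2, prefix1, prefix2, on):
    # Nested-loop join: test every pair of rows from the two tables.
    for r1 in tab1:
        for r2 in tab2:
            if on(r1, r2):
                # Prefix the keys so same-named columns do not collide.
                row = dict((prefix1 + k1, v1) for k1, v1 in r1.items())
                row.update((prefix2 + k2, v2) for k2, v2 in r2.items())
                yield row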
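As a rough illustration of how the generator above might be called, here is a small sketch. The tables `customers` and `orders`, their column names, and the prefixes are hypothetical sample data, not part of the original answer; the function is repeated only so the snippet runs on its own. Note that `orders` deliberately has a duplicated key, the case that join_by cannot handle.

```python
def inner_join(tab1, tab2, prefix1, prefix2, on):
    # Same nested-loop generator as in the answer, repeated for self-containment.
    for r1 in tab1:
        for r2 in tab2:
            if on(r1, r2):
                row = {prefix1 + k: v for k, v in r1.items()}
                row.update((prefix2 + k, v) for k, v in r2.items())
                yield row


# Hypothetical sample tables; customer 1 appears twice in `orders`.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"cust": 1, "item": "pen"},
    {"cust": 1, "item": "ink"},
    {"cust": 3, "item": "cap"},
]

# The predicate plays the role of SQL's ON clause.
rows = list(inner_join(customers, orders, "c_", "o_",
                       on=lambda c, o: c["id"] == o["cust"]))

for row in rows:
    print(row)
```

Both orders for customer 1 are joined to the same customer row; customer 2 and the order for customer 3 have no match and are dropped, as an inner join requires.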

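Now, of course, you don't want to do it this way, because the performance is O(M * N). But for the generality you have specified ("simulate SQL join by clause (inner, outer, full)"), there is really no alternative, because the ON clause of a JOIN is essentially unrestricted.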

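For outer and full joins, you additionally need to keep track of which records [[from one or both tables]] have not been yielded yet. For a left join, for example, you would add a bool, reset to yielded = False before the for r2 inner loop, set it to True whenever the yield executes, and after the inner loop, if not yielded:, produce an artificial joined record (presumably using None to stand for NULL in place of the missing v2 values, since there is no r2 to use for that purpose).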
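The left-join variant described above could be sketched as follows. This is one possible reading, not the answerer's own code: the name `left_join`, the `keys2` helper list, and the sample tables are assumptions made for illustration, and `tab2` is assumed to be a list (something that can be iterated more than once).

```python
def left_join(tab1, tab2, prefix1, prefix2, on):
    # Collect the column names of tab2 so unmatched rows can be padded with None.
    keys2 = []
    for r2 in tab2:
        for k in r2:
            if k not in keys2:
                keys2.append(k)

    for r1 in tab1:
        yielded = False  # reset before each pass over tab2
        for r2 in tab2:
            if on(r1, r2):
                row = {prefix1 + k: v for k, v in r1.items()}
                row.update((prefix2 + k, v) for k, v in r2.items())
                yielded = True
                yield row
        if not yielded:
            # No partner found: emit r1 once, with None standing in for NULL.
            row = {prefix1 + k: v for k, v in r1.items()}
            row.update((prefix2 + k, None) for k in keys2)
            yield row


# Hypothetical sample tables.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"cust": 1, "item": "pen"}, {"cust": 1, "item": "ink"}]

left_rows = list(left_join(customers, orders, "c_", "o_",
                           on=lambda c, o: c["id"] == o["cust"]))
```

A full outer join would need the same bookkeeping in the other direction as well, for example by remembering which rows of `tab2` were ever matched and emitting the rest at the end.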

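To get any substantial efficiency improvement, you need to clarify which constraints you are willing to accept on the on predicate and on the tables. We already know from your question that you cannot live with a unique constraint on either table's keys, but there are many other constraints that could potentially help, and having us guess at which ones actually apply in your case would be rather unproductive.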
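To make the point about constraints concrete, here is a sketch of what one such constraint could buy. It assumes (this is not stated in the answer) that the on predicate is a plain equality between one column of each table and that the key values are hashable. Under that assumption, an index built once over the second table replaces the inner loop, and duplicate keys are still handled. The name `hash_inner_join` and the sample data are hypothetical.

```python
from collections import defaultdict


def hash_inner_join(tab1, tab2, prefix1, prefix2, key1, key2):
    # Build an index: key value -> list of tab2 rows carrying that value.
    # Lists (not single rows) keep duplicate keys intact.
    index = defaultdict(list)
    for r2 in tab2:
        index[r2[key2]].append(r2)

    for r1 in tab1:
        # .get avoids inserting empty lists for keys that have no match.
        for r2 in index.get(r1[key1], ()):
            row = {prefix1 + k: v for k, v in r1.items()}
            row.update((prefix2 + k, v) for k, v in r2.items())
            yield row


# Hypothetical sample tables with a duplicated key in `orders`.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"cust": 1, "item": "pen"},
    {"cust": 1, "item": "ink"},
    {"cust": 3, "item": "cap"},
]

hashed_rows = list(hash_inner_join(customers, orders, "c_", "o_", "id", "cust"))
```

The work is then roughly proportional to the sizes of the two tables plus the number of output rows, instead of M * N predicate calls. The price is that arbitrary predicates (ranges, inequalities, expressions over several columns) are no longer supported.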
