Object indices in pandas: performance? Well supported?

I've found that I can create a pandas.Index from Python objects, and everything seems to work fine as long as the objects implement __hash__, __eq__, __ne__ and __str__. Is there any downside to this? For example, will sorting and selection be as fast as with string or integer indices? How well supported is this feature? Is there any documentation on how to do it correctly?

Here is an example:

import pandas as pd

class MyObject(object):
  def __init__(self, name):
    self.name = name  # name is expected to be a string
    self.complicated_object = lambda x: 2 * x

  def __hash__(self):
    # Allows indexing frames by name rather than by object
    return hash(self.name)

  def __str__(self):
    # Makes sure DataFrames print nicely
    return self.name

  def __eq__(self, other):
    # Allows indexing frames by name rather than by object
    # (basestring is Python 2 only; use str on Python 3)
    if isinstance(other, basestring):
      return self.name == other
    else:
      return self.name == other.name

my_series = pd.Series([1, 2], index=[MyObject('cat'), MyObject('dog')])

print(my_series)

my_series.index[0]

This prints:

cat    1
dog    2
dtype: int64
<__main__.MyObject at 0x81a67d0>
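The name-based lookup in the example relies on Python's hashing contract: keys that compare equal must also hash equal. Below is a minimal Python 3 sketch of that mechanism using a plain dict rather than pandas. The class name `NamedKey` is hypothetical, and the assumption is that a hash-based index lookup treats keys the same way the dict does here.

```python
class NamedKey:
    """Hypothetical Python 3 counterpart of MyObject, keyed by its name."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Must agree with __eq__: keys that compare equal need equal hashes.
        return hash(self.name)

    def __eq__(self, other):
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, NamedKey):
            return self.name == other.name
        # Unknown type: let Python try the reflected comparison instead of
        # failing on a missing .name attribute.
        return NotImplemented

    def __str__(self):
        return self.name


table = {NamedKey("cat"): 1, NamedKey("dog"): 2}
print(table["cat"])            # lookup by plain string -> 1
print(table[NamedKey("dog")])  # lookup by an equal object -> 2
```

Returning NotImplemented for unknown types is a design choice of this sketch, not something the original class does. It keeps comparisons against unrelated values from raising AttributeError.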

1 answer

In short: yes, there will be a performance hit for sorting. Here's a test case:

import numpy as np
import pandas as pd

n = 10000
idx = np.random.permutation(n)
data = np.arange(n)
obj_idx = [MyObject(str(ii)) for ii in idx]
str_idx = [str(ii) for ii in idx]
int_idx = idx.tolist()

s1 = pd.Series(data, obj_idx)
s2 = pd.Series(data, str_idx)
s3 = pd.Series(data, int_idx)

Sort time:

In [1]: %%timeit s = s1.copy()
s.sort_index()
   ....: 
10 loops, best of 3: 47.6 ms per loop

In [2]: %%timeit s = s2.copy()
s.sort_index()
   ....: 
100 loops, best of 3: 6.63 ms per loop

In [3]: %%timeit s = s3.copy()
s.sort_index()
   ....: 
1000 loops, best of 3: 794 µs per loop
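To repeat this comparison on Python 3, something like the following self-contained sketch could be used. It swaps the IPython %%timeit magic for the standard timeit module. The class `OrderedKey` and the helpers `build_series` and `time_sort` are hypothetical names introduced here. Python 3 defines no default ordering between arbitrary objects, so the sketch adds __lt__ (ordering by name) to make the object index sortable; that method is an assumption and is not part of the original MyObject.

```python
import timeit
from functools import total_ordering

import numpy as np
import pandas as pd


@total_ordering
class OrderedKey:
    """Hypothetical Python 3 stand-in for MyObject, ordered by name."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        if isinstance(other, OrderedKey):
            return self.name == other.name
        return NotImplemented

    def __lt__(self, other):
        # Assumption: order keys by name so the index can be sorted.
        if isinstance(other, OrderedKey):
            return self.name < other.name
        return NotImplemented


def build_series(n=10000, seed=0):
    """Build the same data under an object, a string and an integer index."""
    idx = np.random.default_rng(seed).permutation(n)
    data = np.arange(n)
    return {
        "object": pd.Series(data, index=[OrderedKey(str(i)) for i in idx]),
        "str": pd.Series(data, index=[str(i) for i in idx]),
        "int": pd.Series(data, index=idx.tolist()),
    }


def time_sort(series_by_kind, repeats=10):
    """Return the average seconds per sort_index() call for each index type."""
    return {
        kind: timeit.timeit(s.sort_index, number=repeats) / repeats
        for kind, s in series_by_kind.items()
    }


if __name__ == "__main__":
    for kind, seconds in time_sort(build_series()).items():
        print(f"{kind:>6}: {seconds * 1e3:.2f} ms per sort")
```

Absolute numbers depend on the machine and on the pandas and NumPy versions, so they should not be expected to match the figures above; only the relative ordering of the three index types is of interest.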

Source: https://habr.com/ru/post/1613988/

