Idiomatic way to select rows from a data frame whose index also exists in another data frame

So, I have two pandas time series, both indexed by timestamps. Not every timestamp exists in both series. I want to perform a linear regression on the matching points, ignoring those that don't have a "pair".

This is my current solution, but it seems rather verbose and ugly:

indexes_used = sorted(set(series1.index).intersection(series2.index))  # compare index labels, not values

perform_regression(series1.loc[indexes_used], series2.loc[indexes_used])

As an alternative, I was thinking of doing this (but creating a temporary data frame seems redundant):

temp_frame = pd.DataFrame([series1, series2]).T.dropna()  # need the transpose to keep timestamps on the vertical axis

perform_regression(blabla)

Is there a good way to do this?

1 answer

How about Series.align with join="inner"? It keeps only the index labels present in both series:

import pandas as pd
a = pd.Series([4, 5, 6, 7], index=[1, 2, 3, 4])
b = pd.Series([49, 54, 62, 74], index=[2, 6, 4, 0])

a2, b2 = a.align(b, join="inner")

Output (a2, then b2):

2    5
4    7
dtype: int64

2    49
4    62
dtype: int64
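
Applied to the question's case, this might look roughly like the sketch below. The sample data is made up, and np.polyfit is only a stand-in for the asker's perform_regression, whose implementation is not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two timestamp-indexed series that only partly overlap.
series1 = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
)
series2 = pd.Series(
    [5.0, 7.0, 9.0, 11.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Keep only the timestamps present in both series.
x, y = series1.align(series2, join="inner")

# Stand-in for perform_regression (an assumption): least-squares straight-line fit.
slope, intercept = np.polyfit(x.to_numpy(), y.to_numpy(), deg=1)
print(slope, intercept)
```

No temporary data frame or explicit index list is needed; x and y come back with the same index, so they can be passed straight to whatever regression routine is used.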

Source: https://habr.com/ru/post/1544251/

