Merge identical adjacent rows in a Pandas series

Basically, if a column of my pandas DataFrame looks like this:

[1 1 1 2 2 2 3 3 3 1 1]

I would like it to be converted to the following:

[1 2 3 1]
4 answers

You can write a simple function that traverses the elements of your series, keeping only the first element of each run.

As far as I know, pandas has no built-in tool for this, but it does not take much code to do it yourself.

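import pandas

example_series = pandas.Series([1, 1, 1, 2, 2, 3])

def collapse(series):
    last = object()  # sentinel that never equals a real element
    seen = []
    for element in series:
        if element != last:
            # start of a new run: remember the value and keep it
            last = element
            seen.append(element)
    return seen

collapse(example_series)  # [1, 2, 3]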

The code above walks through the series and compares each element with the last one kept. If they differ, the element is saved; if they are equal, it is skipped.

If you need the result as a Series rather than a list, change the last line of the function to:

return pandas.Series(seen)
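
Combining the loop with a Series return value, a minimal sketch might look like the following. The name collapse_to_series is hypothetical, and this variant selects by position so that the original index labels survive, which is an assumption about what is wanted rather than part of the answer above.

```python
import pandas as pd

def collapse_to_series(series):
    """Keep the first element of every run of equal adjacent values."""
    sentinel = object()  # marks "nothing seen yet"
    last = sentinel
    kept_positions = []
    for position, element in enumerate(series):
        if last is sentinel or element != last:
            # a new run starts here, so remember where it begins
            kept_positions.append(position)
            last = element
    # positional selection keeps the original index labels
    return series.iloc[kept_positions]

example = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1])
collapsed = collapse_to_series(example)
print(collapsed.tolist())  # [1, 2, 3, 1]
```

If a fresh 0..n-1 index is preferred, calling reset_index(drop=True) on the result should do it.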

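You can subtract the shifted series from the original and keep the rows where the difference is nonzero: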

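import pandas

x = pandas.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1])
y = x - x.shift(1)  # difference from the previous row; NaN in the first row
y[0] = 1            # make sure the first row is kept
result = x[y != 0]  # keep rows that differ from their predecessor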
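
To see why the subtraction works, it may help to look at the intermediate values. The sketch below is only an illustration on the question's data; the variable name keep is introduced here and is not part of the answer. It assumes numeric data, since the approach relies on subtracting values.

```python
import pandas as pd

x = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1])

# Difference between each row and the one before it.
# The first row has no predecessor, so its difference is NaN.
y = x - x.shift(1)
print(y.tolist())
# [nan, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, -2.0, 0.0]

# A nonzero difference marks the start of a new run.
# NaN != 0 evaluates to True, so the first row is kept as well.
keep = y.ne(0)
result = x[keep]
print(result.tolist())  # [1, 2, 3, 1]
```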

You can use DataFrame diff and indexing:

>>> df = pd.DataFrame([1,1,2,2,2,2,3,3,3,3,1])
>>> df[df[0].diff()!=0]
    0
0   1
2   2
6   3
10  1
>>> df[df[0].diff()!=0].values.ravel() # If you need an array
array([1, 2, 3, 1])

The same thing works for a Series:

>>> df = pd.Series([1,1,2,2,2,2,3,3,3,3,1])
>>> df[df.diff()!=0].values
array([1, 2, 3, 1])

You can use shift to create a boolean mask that compares each row with the previous row:

In [67]:
s = pd.Series([1,1,2,2,2,2,3,3,3,3,4,4,5])
s[s!=s.shift()]

Out[67]:
0     1
2     2
6     3
10    4
12    5
dtype: int64
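
Because the mask compares values instead of subtracting them, the same idea should also apply to non-numeric data. The following is a small sketch on made-up string labels (the data and the names mask and collapsed are illustrative assumptions), with the index renumbered afterwards.

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "b", "a", "c", "c"])

# True wherever a row differs from the row before it.
# The first row is compared with NaN, so it is always kept.
mask = s != s.shift()

# Keep the first row of each run and renumber the index from 0.
collapsed = s[mask].reset_index(drop=True)
print(collapsed.tolist())  # ['a', 'b', 'a', 'c']
```

One caveat: NaN never compares equal to itself, so a run of consecutive NaN values is not collapsed by this mask.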

Source: https://habr.com/ru/post/1648330/

