How to filter rows based on order related constraint?

I have the following framework:

df = 
    ID   TYPE   VD_0   VD_1   VD_2   VD_3
    1    ABC    V1234  456    123    564
    2    DBC    456    A45    123    564
    3    ABD    456    V1234  456    123
    4    ABD    123    V1234  SSW    123

Below is a list of values VD_0, VD_1, VD_2and VD_3:

myList = [V1234,456,A45]

I want to get only those rows in dfwhich have 2 sequential events of the values myListin the columns VD_0, VD_1, VD_2and VD_3.

The result is the following:

result = 
    ID   TYPE   VD_0   VD_1   VD_2   VD_3
    1    ABC    V1234  456    123    564
    2    DBC    456    A45    123    564
    3    ABD    456    V1234  456    123

For example, in a line with ID1, the values ​​of VD_0and VD_1are equal V1234and 456, accordingly, both of these values ​​belong myList. The same logic applies to rows with ID2 ( 456, A45) and 3 ( 456, V1234).

How can i do this?

+4
2

MaxU, IIUC. , , 2 . , , , , , 2, . 2- = 1. , , 2:

subset = df.filter(like='VD_')

df[subset.isin(myList).rolling(2, axis=1).sum().max(axis=1)>=2]
Out[26]: 
   ID TYPE   VD_0   VD_1 VD_2  VD_3
0   1  ABC  V1234    456  123   564
1   2  DBC    456    A45  123   564
2   3  ABD    456  V1234  456   123
+2

:

In [112]: subset = df.filter(like='VD_')

In [113]: df[subset[subset.isin(myList)].stack().duplicated().unstack().any(1)]
Out[113]:
   ID TYPE VD_0   VD_1 VD_2  VD_3
1   2  DBC  456    A45  123   564
2   3  ABD  456  V1234  456   123
3   4  ABD  123  V1234  SSW   123

:

In [114]: subset
Out[114]:
    VD_0   VD_1 VD_2  VD_3
0  V1234    456  123   564
1    456    A45  123   564
2    456  V1234  456   123
3    123  V1234  SSW   123

In [115]: subset.isin(myList)
Out[115]:
    VD_0  VD_1   VD_2   VD_3
0   True  True  False  False
1   True  True  False  False
2   True  True   True  False
3  False  True  False  False

In [116]: subset[subset.isin(myList)]
Out[116]:
    VD_0   VD_1 VD_2  VD_3
0  V1234    456  NaN   NaN
1    456    A45  NaN   NaN
2    456  V1234  456   NaN
3    NaN  V1234  NaN   NaN

In [118]: subset[subset.isin(myList)].stack()
Out[118]:
0  VD_0    V1234
   VD_1      456
1  VD_0      456
   VD_1      A45
2  VD_0      456
   VD_1    V1234
   VD_2      456
3  VD_1    V1234
dtype: object

In [119]: subset[subset.isin(myList)].stack().duplicated()
Out[119]:
0  VD_0    False
   VD_1    False
1  VD_0     True
   VD_1    False
2  VD_0     True
   VD_1     True
   VD_2     True
3  VD_1     True
dtype: bool

In [120]: subset[subset.isin(myList)].stack().duplicated().unstack()
Out[120]:
    VD_0   VD_1  VD_2
0  False  False  None
1   True  False  None
2   True   True  True
3   None   True  None

In [121]: subset[subset.isin(myList)].stack().duplicated().unstack().any(1)
Out[121]:
0    False
1     True
2     True
3     True
dtype: bool
+1

Source: https://habr.com/ru/post/1669625/


All Articles