Efficient set intersection to get rows in a DataFrame

I have a DataFrame with three relevant columns: :ID, :Position and :Probability. Each row is unique, but several rows can share the same ID. What I would like to do is get all the rows for a particular value of :Position that share an :ID with any row whose :Probability is above a certain threshold at a different position.

For example, let's say I have the following DataFrame (df):

1020692×8 DataFrames.DataFrame
│ Row     │ ID  │ Position      │ Probability │
├─────────┼─────┼───────────────┼─────────────┤
│ 1       │ 425 │ "first"       │ 0.02        │
│ 2       │ 425 │ "last"        │ 0.03        │
│ 3       │ 425 │ "penultimate" │ 0.02        │
│ 4       │ 425 │ "other"       │ 0.04        │
│ 5       │ 421 │ "first"       │ 0.44        │
│ 6       │ 421 │ "last"        │ 0.85        │
│ 7       │ 421 │ "second"      │ 0.59        │
│ 8       │ 421 │ "other"       │ 1.0         │
⋮

If I set the threshold to 0.8, I want to end up with all the rows where :Position == "first", provided that the same :ID has a row with :Position == "last" && :Probability > 0.8. In other words, I want row 5 because row 6 has :Probability > 0.8, but not row 1, since row 2 does not.
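To make the rule concrete, here is a minimal sketch (assuming DataFrames.jl 1.x) that rebuilds only the eight sample rows and their three relevant columns, then applies the rule with boolean masks and a Set of qualifying IDs. The names `threshold`, `qualifying`, `mask` and `result` are illustrative and not from the question.

```julia
using DataFrames

# The eight sample rows shown above; only the three relevant columns are reproduced.
df = DataFrame(
    ID          = [425, 425, 425, 425, 421, 421, 421, 421],
    Position    = ["first", "last", "penultimate", "other",
                   "first", "last", "second", "other"],
    Probability = [0.02, 0.03, 0.02, 0.04, 0.44, 0.85, 0.59, 1.0],
)

threshold = 0.8

# IDs that have a "last" row whose Probability exceeds the threshold
qualifying = Set(df.ID[(df.Position .== "last") .& (df.Probability .> threshold)])

# "first" rows whose ID is one of the qualifying IDs
mask = (df.Position .== "first") .& in.(df.ID, Ref(qualifying))
result = df[mask, :]
```

Only row 5 (ID 421, Probability 0.44) survives, matching the expected result described above.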

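My current approach is to split the DataFrame into the rows where :Position == "first" and the rows where it is "last", collect the IDs of the "last" rows with Probability > 0.8, and then keep the "first" rows whose ID is among them, using in():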

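using DataFrames

# Split the rows by position
firsts = df[df.Position .== "first", :]
lasts = df[df.Position .== "last", :]
# IDs whose "last" row has Probability above the threshold (a Vector)
meetsthreshold = lasts[lasts.Probability .> 0.8, :ID]

# Keep the "first" rows whose ID is in meetsthreshold
final = firsts[[in(i, meetsthreshold) for i in firsts.ID], :]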

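This works, but it is very slow when there are many IDs (length(meetsthreshold) > 100k). I have tried a few things, such as working with sets of IDs (e.g. intersect(Set(firsts[:ID]), Set(meetsthreshold))), but I don't see how to use them to select rows from the DataFrame. Is there a more efficient way?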
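A different technique from the in()-based filter, not used in the question, is a semi-join: DataFrames.jl's `semijoin` keeps the rows of the left table whose key appears in the right table, which is exactly this selection. A minimal sketch, assuming DataFrames.jl 1.x and a small hypothetical input (ID 430 and the name `hits` are invented for illustration):

```julia
using DataFrames

# Hypothetical small input with the three relevant columns
df = DataFrame(
    ID          = [425, 425, 421, 421, 430, 430],
    Position    = ["first", "last", "first", "last", "first", "last"],
    Probability = [0.02, 0.03, 0.44, 0.85, 0.10, 0.95],
)

threshold = 0.8

firsts = df[df.Position .== "first", :]
# Only the :ID column of the qualifying "last" rows is needed as the join key
hits = df[(df.Position .== "last") .& (df.Probability .> threshold), [:ID]]

# Keep the rows of `firsts` whose :ID appears in `hits`; no columns are added
final = semijoin(firsts, hits, on = :ID)
```

Unlike `innerjoin`, `semijoin` adds no columns from the right table and does not duplicate a left row when its ID matches several right rows, so repeated IDs among the qualifying rows are harmless.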


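The bottleneck is in(i, meetsthreshold) on a Vector, which scans the whole vector for every row of firsts. Turning meetsthreshold into a Set makes each check a hash lookup instead: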

using DataFrames

firsts = df[df.Position .== "first", :]
lasts = df[df.Position .== "last", :]
# A Set makes each in() check a hash lookup instead of a linear scan
meetsthreshold = Set(lasts[lasts.Probability .> 0.8, :ID])

final = firsts[Vector{Bool}([in(i, meetsthreshold) for i in firsts.ID]), :]

Ran 1 .
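The same Set can also be used with DataFrames.jl's `filter(col => f, df)` form, or with a broadcast `in.` call, instead of the explicit comprehension. A minimal sketch assuming DataFrames.jl 1.x, with a small hypothetical input in place of the real data (`final2` is an illustrative name):

```julia
using DataFrames

# Hypothetical small input standing in for the real DataFrame
df = DataFrame(
    ID          = [425, 425, 421, 421],
    Position    = ["first", "last", "first", "last"],
    Probability = [0.02, 0.03, 0.44, 0.85],
)

firsts = df[df.Position .== "first", :]
lasts = df[df.Position .== "last", :]
# Set membership is a hash lookup rather than a scan over a Vector
meetsthreshold = Set(lasts[lasts.Probability .> 0.8, :ID])

# Column-selector form of filter: the function receives one :ID value per row
final = filter(:ID => id -> id in meetsthreshold, firsts)

# Equivalent broadcast mask; Ref makes broadcasting treat the Set as a single value
final2 = firsts[in.(firsts.ID, Ref(meetsthreshold)), :]
```

Both versions rely on the Set for fast membership tests; they differ only in how the Boolean row selection is expressed.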


Source: https://habr.com/ru/post/1648573/

