I have a DataFrame with three columns of interest: :ID, :Position, and :Probability. Each row is unique, but several rows can share the same ID. What I would like to do is get all the rows for a particular value of :Position whose ID is shared with any row at a different position that has a :Probability above a certain value.
For example, let's say I have the following DataFrame (df):
```
1020692×8 DataFrames.DataFrame
│ Row │ ID  │ Position      │ Probability │
├─────┼─────┼───────────────┼─────────────┤
│ 1   │ 425 │ "first"       │ 0.02        │
│ 2   │ 425 │ "last"        │ 0.03        │
│ 3   │ 425 │ "penultimate" │ 0.02        │
│ 4   │ 425 │ "other"       │ 0.04        │
│ 5   │ 421 │ "first"       │ 0.44        │
│ 6   │ 421 │ "last"        │ 0.85        │
│ 7   │ 421 │ "second"      │ 0.59        │
│ 8   │ 421 │ "other"       │ 1.0         │
⋮
```
If I set the threshold at 0.8, I want to end up with all the rows where :Position == "first", provided that the same :ID has a row with :Position == "last" && :Probability > 0.8. In other words, I want row 5 because row 6 has :Probability > 0.8, but not row 1, since row 2 does not meet the threshold.
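The rule above can also be expressed per ID group instead of by splitting the DataFrame. This is a minimal sketch assuming DataFrames.jl 1.x; rows_with_partner_above and the has_partner column are hypothetical names, and df is rebuilt from the eight example rows only. Each ID group gets a single Bool flag, so no vector of IDs has to be searched for every row.

```julia
using DataFrames

# The eight example rows from the table above (the other columns are omitted).
df = DataFrame(
    ID          = [425, 425, 425, 425, 421, 421, 421, 421],
    Position    = ["first", "last", "penultimate", "other",
                   "first", "last", "second", "other"],
    Probability = [0.02, 0.03, 0.02, 0.04, 0.44, 0.85, 0.59, 1.0],
)

# Hypothetical helper: keep rows at `target` whose ID also has a row at
# `partner` with Probability > `threshold`.
function rows_with_partner_above(df, target, partner, threshold)
    # For every ID group compute one Bool; transform spreads it to all rows of the group.
    flagged = transform(groupby(df, :ID),
        [:Position, :Probability] =>
            ((pos, prob) -> any((pos .== partner) .& (prob .> threshold))) =>
            :has_partner)
    # transform keeps the original row order; keep target rows whose group is flagged.
    keep = (flagged.Position .== target) .& flagged.has_partner
    return flagged[keep, Not(:has_partner)]   # drop the helper column again
end

final = rows_with_partner_above(df, "first", "last", 0.8)
```

With the example data, only row 5 (ID 421) should come back; the target and partner positions are parameters, so the same helper covers other position pairs.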
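My current approach is to split the DataFrame in two: one with the :Position == "first" rows and one with the "last" rows. Then I find the IDs in the "last" DataFrame with :Probability > 0.8 and keep the "first" rows whose ID is among them, using in(). Something like this: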
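```julia
using DataFrames
firsts = df[df.Position .== "first", :]                 # rows at the "first" position
lasts  = df[df.Position .== "last", :]                  # rows at the "last" position
meetsthreshold = lasts[lasts.Probability .> 0.8, :ID]   # IDs whose "last" row exceeds 0.8
# Keep "first" rows whose ID is in meetsthreshold (a linear search of a Vector for every row).
final = firsts[[in(i, meetsthreshold) for i in firsts.ID], :]
```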
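This works, but it is very slow when there are many IDs (e.g. length(meetsthreshold) > 100k). I can get the shared IDs quickly (for example, intersect(Set(firsts.ID), Set(meetsthreshold))), but I don't know how to use them to pull the matching rows out of the DataFrame. Is there a faster way?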
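Once the qualifying IDs are in a Set, the rows can be pulled out with filter; a semijoin performs the same lookup as a join. A minimal sketch, assuming DataFrames.jl 1.x and the same eight example rows; ids, final_set, and final_join are illustrative names. Set membership is a hash lookup, and semijoin is a join rather than a per-row search, so neither should need the linear scan of a Vector that in(i, meetsthreshold) performs for every row.

```julia
using DataFrames

# The eight example rows from the table above (the other columns are omitted).
df = DataFrame(
    ID          = [425, 425, 425, 425, 421, 421, 421, 421],
    Position    = ["first", "last", "penultimate", "other",
                   "first", "last", "second", "other"],
    Probability = [0.02, 0.03, 0.02, 0.04, 0.44, 0.85, 0.59, 1.0],
)

firsts = df[df.Position .== "first", :]
lasts  = df[df.Position .== "last", :]

# Option 1: put the qualifying IDs in a Set, then keep rows by Set membership.
ids = Set(lasts[lasts.Probability .> 0.8, :ID])
final_set = filter(:ID => (id -> id in ids), firsts)

# Option 2: semijoin keeps only the rows of `firsts` whose ID appears in the
# right-hand table, without adding any of its columns.
final_join = semijoin(firsts, lasts[lasts.Probability .> 0.8, [:ID]], on = :ID)
```

Both variants keep only the "first" row for ID 421 here; the semijoin version avoids building the Set by hand.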