I have a multi-index data table. The first multi-level - this is the name corresponding to the sequence (DNA), a second multi-level corresponds to a specific type of sequence variant wt
, m1
, m2
, m3
in the example below. Not all given sequences wt
will have all types of options (see seqA
and seqC
below).
df = pd.DataFrame(data={'A':range(1,9), 'B':range(1,9), 'C': range(1,9)},
index=pd.MultiIndex.from_tuples([('seqA', 'wt'), ('seqA', 'm1'),
('seqA', 'm2'), ('seqB', 'wt'), ('seqB', 'm1'), ('seqB', 'm2'),
('seqB', 'm3'), ('seqC', 'wt') ]))
df.index.rename(['seq_name','type'], inplace=True)
print df
A B C
seq_name type
seqA wt 1 1 1
m1 2 2 2
m2 3 3 3
seqB wt 4 4 4
m1 5 5 5
m2 6 6 6
m3 7 7 7
seqC wt 8 8 8
, (m1
m2
). , , seq_name
, list
.
, IMO.
var_l = ['wt', 'm1', 'm2']
df1 = df[df.index.get_level_values('type').isin(var_l)]
set_l = []
for v in var_l:
df2 = df[df.index.get_level_values('type').isin([v])]
set_l.append(set(df2.index.get_level_values('seq_name')))
seq_s = set.intersection(*set_l)
df3 = df1[df1.index.get_level_values('seq_name').isin(seq_s)]
print df3
A B C
seq_name type
seqA wt 1 1 1
m1 2 2 2
m2 3 3 3
seqB wt 4 4 4
m1 5 5 5
m2 6 6 6
, , . - :
var_l = ['wt', 'm1', 'm2']
filtered_df = filterDataframe(df1, var_l)
print filtered_df
A B C
seq_name type
seqA wt 1 1 1
m1 2 2 2
m2 3 3 3
seqB wt 4 4 4
m1 5 5 5
m2 6 6 6
, any .