Python pandas - picking up data based on a list of values or conditions

I have a dataset with 9 columns, and I managed to extract two of them using pandas (thanks to the Stack members for the earlier help!). Now my question is: I have a list of values that I want to use to pick up rows from the dataset and retrieve the corresponding values. The selected data looks like this:

Exp. m/z    Intensity
1000        2000
2000        3000
3000        4000
4000        5000

etc. (each dataset has about 500 rows). The list used for the pickup looks like this:

mass
1200
1300

etc. (the pickup list has about 200 lines). Each mass value is used to calculate an upper and a lower limit, and these limits are used to pick up Exp. m/z values from the dataset. For example, for a mass of 1200 the limits would be 1250 (upper) and 1150 (lower); every row of the dataset whose Exp. m/z falls in this range should be picked up, and the corresponding Intensity values are what I want. If nothing is picked up, I would like the result to be an empty value if possible, because I believe 0 values would affect the mean and other statistical analysis.
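The code further down derives the limits from a ppm tolerance (`ppm = 150`) rather than a fixed offset. A minimal sketch of that arithmetic for a single mass, where `ppm_window` is a hypothetical helper name used only for illustration:

```python
def ppm_window(mass, ppm):
    """Return (lower, upper) limits `ppm` parts per million either side of `mass`."""
    delta = mass * ppm / 1_000_000
    # Round to 4 decimals, as the question's code does.
    return round(mass - delta, 4), round(mass + delta, 4)


lower, upper = ppm_window(1200, 150)
print(lower, upper)
```

At 150 ppm the window around 1200 is only 0.18 wide on each side, which is much narrower than the 1150 to 1250 range used as an example above.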

Below is my code, where file is the dataset and pickupfile is the pickup list:

    from pandas import DataFrame

    import pandas as pd
    import numpy as np

    file = 'C09.xls'
    pickupfile = 'pickuplist.xlsx'

    xl = pd.ExcelFile(file)
    pl = pd.ExcelFile(pickupfile)

    plist = pd.read_excel(xl)
    pickuplist = pd.read_excel(pl)

    cmass = plist['Exp. m/z']
    height = plist['Intensity']


    plistcollect = pd.concat([cmass, height], axis=1)


    ppm = 150

    peak1upper = round(pickuplist*(1+ppm/1000000),4)

    peak1lower = round(pickuplist*(1-ppm/1000000),4)

    pickup = plistcollect[((plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper))]
    print(pickup['Intensity'])

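When I run this, I get a ValueError that mentions float64. Is there a way to fix this, or a better way to do the pickup?

Thanks!

EDIT: I checked, and both limits (peak1lower and peak1upper) are float64.

I also tried isin: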

    pickup = plistcollect[plistcollect.isin(np.arange(peak1lower,peak1upper))]

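Create a boolean DataFrame with one row per mass in pickuplist and one column per row of plist (here plist holds only the two columns Exp. m/z and Intensity):

matches = pd.DataFrame(index=pickuplist['mass'], columns = plist.set_index(list(plist.columns)).index, dtype=bool)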

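Then fill the DataFrame: for each row of plist, mark the masses that lie within 150 ppm of its Exp. m/z, using abs to take the absolute difference: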

ppm = 150
for index, exp_mass, intensity in plist.itertuples():
    matches[exp_mass] = abs(matches.index - exp_mass) / matches.index < ppm / 1e6
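
The same ppm test can also be written without a Python loop by using NumPy broadcasting. This is only a sketch under assumptions: the small `plist` and `pickuplist` frames below are made-up stand-ins for the real Excel files, and `within_ppm` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the two Excel files.
plist = pd.DataFrame({"Exp. m/z": [1000.0, 2000.0, 3000.0, 4000.0],
                      "Intensity": [2000, 3000, 4000, 5000]})
pickuplist = pd.DataFrame({"mass": [1000.0, 1200.0, 1300.0]})

ppm = 150

masses = pickuplist["mass"].to_numpy()[:, None]   # shape (n_masses, 1)
exp_mz = plist["Exp. m/z"].to_numpy()[None, :]    # shape (1, n_rows)

# Relative deviation of every measured m/z from every mass,
# compared against the tolerance in one vectorised step.
within_ppm = np.abs(exp_mz - masses) / masses < ppm / 1e6

# Wrap the boolean array so rows are labelled by mass and columns by Exp. m/z.
matches = pd.DataFrame(within_ppm,
                       index=pickuplist["mass"],
                       columns=plist["Exp. m/z"])
print(matches)
```

With about 200 masses and 500 rows the intermediate array has only 100,000 entries, so holding it in memory is not a concern.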

The result looks something like this:

Exp. m/z    1000    2000    3000    4000
Intensity   2000    3000    4000    5000
mass                
1000    True    False   False   False
1200    False   False   False   False
1300    False   False   False   False

To convert this into a dict:

results = {i: list(s.index[s]) for i, s in matches.iterrows()}

This gives a dict that maps each mass in pickuplist to the list of matching plist rows, as (Exp. m/z, Intensity) tuples:

{1000: [(1000, 2000)], 1200: [], 1300: []}

If you only want the (Exp. m/z, Intensity) pairs that matched at least one mass:

results2 = {key for key, value in matches.any().items() if value}

which gives a set:

{(1000, 2000)}
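
Since the question asks for intensities, with an empty value rather than 0 when nothing matches, the same 150 ppm test can be reduced to one value per mass. The following is a sketch only: the sample data is made up, `mean_intensity` is a hypothetical helper, and averaging multiple hits is an assumption, since the question does not say how several hits for one mass should be combined.

```python
import numpy as np
import pandas as pd

# Made-up sample data; two rows fall within 150 ppm of mass 1200.
plist = pd.DataFrame({"Exp. m/z": [1000.0, 1200.1, 1199.9, 4000.0],
                      "Intensity": [2000, 3000, 4000, 5000]})
pickuplist = pd.DataFrame({"mass": [1000.0, 1200.0, 1300.0]})
ppm = 150


def mean_intensity(mass):
    """Mean intensity of the rows within `ppm` of `mass`; NaN if there are none."""
    in_window = (plist["Exp. m/z"] - mass).abs() / mass < ppm / 1e6
    hits = plist.loc[in_window, "Intensity"]
    return hits.mean()  # the mean of an empty Series is NaN, not 0


# One value per mass, indexed by the mass itself.
picked = pd.Series({m: mean_intensity(m) for m in pickuplist["mass"]},
                   name="Intensity")
print(picked)
```

Because pandas skips NaN by default in reductions such as `picked.mean()`, masses without a hit do not pull the statistics towards zero.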

A side note first: the outer pair of parentheses in the filter is not needed, so

plistcollect[(plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper)]

is equivalent to

plistcollect[((plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper))]

To test each pair of limits against the dataset, try something like this:

# Assumes the pickup list column is named 'mass', as shown in the question.
limit_df = pd.DataFrame([peak1lower['mass'],peak1upper['mass']], index=['lower','upper']).T
filtered_df = limit_df.apply(lambda x: ((plistcollect['Exp. m/z'] > x.lower) & (plistcollect['Exp. m/z'] < x.upper)), axis=1)

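filtered_df is a boolean DataFrame with one row per mass: True where a dataset row falls inside that mass's limits and False otherwise. You can use it to index the original DataFrame.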
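
As an illustration of that indexing step, here is a small self-contained sketch. The data and the limits are made up, and the labels 1000, 1200 and 1300 on `limit_df` are an assumption made so that each row can be looked up by its mass.

```python
import pandas as pd

# Made-up dataset and limits (150 ppm either side of each mass).
plistcollect = pd.DataFrame({"Exp. m/z": [1000.0, 1199.9, 1200.1, 4000.0],
                             "Intensity": [2000, 3000, 4000, 5000]})
limit_df = pd.DataFrame({"lower": [999.85, 1199.82, 1299.805],
                         "upper": [1000.15, 1200.18, 1300.195]},
                        index=[1000, 1200, 1300])

# One boolean row per mass, one column per row of plistcollect.
filtered_df = limit_df.apply(
    lambda x: (plistcollect["Exp. m/z"] > x["lower"])
              & (plistcollect["Exp. m/z"] < x["upper"]),
    axis=1,
)

# A single row of filtered_df is a boolean mask over the rows of plistcollect.
hits_1200 = plistcollect[filtered_df.loc[1200]]
hits_1300 = plistcollect[filtered_df.loc[1300]]

print(hits_1200["Intensity"].tolist())
print(len(hits_1300))
```

A mass with no hits simply yields an empty DataFrame, which keeps zeros out of any later statistics.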

Alternatively, to write the matching rows for each mass to a separate CSV file:

def filter_df(x):
    plistcollect[((plistcollect['Exp. m/z'] > x.lower) & (plistcollect['Exp. m/z'] < x.upper))].to_csv("test_%s.csv"%x.name)

limit_df.apply(lambda x: filter_df(x), axis=1)

Source: https://habr.com/ru/post/1679422/

