Pandas data manipulation and construction

Question


Using WinPython 3.4 and matplotlib 1.3.1, I am pulling the data for a DataFrame from a MySQL database. The raw DataFrame that I get from the query looks like this:

        wafer_number  test_type  test_pass  x_coord  y_coord  test_el_id  wavelength  intensity
     0        HT2731         T2          1       38       54          24      288.68       4413
     1        HT2731         T2          1       40       54          25      257.42       2595
     2        HT2731         T2          1       50       54          28      300.00       2836
     3        HT2731         T2          1       52       54          29      300.00       2862
     4        HT2731         T2          1       54       54          30      300.00       3145
     5        HT2731         T2          1       56       54          31      300.00       2804
     6        HT2731         T2          1       58       54          32      255.69       2803
     7        HT2731         T2          1       59       54          33      257.23       2991
     8        HT2731         T2          1       60       54          34      262.45       3946
     9        HT2731         T2          1       62       54          35      291.84       9398
    10        HT2801         T2          1       38       55          54      288.68       4125
    11        HT2801         T2          1       38       56          55      265.25       4258

I need to plot wavelength and intensity on the x and y axes, respectively, with each distinct wafer number as its own series. I need to keep the x_coord and y_coord variables so that I can later identify outlying data points by clicking on them and adding them to a list. I will work on that part once I have the plot itself working.

I thought that, in order to use the DataFrame's built-in plotting capabilities, I needed to apply the pivot_table method

wl_vs_int = results.pivot_table(values='intensity', rows=['x_coord', 'y_coord','wavelength'], cols='wafer_number')

to my DataFrame, which turns it into:

    wafer_number                HT2478  HT2625  HT2644  HT2671  HT2673  HT2719  HT2731  HT2796  HT2801
    x_coord y_coord wavelength
    27      35      289.07         NaN     NaN     NaN    5137     NaN     NaN     NaN     NaN     NaN
            36      250.88        4585     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
            37      260.90         NaN     NaN     NaN     NaN    4270     NaN     NaN     NaN     NaN
            38      288.87         NaN     NaN     NaN    8191     NaN     NaN     NaN     NaN     NaN
            40      259.74         NaN     NaN     NaN     NaN   17027     NaN     NaN     NaN     NaN
            41      259.74         NaN     NaN     NaN     NaN   18742     NaN     NaN     NaN     NaN
            42      259.74         NaN     NaN     NaN     NaN   34098     NaN     NaN     NaN     NaN
    28      34      268.27         NaN     NaN     NaN     NaN    2080     NaN     NaN     NaN     NaN
            38      257.42        7727     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
            44      260.13         NaN     NaN     NaN     NaN   55329     NaN     NaN     NaN     NaN

I then try to plot x and y from wl_vs_int:

plt.scatter(wl_vs_int.wavelength, wl_vs_int.columns)

This raises an AttributeError:

AttributeError: 'DataFrame' object has no attribute 'wavelength'

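I am not sure how to get from this dataframe to the plot I need, or whether the pivoted DataFrame is the right structure for it at all.

I am new to Python and pandas, so any help would be appreciated.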


python matplotlib pandas plot dataframe

zeppelin_d 08 '14 16:44


Guillaume Jacquenot · Accepted Answer · 2014-05-08T17:49:46+0000
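
On the AttributeError in the question: a likely cause is that pivot_table moves wavelength into the row index, so it is no longer a column and attribute access on the DataFrame cannot find it. The following is a minimal sketch of that behavior, not the asker's exact setup. It assumes a recent pandas, where pivot_table takes index= and columns= (the rows= and cols= keywords in the question are the older spelling), and it uses only four rows copied from the sample data.

```python
import pandas as pd

# Four rows copied from the sample data in the question.
df = pd.DataFrame({
    "wafer_number": ["HT2731", "HT2731", "HT2801", "HT2801"],
    "x_coord": [38, 40, 38, 38],
    "y_coord": [54, 54, 55, 56],
    "wavelength": [288.68, 257.42, 288.68, 265.25],
    "intensity": [4413, 2595, 4125, 4258],
})

# Same pivot as in the question, written with the newer keyword names
# (index/columns instead of rows/cols).
wl_vs_int = df.pivot_table(values="intensity",
                           index=["x_coord", "y_coord", "wavelength"],
                           columns="wafer_number")

# The columns are now the wafer numbers; wavelength is a level of the row index.
has_wavelength_column = "wavelength" in wl_vs_int.columns
wavelength_level = wl_vs_int.index.get_level_values("wavelength")

# reset_index() turns the index levels back into ordinary columns.
flat = wl_vs_int.reset_index()

print(has_wavelength_column)
print(list(flat.columns))
```

After reset_index(), flat.wavelength is available again, although the intensities are still spread over one mostly-NaN column per wafer, which is awkward to plot directly.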

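To plot wavelength on x and intensity on y, I think you can simply group your data with respect to wafer_number and plot each group as its own series: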

import pandas as pd
from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"
import matplotlib.pyplot as plt

data = \
"""wafer_number,test_type,test_pass,x_coord,y_coord,test_el_id,wavelength,intensity
HT2731,T2,1,38,54,24,288.68,4413
HT2731,T2,1,40,54,25,257.42,2595
HT2731,T2,1,50,54,28,300.00,2836
HT2731,T2,1,52,54,29,300.00,2862
HT2731,T2,1,54,54,30,300.00,3145
HT2731,T2,1,56,54,31,300.00,2804
HT2731,T2,1,58,54,32,255.69,2803
HT2731,T2,1,59,54,33,257.23,2991
HT2731,T2,1,60,54,34,262.45,3946
HT2731,T2,1,62,54,35,291.84,9398
HT2801,T2,1,38,55,54,288.68,4125
HT2801,T2,1,38,56,55,265.25,4258"""

# Parse the CSV text and group the rows by wafer.
df = pd.read_csv(StringIO(data), sep=',')
dfg = df.groupby('wafer_number')

colors = 'bgrcmyk'
fig, ax = plt.subplots()
# Plot each wafer as its own series; colors repeat if there are more wafers than colors.
for i, (wafer, currentGroup) in enumerate(dfg):
    color = colors[i % len(colors)]
    ax.plot(currentGroup['wavelength'].values, currentGroup['intensity'].values,
            ls='', color=color, label=wafer, marker='o', markersize=8)
legend = ax.legend(loc='upper center', shadow=True)
ax.set_xlabel('wavelength')
ax.set_ylabel('intensity')
plt.show()
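
Because each group keeps all of its original columns, x_coord and y_coord remain available for the click-to-identify step mentioned in the question. One possible way to do this is matplotlib's pick events. The sketch below is only an illustration under assumptions: the names picked, lines and on_pick are hypothetical, and the data is four rows copied from the sample.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four rows copied from the sample data in the question.
df = pd.DataFrame({
    "wafer_number": ["HT2731", "HT2731", "HT2801", "HT2801"],
    "x_coord": [38, 40, 38, 38],
    "y_coord": [54, 54, 55, 56],
    "wavelength": [288.68, 257.42, 288.68, 265.25],
    "intensity": [4413, 2595, 4125, 4258],
})

picked = []  # hypothetical list of (wafer_number, x_coord, y_coord) for clicked points
lines = {}   # maps each plotted line back to the rows it was drawn from

fig, ax = plt.subplots()
for wafer, group in df.groupby("wafer_number"):
    # picker=True makes the markers of this line respond to mouse clicks.
    (line,) = ax.plot(group["wavelength"].values, group["intensity"].values,
                      ls="", marker="o", label=wafer, picker=True)
    lines[line] = group


def on_pick(event):
    """Record the wafer and die coordinates of every clicked point."""
    group = lines[event.artist]
    for i in event.ind:  # positions of the picked points within the line
        row = group.iloc[i]
        picked.append((row["wafer_number"], int(row["x_coord"]), int(row["y_coord"])))


fig.canvas.mpl_connect("pick_event", on_pick)
ax.legend()
```

Calling plt.show() afterwards opens the window; each click on a marker then appends that point's wafer number and coordinates to picked.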
