Pandas Bar Graph Data Frame Index

Question


I have the following data frame (df) in pandas:

        NetPrice  Units  Royalty
 Price
 3.65       9.13    171    57.60
 3.69       9.23     13     4.54
 3.70       9.25    129    43.95
 3.80       9.49    122    42.76
 3.90       9.74    105    38.30
 3.94       9.86    158    57.35
 3.98       9.95     37    13.45
 4.17      10.42     69    27.32
 4.82      12.04    176    77.93
 4.84      24.22    132    59.02
 5.16      12.91    128    60.81
 5.22      13.05    129    62.00

I am trying to create a bar chart with the index ("Price") on the x-axis and "Units" on the y-axis. I started with the following:

 plt.hist(df.index)

This gives me a histogram of the prices. How do I put Units on the y-axis? Right now it is just a "scale."

Thanks!

+6

matplotlib pandas plot

Digital musicology Nov 26 '14 at 19:37


1 answer

Jd long · Accepted Answer · 2014-11-26T20:50:31+0000

Since your data is already partially aggregated, you cannot use the hist() methods directly. As @snorthway said in the comments, you can do this with a bar chart, but first you need to put your data into buckets. My favorite way to do that is with the pandas cut() method.
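If one bar per distinct price is enough, the plain bar-chart route might look like the sketch below. It uses only the first four rows from the question; the frame name `q` is made up for this illustration, and the Agg backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import pandas as pd

# First four rows of the question's data, indexed by Price.
# `q` is a hypothetical name, not one used in the question.
q = pd.DataFrame(
    {
        "NetPrice": [9.13, 9.23, 9.25, 9.49],
        "Units": [171, 13, 129, 122],
        "Royalty": [57.60, 4.54, 43.95, 42.76],
    },
    index=pd.Index([3.65, 3.69, 3.70, 3.80], name="Price"),
)

# One bar per index value; bar height comes from the Units column.
ax = q["Units"].plot(kind="bar")
ax.set_ylabel("Units")
```

With many distinct prices this gets crowded, which is where bucketing helps.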

Let me make up some sample data, since you did not provide yours in a form that is convenient to use:

 import numpy as np
 import pandas as pd

 np.random.seed(1)
 n = 1000
 df = pd.DataFrame({'Price' : np.random.normal(5,2,size=n),
                    'Units' : np.random.randint(100, size=n)})

Put the prices in 10 evenly spaced buckets:

 df['bucket'] = pd.cut(df.Price, 10)
 print(df.head())

       Price  Units            bucket
 0  8.248691     98     (7.307, 8.71]
 1  3.776487      8   (3.0999, 4.502]
 2  3.943656     89   (3.0999, 4.502]
 3  2.854063     27   (1.697, 3.0999]
 4  6.730815     29    (5.905, 7.307]

So now we have a column containing the bucket range for each row. If you want to give these buckets different names, you can read about it in the excellent pandas documentation. Now we can use the pandas groupby() and sum() methods to total the units in each bucket:

 newdf = df[['bucket','Units']].groupby('bucket').sum()
 print(newdf)

                     Units
 bucket
 (-1.122, 0.295]       492
 (0.295, 1.697]       1663
 (1.697, 3.0999]      5003
 (3.0999, 4.502]     11084
 (4.502, 5.905]      15144
 (5.905, 7.307]      11053
 (7.307, 8.71]        4424
 (8.71, 10.112]       1008
 (10.112, 11.515]       77
 (11.515, 12.917]      122

That looks like a winner... now let's plot it:

  newdf.plot(kind='bar')
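For the bucket renaming mentioned earlier, and to label the y-axis as the question asks, a minimal sketch could use the `labels` argument of `pd.cut` and the Axes object that `plot` returns. The bucket names, the "Price bucket" axis label, and the Agg backend line are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

# Same sample data as in the answer.
np.random.seed(1)
n = 1000
df = pd.DataFrame({"Price": np.random.normal(5, 2, size=n),
                   "Units": np.random.randint(100, size=n)})

# Hypothetical bucket names; labels needs exactly one entry per bin.
names = ["bin %d" % i for i in range(1, 11)]
df["bucket"] = pd.cut(df["Price"], 10, labels=names)

# observed=False keeps every bucket in the result, even an empty one.
newdf = df.groupby("bucket", observed=False)[["Units"]].sum()

# plot() returns a matplotlib Axes, so the axis labels can be set on it.
ax = newdf.plot(kind="bar", legend=False)
ax.set_xlabel("Price bucket")
ax.set_ylabel("Units")
```

The totals per bucket are the same as before; only the index labels and axis titles change.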
