Since your data is already partially aggregated, you cannot use the hist() methods directly. As @snorthway said in the comments, you can do this with a bar chart. First, though, you need to put your data into buckets. My favorite way to do that is with the pandas cut() method.
Let me make up some sample data, since you haven't provided any in an easy-to-use form:
import numpy as np
import pandas as pd

np.random.seed(1)
n = 1000
df = pd.DataFrame({'Price': np.random.normal(5, 2, size=n),
                   'Units': np.random.randint(100, size=n)})
Put the prices in 10 evenly spaced buckets:
df['bucket'] = pd.cut(df.Price, 10)
print(df.head())

      Price  Units           bucket
0  8.248691     98    (7.307, 8.71]
1  3.776487      8  (3.0999, 4.502]
2  3.943656     89  (3.0999, 4.502]
3  2.854063     27  (1.697, 3.0999]
4  6.730815     29   (5.905, 7.307]
So now we have a column containing the bucket ranges. If you want to give these buckets different names, you can read about it in the excellent pandas documentation. Now we can use the pandas groupby() and sum() methods to add up the units:
newdf = df[['bucket','Units']].groupby('bucket').sum()
print(newdf)

                   Units
bucket
(-1.122, 0.295]      492
(0.295, 1.697]      1663
(1.697, 3.0999]     5003
(3.0999, 4.502]    11084
(4.502, 5.905]     15144
(5.905, 7.307]     11053
(7.307, 8.71]       4424
(8.71, 10.112]      1008
(10.112, 11.515]      77
(11.515, 12.917]     122
That looks like a winner. Now let's plot it:
newdf.plot(kind='bar')
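If the default interval names make the x-axis hard to read, one option is to pass your own names through the labels argument of cut(). The following is only a sketch on the same made-up data: the "low to high" label format is my own choice for illustration, and retbins=True is used just to get the bin edges back so the labels can be built from them.

```python
import numpy as np
import pandas as pd

# Same made-up sample data as above.
np.random.seed(1)
n = 1000
df = pd.DataFrame({'Price': np.random.normal(5, 2, size=n),
                   'Units': np.random.randint(100, size=n)})

# Ask cut() to also return the 11 bin edges of the 10 buckets.
_, edges = pd.cut(df.Price, 10, retbins=True)

# Hypothetical label format: "low to high", rounded to one decimal place.
labels = ['%.1f to %.1f' % (lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]

# Re-cut with the same edges, this time attaching the readable labels.
df['bucket'] = pd.cut(df.Price, bins=edges, labels=labels)

# observed=False keeps every bucket in the result, even an empty one.
newdf = df.groupby('bucket', observed=False)[['Units']].sum()
print(newdf)
```

Calling newdf.plot(kind='bar') on this frame works the same way as before, just with the shorter labels on the x-axis.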
