I have network traffic data in the following format for each hour of a ten-day period, stored in an R data set.
Day Hour Volume Category
0 00 100 P2P
0 00 50 email
0 00 200 gaming
0 00 200 video
0 00 150 web
0 00 120 P2P
0 00 180 web
0 00 80 email
....
0 01 150 P2P
0 01 200 P2P
0 01 50 Web
...
...
10 23 100 web
10 23 200 email
10 23 300 gaming
10 23 300 gaming
As you can see, a category can appear several times within a single hour. I need to calculate the volatility and the peak-hour-to-average-hour ratio for each of these application categories.
Volatility: the standard deviation of the hourly volumes divided by the hourly average.
Peak hour to average hour ratio: the ratio of the volume of the maximum hour to the volume of the average hour for that application.
So, how do I aggregate the data and calculate these two statistics for each category? I am new to R and don't know how to aggregate and get the averages as described.
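The final result should look something like this, with the volume for each category first aggregated over a 24-hour period:
Category   Volatility   Peak to Avg. Ratio
Web        0.55         1.5
P2P        0.30         2.1
email      0.6          1.7
gaming     0.4          2.9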
One way to do this is with the plyr package:
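library(plyr)

# One row of statistics per Hour/Category group
stats <- ddply(
  .data = my_data,
  .variables = .(Hour, Category),
  .fun = function(x) {
    data.frame(
      # volatility: standard deviation divided by the mean
      volatility = sd(x$Volume) / mean(x$Volume),
      # peak-to-average ratio: maximum divided by the mean
      pa_ratio = max(x$Volume) / mean(x$Volume)
    )
  }
)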
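Or did you mean to aggregate the volume over each of the 24 hours first, and then compute the volatility and PA ratio per category?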
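If the two-stage reading is the intended one, a minimal base R sketch might look like the following. It assumes that "aggregating over 24 hours" means summing Volume per hour of the day and category (across all days), and that "Web" and "web" are the same category. The toy values in my_data are made up for illustration and are not taken from the real data.

```r
# Toy data in the same shape as the question (values are illustrative only).
my_data <- data.frame(
  Day      = c(0, 0, 0, 0, 0, 0, 0, 0),
  Hour     = c(0, 0, 0, 1, 1, 2, 2, 2),
  Volume   = c(100, 120, 150, 150, 200, 50, 90, 60),
  Category = c("P2P", "P2P", "web", "P2P", "Web", "P2P", "web", "web"),
  stringsAsFactors = FALSE
)

# Assumption: "Web" and "web" denote the same category, so normalise the case.
my_data$Category <- tolower(my_data$Category)

# Stage 1: sum the volume within each hour of the day, per category.
hourly <- aggregate(Volume ~ Hour + Category, data = my_data, FUN = sum)

# Stage 2: compute both statistics across the hourly totals of one category.
per_category <- function(v) {
  c(volatility = sd(v) / mean(v),   # standard deviation / hourly average
    pa_ratio   = max(v) / mean(v))  # peak hour / hourly average
}
stats <- do.call(rbind, lapply(split(hourly$Volume, hourly$Category), per_category))
stats <- data.frame(Category = rownames(stats), stats, row.names = NULL)

print(stats)
```

The difference from the ddply call above is the grouping: stage 1 groups by Hour and Category only to build hourly totals, and stage 2 groups by Category alone, which gives one row per category as in the desired output.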