Using R to get volatility and peak-to-average ratio of Internet traffic data

I have network traffic data in the following form for each hour of a ten-day period, stored in an R dataset.

   Day   Hour         Volume          Category
    0    00            100            P2P
    0    00            50             email
    0    00            200            gaming
    0    00            200            video
    0    00            150            web
    0    00            120            P2P
    0    00            180            web
    0    00            80             email
    ....
    0    01            150            P2P
    0    01            200            P2P
    0    01             50            Web
    ...
    ...
    10   23            100            web
    10   23            200            email
    10   23            300            gaming
    10   23            300            gaming

As you can see, a category can appear more than once within a single hour. I need to calculate the volatility and the peak-hour-to-average-hour ratio for each of these application categories.

Volatility: the standard deviation of the hourly volumes divided by the hourly average.

Peak-hour-to-average-hour ratio: the ratio of the volume of the maximum hour to the volume of the average hour for that application.
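To make the two definitions concrete, here is a minimal sketch for a single category. The vector `hourly_volume` and its values are made up for illustration; they are not taken from the dataset above.

```r
# Hypothetical hourly totals for one category (illustrative values only)
hourly_volume <- c(100, 150, 200, 150)

# Volatility: standard deviation of the hourly volumes over their mean
volatility <- sd(hourly_volume) / mean(hourly_volume)

# Peak-to-average ratio: volume of the busiest hour over the average hour
peak_to_avg <- max(hourly_volume) / mean(hourly_volume)
```

Note that `sd()` uses the sample standard deviation (denominator n - 1), so a category with only one hourly value has no defined volatility.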

So, how do I aggregate and calculate these two statistics for each category? I am new to R and don't know how to aggregate and get averages as indicated.

The output I am looking for would be something like this:

Category    Volatility      Peak to Avg. Ratio
Web            0.55            1.5
P2P            0.30            2.1
email          0.6             1.7
gaming         0.4             2.9

Edit: this is as far as I got with plyr.

stats = ddply(
    .data = my_data
    , .variables = .( Hour , Category)
    , .fun = function(x){
        to_return = data.frame(
            volatility = sd((x$Volume)/mean(x$Volume))
            , pa_ratio = max(x$Volume)/mean(x$Volume)
        )
        return( to_return )
    }
)

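This is not quite right, though. I think I first need to aggregate so that each category has 24 hourly values, and only then calculate the volatility and PA ratio. How do I do that?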


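You need to do this in two steps (using plyr). First, since the same category can appear several times for a given Day-Hour, aggregate the total volume of each category within each Hour, regardless of the day: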

df1 <- ddply( df, .(Hour, Category), summarise, Volume = sum(Volume))

Then compute the statistics for each category:

> ddply(df1, .(Category), summarise,
+            Volatility = sd(Volume)/mean(Volume),
+            PeakToAvg = max(Volume)/mean(Volume) )

  Category Volatility PeakToAvg
1      P2P  0.3225399  1.228070
2      Web         NA  1.000000
3    email  0.2999847  1.212121
4   gaming  0.7071068  1.500000
5    video         NA  1.000000
6      web  0.7564398  1.534884
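Two things stand out in this output: `Web` and `web` are treated as separate categories, and categories with a single hourly value get `NA` for volatility because `sd()` of one value is `NA`. The following is a rough base R sketch of the same two steps with the category names lower-cased first. The data frame `df` here is a small made-up sample in the question's layout, and using `tolower()` assumes that `Web` and `web` are meant to be the same category.

```r
# Small made-up sample in the question's layout (values are illustrative)
df <- data.frame(
  Day      = c(0, 0, 0, 0, 0, 0, 1, 1),
  Hour     = c("00", "00", "01", "01", "00", "01", "00", "01"),
  Volume   = c(100, 120, 150, 50, 150, 80, 90, 60),
  Category = c("P2P", "P2P", "P2P", "Web", "web", "web", "P2P", "Web")
)

# Assumption: "Web" and "web" are the same category
df$Category <- tolower(df$Category)

# Step 1: total volume per category within each hour, pooling all days
hourly <- aggregate(Volume ~ Hour + Category, data = df, FUN = sum)

# Step 2: per-category statistics over those hourly totals
stats <- do.call(rbind, lapply(split(hourly, hourly$Category), function(g) {
  data.frame(
    Category   = g$Category[1],
    Volatility = sd(g$Volume) / mean(g$Volume),
    PeakToAvg  = max(g$Volume) / mean(g$Volume)
  )
}))
```

Like the plyr version, this pools the same hour across all days. If each Day-Hour combination should count as its own data point, add `Day` to the grouping in step 1.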

Source: https://habr.com/ru/post/1793442/

