Sort data frame based on month format

I struggle with something very simple: sorting a data frame based on a time format (month-year or "% B-% y" in this case). My goal is to calculate various monthly statistics, starting with the amount.

Part of the relevant part of the data frame looks like this (it’s good and according to my purpose. I am including it here to show where the problem came from ):

> tmp09 Instrument AccountValue monthYear ExitTime 1 JPM 6997 april-07 2007-04-10 2 JPM 7261 mei-07 2007-05-29 3 JPM 7545 juli-07 2007-07-18 4 JPM 7614 juli-07 2007-07-19 5 JPM 7897 augustus-07 2007-08-22 10 JPM 7423 november-07 2007-11-02 11 KFT 6992 mei-07 2007-05-14 12 KFT 6944 mei-07 2007-05-21 13 KFT 7069 juli-07 2007-07-09 14 KFT 6919 juli-07 2007-07-16 # Order on the exit time, which corresponds with 'monthYear' > tmp09.sorted <- tmp09[order(tmp09$ExitTime),] > tmp09.sorted Instrument AccountValue monthYear ExitTime 1 JPM 6997 april-07 2007-04-10 11 KFT 6992 mei-07 2007-05-14 12 KFT 6944 mei-07 2007-05-21 2 JPM 7261 mei-07 2007-05-29 13 KFT 7069 juli-07 2007-07-09 14 KFT 6919 juli-07 2007-07-16 3 JPM 7545 juli-07 2007-07-18 4 JPM 7614 juli-07 2007-07-19 5 JPM 7897 augustus-07 2007-08-22 10 JPM 7423 november-07 2007-11-02 

So far, so good, and ExitTime-based sorting works. The problem starts when I try to calculate the totals per month, and then try to sort this result:

 # Calculate the total results per month > Tmp09Totals <- tapply(tmp09.sorted$AccountValue, tmp09.sorted$monthYear, sum) > Tmp09Totals <- data.frame(Tmp09Totals) > Tmp09Totals Tmp09Totals april-07 6997 augustus-07 7897 juli-07 29147 mei-07 21197 november-07 7423 

How can I sort this conclusion chronologically?

I already tried (besides various attempts to convert monthYear to a different date format): order, sort, sort.list, sort_df, reshape and calculating the amount based on tapply, lapply, sapply, aggregate. And even rewriting the names of the rosers (giving them a number from 1 to the length (tmp09.sorted2 $ AccountValue)) did not work. I also tried to give each month of the year a different identifier based on what I learned on a different issue, but R also had difficulty distinguishing between different monthly values.

The correct order of this release will be April-07, May-07, July-07, August07, November-07:

 apr-07 6997 mei-07 21197 jul-07 29147 aug-07 7897 nov-07 7423 

Of the ideas, do you have one?

+4
source share
6 answers

It would be easier to have the separate Month and Year factors in the correct order and use tapply to combine both variables, for example:

 ## The Month factor tmp09 <- within(tmp09, Month <- droplevels(factor(strftime(ExitTime, format = "%B"), levels = month.name))) ## for @Jura25 locale, we can't use the in built English constant ## instead, we can use this solution, from ?month.name: ## format(ISOdate(2000, 1:12, 1), "%B")) tmp09 <- within(tmp09, Month <- droplevels(factor(strftime(ExitTime, format = "%B"), levels = format(ISOdate(2000, 1:12, 1), "%B")))) ## ## And the Year factor tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y"))) 

What gives us (in my locale):

 > head(tmp09) Instrument AccountValue monthYear ExitTime Month Year 1 JPM 6997 april-07 2007-04-10 April 2007 2 JPM 7261 mei-07 2007-05-29 May 2007 3 JPM 7545 juli-07 2007-07-18 July 2007 4 JPM 7614 juli-07 2007-07-19 July 2007 5 JPM 7897 augustus-07 2007-08-22 August 2007 10 JPM 7423 november-07 2007-11-02 November 2007 

Then use tapply with both factors:

 > with(tmp09, tapply(AccountValue, list(Month, Year), sum)) 2007 April 6997 May 21197 July 29147 August 7897 November 7423 

or through aggregate :

 > with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum)) Month Year x 1 April 2007 6997 2 May 2007 21197 3 July 2007 29147 4 August 2007 7897 5 November 2007 7423 
+9
source

Try using the "yearmon" class in the zoo as it sorts accordingly. Below we create a sample DF frame, and then add the YearMonth column of the "yearmon" class. Finally, we perform our aggregation. Actual processing is just the last two lines (the other part is just to create a sample data frame).

 Lines <- "Instrument AccountValue monthYear ExitTime JPM 6997 april-07 2007-04-10 JPM 7261 mei-07 2007-05-29 JPM 7545 juli-07 2007-07-18 JPM 7614 juli-07 2007-07-19 JPM 7897 augustus-07 2007-08-22 JPM 7423 november-07 2007-11-02 KFT 6992 mei-07 2007-05-14 KFT 6944 mei-07 2007-05-21 KFT 7069 juli-07 2007-07-09 KFT 6919 juli-07 2007-07-16" library(zoo) DF <- read.table(textConnection(Lines), header = TRUE) DF$YearMonth <- as.yearmon(DF$ExitTime) aggregate(AccountValue ~ YearMonth + Instrument, DF, sum) 

This gives the following:

 > aggregate(AccountValue ~ YearMonth + Instrument, DF, sum) YearMonth Instrument AccountValue 1 Apr 2007 JPM 6997 2 May 2007 JPM 7261 3 Jul 2007 JPM 15159 4 Aug 2007 JPM 7897 5 Nov 2007 JPM 7423 6 May 2007 KFT 13936 7 Jul 2007 KFT 13988 

A slightly different approach and output directly use read.zoo . It produces one column per instrument and one row per year / month. We read in the columns, assigning them the appropriate classes, using "NULL" for the monthYear column, since we will not use it. We also indicate that the time index is the third column of the remaining columns and that we want the input to be split into columns by the 1st column. FUN=as.yearmon indicates that we want the time index to be converted from the "Date" class to the "yearmon" class, and we summarize everything using sum .

 z <- read.zoo(textConnection(Lines), header = TRUE, index = 3, split = 1, colClasses = c("character", "numeric", "NULL", "Date"), FUN = as.yearmon, aggregate = sum) 

The resulting zoo object is as follows:

 > z JPM KFT Apr 2007 6997 NA May 2007 7261 13936 Jul 2007 15159 13988 Aug 2007 7897 NA Nov 2007 7423 NA 

We may prefer to save it as an object of the zoo, to use other functions in the zoo, or we can convert it to a data frame as follows: data.frame(Time = time(z), coredata(z)) , which makes time a separate column or as.data.frame(z) , which uses string names for time. fortify.zoo()z) also works.

+4
source

You can reorder factor levels using the reorder function.

 tmp09$monthYear <- reorder(tmp09$monthYear, as.numeric(as.Date(tmp09$ExitTime))) 

Trick - use the numeric representation of the date as the number of days from 1970-01-01 (see ?Date ) and use the average value as a reference.

+3
source

Edit: I first misunderstood the question. First copy the data specified in the question, then

 > tmp09 <- read.table(file="clipboard", header=TRUE) > Sys.setlocale(category="LC_TIME", locale="Dutch_Belgium.1252") [1] "Dutch_Belgium.1252" # create POSIXlt variable from monthYear > tmp09$d <- strptime(paste("2007", tmp09$monthYear, sep="-"), "%Y-%B-%d") # create ordered factor > tmp09$dFac <- droplevels(cut(tmp09$d, breaks="month", ordered=TRUE)) > tmp09[order(tmp09$d), ] Instrument AccountValue monthYear ExitTime d dFac 1 JPM 6997 april-07 2007-04-10 2007-04-07 2007-04-01 2 JPM 7261 mei-07 2007-05-29 2007-05-07 2007-05-01 11 KFT 6992 mei-07 2007-05-14 2007-05-07 2007-05-01 12 KFT 6944 mei-07 2007-05-21 2007-05-07 2007-05-01 3 JPM 7545 juli-07 2007-07-18 2007-07-07 2007-07-01 4 JPM 7614 juli-07 2007-07-19 2007-07-07 2007-07-01 13 KFT 7069 juli-07 2007-07-09 2007-07-07 2007-07-01 14 KFT 6919 juli-07 2007-07-16 2007-07-07 2007-07-01 5 JPM 7897 augustus-07 2007-08-22 2007-08-07 2007-08-01 10 JPM 7423 november-07 2007-11-02 2007-11-07 2007-11-01 > Tmp09Totals <- tapply(tmp09$AccountValue, tmp09$dFac, sum) > Tmp09Totals 2007-04-01 2007-05-01 2007-07-01 2007-08-01 2007-11-01 6997 21197 29147 7897 7423 
+1
source

It seems like the main problem is how to chronologically sort the row sequence in a month-year. The easiest way is to pre-postpone "01" at the beginning of each month-month line and sort them like regular dates. So, take the final Tmp09Totals data frame and do the following:

 monYear <- rownames(Tmp09Totals) sortedMonYear <- format(sort( as.Date( paste('01-', monYear, sep = ''), '%d-%B-%y')), '%B-%y') Tmp09Totals[ sortedMonYear, , drop = FALSE] 
+1
source

Old post, but worthy of the data.table approach:

Read the data and set local values ​​as described in @caracal

 > Sys.setlocale(category="LC_TIME", locale="Dutch_Belgium.1252") [1] "Dutch_Belgium.1252" > tmp09 <- read.table(file="clipboard", header=TRUE) > tmp09$ExitTime <- as.Date(tmp09$ExitTime) 

Submit data on request

 require(data.table) > data.table(tmp09)[, + .(Tmp09Total = sum(AccountValue)), + by = .(Date = format(ExitTime, "%B-%y"))] Date Tmp09Total 1: april-07 6997 2: mei-07 21197 3: juli-07 29147 4: augustus-07 7897 5: november-07 7423 
0
source

Source: https://habr.com/ru/post/1334186/


All Articles