Mutating multiple columns in a data frame

Question

I have a dataset that looks like this.

    bankname bankid year  totass   cash   bond loans
    Bank A        1 1881  244789   7250  20218 29513
    Bank B        2 1881  195755  10243 185151  2800
    Bank C        3 1881  107736  13357 177612    NA
    Bank D        4 1881  170600  35000  20000  5000
    Bank E        5 1881 3200000 351266 314012    NA

I want to calculate some ratios based on the bank balance sheets, so that the dataset looks like this:

    bankname bankid year   totass   cash   bond  loans CashtoAsset BondtoAsset LoanstoAsset
    Bank A        1 1881  2447890   7250 202100 951300       0.002       0.082        0.388
    Bank B        2 1881   195755  10243 185151   2800       0.052       0.945        0.014
    Bank C        3 1881   107736  13357 177612     NA       0.123 1.648585431           NA
    Bank D        4 1881   170600  35000  20000   5000       0.205       0.117        0.029
    Bank E        5 1881 32000000 351266 314012     NA      0.0109       0.009           NA

Here is the code to reproduce the data:

    bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
    bankid   <- c(1, 2, 3, 4, 5)
    year     <- c(1881, 1881, 1881, 1881, 1881)
    totass   <- c(244789, 195755, 107736, 170600, 32000000)
    cash     <- c(7250, 10243, 13357, 35000, 351266)
    bond     <- c(20218, 185151, 177612, 20000, 314012)
    loans    <- c(29513, 2800, NA, 5000, NA)
    bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)

First, I got rid of the NAs in the balance sheet columns.

    cols <- c("totass", "cash", "bond", "loans")
    bankdata[cols][is.na(bankdata[cols])] <- 0

Then I calculate the ratios:

    library(dplyr)
    bankdata <- mutate(bankdata, CashtoAsset = cash / totass)
    bankdata <- mutate(bankdata, BondtoAsset = bond / totass)
    bankdata <- mutate(bankdata, loanstoAsset = loans / totass)

But instead of calculating each of these ratios line by line, I want to write a loop that does them all at once. In Stata, I would do:

    foreach x of varlist cash bond loans {
        by bankid: gen `x'toAsset = `x' / totass
    }

How can I do it?

r dplyr stata

H park Oct 6 '14 at 15:24

6 answers

jazzurro · Answer 1 · 2014-10-06T15:50:32+0000

Update (as of December 2, 2017)

Since I posted this answer, the dplyr package has changed, and I have noticed that SO users still visit it. So I am adding the following update. I hope it helps some R users learn how to use mutate_at().

mutate_each() is now deprecated; use mutate_at() instead. You specify the columns to which the function applies in .vars, and the function itself in .funs. One way to select columns is vars(). Another is a character vector of column names. A third is to give the column positions as numbers (5:7 in this case). Please note: if you use a column in group_by(), the column positions shift, so you need to adjust the numbers. See this question.

    bankdata %>%
        mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))

    bankdata %>%
        mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))

    bankdata %>%
        mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)

    #  bankname bankid year   totass   cash   bond loans cash_toAsset bond_toAsset loans_toAsset
    #1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
    #2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
    #3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
    #4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
    #5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA

I deliberately named the function toAsset in .funs, since that makes the new column names easier to handle. I used to use rename(), but I now think it is much easier to clean up the column names with gsub(). If the result above is saved as out, you can run the following code to remove the _ from the column names.

    names(out) <- gsub(names(out), pattern = "_", replacement = "")
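For readers on a newer dplyr: funs() has since been deprecated and mutate_at() superseded by across(), which was introduced in dplyr 1.0.0. The sketch below is not part of the original answer. It assumes a recent dplyr release and uses the `{.col}` placeholder from the current across() documentation to build the column names directly, so no gsub() step is needed. The data frame is the one from the question, with the NAs left in place.

```r
library(dplyr)

# Data from the question (NAs kept as they are)
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Divide each of cash, bond and loans by totass and store the result
# in new columns named <column>toAsset; the originals are kept.
out <- bankdata %>%
  mutate(across(cash:loans, ~ .x / totass, .names = "{.col}toAsset"))

names(out)
```

The column selection inside across() accepts the same kinds of input as .vars above: a range such as cash:loans, all_of(c("cash", "bond", "loans")), or positions such as 5:7.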

Original answer

I think you can save some typing with dplyr. The downside is that this overwrites cash, bond, and loans.

    bankdata %>%
        group_by(bankname) %>%
        mutate_each(funs(whatever = ./totass), cash:loans)

    #  bankname bankid year   totass       cash        bond      loans
    #1   Bank A      1 1881   244789 0.02961734 0.082593581 0.12056506
    #2   Bank B      2 1881   195755 0.05232561 0.945830247 0.01430359
    #3   Bank C      3 1881   107736 0.12397899 1.648585431         NA
    #4   Bank D      4 1881   170600 0.20515826 0.117233294 0.02930832
    #5   Bank E      5 1881 32000000 0.01097706 0.009812875         NA

If you prefer your expected output, I think some typing is necessary. The renaming part seems to be something you have to do by hand.

    bankdata %>%
        group_by(bankname) %>%
        summarise_each(funs(whatever = ./totass), cash:loans) %>%
        rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana

    ana %>%
        merge(bankdata, ., by = "bankname")

    #  bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
    #1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
    #2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
    #3   Bank C      3 1881   107736  13357 177612    NA  0.12397899 1.648585431           NA
    #4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
    #5   Bank E      5 1881 32000000 351266 314012    NA  0.01097706 0.009812875           NA

Kfb · Answer 2 · 2014-10-06T16:48:43+0000

Here is the data.table solution.

    library(data.table)
    setDT(bankdata)
    bankdata[, paste0(names(bankdata)[5:7], "toAsset") :=
                 lapply(.SD, function(x) x / totass), .SDcols = 5:7]
    bankdata

    #    bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
    # 1:   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
    # 2:   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
    # 3:   Bank C      3 1881   107736  13357 177612     0  0.12397899 1.648585431   0.00000000
    # 4:   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
    # 5:   Bank E      5 1881 32000000 351266 314012     0  0.01097706 0.009812875   0.00000000

hvollmeier · Answer 3 · 2014-10-06T17:33:25+0000

Apply and cbind

    # Assign the cbind() result back, otherwise the new columns are lost
    bankdata <- cbind(bankdata, apply(bankdata[, 5:7], 2, function(x) x / bankdata$totass))
    names(bankdata)[8:10] <- paste0(names(bankdata)[5:7], "toAsset")

    > bankdata
      bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
    1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
    2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
    3   Bank C      3 1881   107736  13357 177612    NA  0.12397899 1.648585431           NA
    4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
    5   Bank E      5 1881 32000000 351266 314012    NA  0.01097706 0.009812875           NA

Matt jolly · Answer 4 · 2014-10-06T15:46:50+0000

You may be making this a little harder than necessary. Just try this and see if it gives you what you need.

    bankdata$CashtoAsset <- bankdata$cash / bankdata$totass
    bankdata$BondtoAsset <- bankdata$bond / bankdata$totass
    bankdata$loantoAsset <- bankdata$loans / bankdata$totass
    bankdata

Yields:

      bankname bankid year   totass   cash   bond loans CashtoAsset BondtoAsset loantoAsset
    1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581  0.12056506
    2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247  0.01430359
    3   Bank C      3 1881   107736  13357 177612     0  0.12397899 1.648585431     0.00000
    4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294  0.02930832
    5   Bank E      5 1881 32000000 351266 314012     0  0.01097706 0.009812875  0.00000000

This should get you started in the right direction.

shadowtalker · Answer 5 · 2014-10-06T15:50:08+0000

This is one of dplyr's biggest flaws: as far as I know, there is no easy way to use it programmatically, rather than interactively, without some kind of "hack" like the regrettable eval(parse(text=foo)) idiom.

The simplest approach is the same as the Stata method, although string manipulation is a bit more verbose in R than in Stata (or any other scripting language, for that matter).

    for (x in c("cash", "bond", "loans")) {
        # bankdata["totass"] also works in place of bankdata$totass, for a consistent "look"
        bankdata[sprintf("%stoAsset", x)] <- bankdata[x] / bankdata$totass
        # sprintf("%stoAsset", x) can also be written as paste0(x, "toAsset")
        # or paste(x, "toAsset", sep = ""), depending on what makes more sense to you
    }
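As a loop-free variant of the same idea, base R can divide several columns by a vector in one step, because dividing a data frame by a vector of length nrow applies the vector to every column. This is a minimal sketch and not part of the original answer; the name `cols` is introduced here only for illustration.

```r
# Data from the question (NAs kept as they are)
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Columns to turn into ratios (illustrative helper name)
cols <- c("cash", "bond", "loans")

# bankdata[cols] is a 5 x 3 data frame; dividing it by the length-5
# totass vector divides every column element-wise by totass.
bankdata[paste0(cols, "toAsset")] <- bankdata[cols] / bankdata$totass

names(bankdata)
```

The explicit loop is arguably easier to read for someone coming from Stata; the vectorized form simply saves a few lines once the column list is held in a variable.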

To make it all look more like Stata, you can wrap it in within() like this:

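    bankdata <- within(bankdata, {
        for (x in c("cash", "bond", "loans")) {
            assign(x, get(x) / totass)  # overwrites cash, bond and loans in place
        }
        rm(x)  # otherwise within() keeps the loop variable as a new column x
    })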

but this entails hacking with the get and assign functions, which are generally not so safe to use, although in your case it probably doesn't matter much. I would not recommend trying similar tricks with dplyr, for example, because dplyr abuses R's non-standard evaluation features, and it is probably more trouble than it's worth. For a faster and probably superior solution, check out the data.table package, which (I think) would let you use loop syntax similar to Stata's with dplyr-like speed. Check the package vignette on CRAN.

Also, do you really want to reassign NA entries to 0?
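To make the concern concrete, here is a small base R sketch (not part of the original answer) using the loans and totass values from the question. It compares the loan ratios with the NAs kept against the ratios after the NAs are replaced by 0. The variable names `ratio_na`, `loans_zero` and `ratio_zero` are introduced here only for illustration.

```r
loans  <- c(29513, 2800, NA, 5000, NA)
totass <- c(244789, 195755, 107736, 170600, 32000000)

# Keeping NA: banks with unknown loans get an unknown ratio
ratio_na <- loans / totass

# Replacing NA with 0: the same banks now report a ratio of exactly 0
loans_zero <- ifelse(is.na(loans), 0, loans)
ratio_zero <- loans_zero / totass

# Summary statistics differ: the first averages over the three known
# ratios only, the second treats the two unknown banks as having no loans.
mean(ratio_na, na.rm = TRUE)
mean(ratio_zero)
```

Whether 0 is the right value depends on what a missing entry means in the source: "no loans" and "loans not reported" lead to different averages.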

rnso · Answer 6 · 2014-10-06T15:53:07+0000

Try:

    for (i in 5:7) {
        bankdata[, (i + 3)] = bankdata[, i] / bankdata[, 4]
    }
    names(bankdata)[(5:7) + 3] = paste0(names(bankdata)[5:7], "toAsset")

Output:

    bankdata
      bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
    1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
    2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
    3   Bank C      3 1881   107736  13357 177612     0  0.12397899 1.648585431   0.00000000
    4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
    5   Bank E      5 1881 32000000 351266 314012     0  0.01097706 0.009812875   0.00000000
