Removing all columns summing to zero with dplyr

I am currently working with a data frame that looks something like this:

```
Site Spp1 Spp2 Spp3 LOC TYPE
S01     2    4    0   A FLOOD
S02     4    0    0   A REG
....
S10     0    1    0   B FLOOD
S11     1    0    0   B REG
```

What I'm trying to do is subset the data frame so that I can start analyzing some indicators in R.

The following code works: it creates two subsets of the data, combines them into one data frame, and then drops unused factor levels.

```r
library(dplyr)

A.flood <- filter(data, TYPE == "FLOOD", LOC == "A")
B.flood <- filter(data, TYPE == "FLOOD", LOC == "B")
# combine the two subsets, then drop factor levels that no longer occur
ABflood <- rbind(A.flood, B.flood) %>% droplevels()
```
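Since both subsets share `TYPE == "FLOOD"`, the two `filter()` calls and the `rbind()` can probably be collapsed into a single `filter()` using `%in%`. A minimal sketch; the small `data` below is made-up illustration data, not the real dataset:

```r
library(dplyr)

# Hypothetical stand-in for the real data
data <- data.frame(
  Site = c("S01", "S02", "S10", "S11", "S20"),
  Spp1 = c(2, 4, 0, 1, 3),
  Spp2 = c(4, 0, 1, 0, 0),
  Spp3 = c(0, 0, 0, 0, 5),
  LOC  = factor(c("A", "A", "B", "B", "C")),
  TYPE = factor(c("FLOOD", "REG", "FLOOD", "REG", "FLOOD")),
  stringsAsFactors = FALSE
)

# One filter() keeps the FLOOD rows from sites A and B; droplevels() then
# removes levels that no longer occur (LOC "C", TYPE "REG")
ABflood <- data %>%
  filter(TYPE == "FLOOD", LOC %in% c("A", "B")) %>%
  droplevels()
```

One difference from the `rbind()` version: rows keep their original order instead of all A rows followed by all B rows.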

What I would also like to do is remove all the Spp columns (there are ~60 in my real dataset) that sum to zero. Is there a way to do this with dplyr, and if so, can it be added to the existing ABflood code?

Thanks!

EDIT

I managed to remove all the columns that sum to zero by keeping only the columns whose sum is not zero:

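```r
# apply() over the whole data frame would coerce it to a character matrix
# (because of Site, LOC, TYPE), so test each column separately: keep
# non-numeric columns, and numeric columns whose sum is not zero
ABflood.subset <- ABflood[, sapply(ABflood, function(col) !is.numeric(col) || sum(col) != 0)]
```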
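If a dplyr version is preferred, `select()` with the `where()` helper (available from dplyr 1.0.0) can express the same rule. A minimal sketch; the small `ABflood` below is hypothetical illustration data:

```r
library(dplyr)  # where() needs dplyr >= 1.0.0

# Hypothetical ABflood with one all-zero species column (Spp3)
ABflood <- data.frame(
  Site = c("S01", "S10"),
  Spp1 = c(2, 0),
  Spp2 = c(4, 1),
  Spp3 = c(0, 0),
  LOC  = c("A", "B"),
  TYPE = c("FLOOD", "FLOOD"),
  stringsAsFactors = FALSE
)

# Keep every non-numeric column, plus numeric columns whose sum is not zero
ABflood.subset <- ABflood %>%
  select(where(~ !is.numeric(.x) || sum(.x, na.rm = TRUE) != 0))
```

Because this is an ordinary pipeline step, it could be appended after `droplevels()` in the ABflood pipeline from the question.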
4 answers

Without using any package, we can take `rowSums` of the "Spp" columns (selected with `grep`) and apply double negation, so that rows with a sum > 0 become TRUE and all others FALSE. Use this logical index to subset the rows.

```r
data[!!rowSums(data[grep('Spp', names(data))]),]
```

Or, using dplyr/magrittr, we select the "Spp" columns, get the sum of each row with `Reduce`, negate it twice, and use magrittr's `extract` to subset the original dataset with that index.

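```r
library(dplyr)
library(magrittr)

data %>%
  select(matches('^Spp')) %>%  # keep only the species columns
  Reduce(`+`, .) %>%           # row-wise sum across those columns
  `!` %>% `!` %>%              # double negation: nonzero sum -> TRUE
  extract(data, ., )           # magrittr's extract is `[`, i.e. data[idx, ]
```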
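`Reduce(`+`, .)` works here because a data frame is a list of columns, so `Reduce` adds them together element by element, giving the same per-row totals as `rowSums()`. A small check on made-up numbers:

```r
library(dplyr)

df <- data.frame(Spp1 = c(2, 4, 0), Spp2 = c(4, 0, 0), Spp3 = c(0, 0, 0))

# Folds the columns together: (Spp1 + Spp2) + Spp3
row_totals <- df %>% select(matches("^Spp")) %>% Reduce(`+`, .)

row_totals    # 6 4 0
!!row_totals  # TRUE TRUE FALSE: only the last row sums to zero
```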

Example data:

```r
data <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG"),
  stringsAsFactors = FALSE
)
```
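The code in this answer drops *rows* whose Spp values sum to zero, while the question asks about *columns*. A minimal sketch of the same grep-plus-double-negation idea pointed at columns, reusing the example data above (redefined so the block runs on its own):

```r
data <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG"),
  stringsAsFactors = FALSE
)

spp  <- grep("Spp", names(data))   # positions of the species columns
keep <- rep(TRUE, ncol(data))      # non-species columns are always kept
keep[spp] <- !!colSums(data[spp])  # double negation: nonzero sum -> TRUE
data[keep]                         # Spp3 is dropped
```

A logical mask is used rather than `data[-spp[...]]` because negative indexing with an empty index would silently drop every column when no species sums to zero.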

I realize this question is quite old now, but I came across it and found another solution using dplyr's `select()` and `which()`, which dplyr fans may find clearer:

```r
# assumes every column of ABflood is numeric (colSums() errors otherwise)
ABflood.subset <- ABflood %>% select(which(!colSums(ABflood, na.rm=TRUE) %in% 0))
```
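With Site, LOC and TYPE in the data frame, `colSums()` stops with "'x' must be numeric". A minimal sketch that keeps the `select()`/`which()`/`%in% 0` idea but only sums the Spp columns; the `ABflood` below is hypothetical illustration data:

```r
library(dplyr)

# Hypothetical ABflood with mixed column types
ABflood <- data.frame(
  Site = c("S01", "S10"),
  Spp1 = c(2, 0),
  Spp2 = c(4, 1),
  Spp3 = c(0, 0),
  LOC  = c("A", "B"),
  TYPE = c("FLOOD", "FLOOD"),
  stringsAsFactors = FALSE
)

spp  <- startsWith(names(ABflood), "Spp")  # which columns are species counts
zero <- spp                                # same length as names(ABflood)
# mark only the species columns whose total is zero
zero[spp] <- colSums(ABflood[spp], na.rm = TRUE) %in% 0
ABflood.subset <- ABflood %>% select(which(!zero))
```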

You should convert your data to tidy (long) format with `tidyr::gather()`; the data frame then becomes much easier to manipulate.

```r
library(tidyr)
library(dplyr)

ABflood %>%
  gather(Species, Sp.Count, -Site, -LOC, -TYPE) %>%  # one row per site/species
  group_by(Species) %>%
  filter(Sp.Count > 0)                               # drop the zero counts
```

Voilà: your tidy data, minus the zeros.

```
#   Site   LOC    TYPE   Species Sp.Count
#   <fctr> <fctr> <fctr> <chr>      <int>
# 1 S01    A      FLOOD  Spp1           2
# 2 S02    A      REG    Spp1           4
# 3 S11    B      REG    Spp1           1
# 4 S01    A      FLOOD  Spp2           4
# 5 S10    B      FLOOD  Spp2           1
```

Personally, I would keep it like that. If you want to get back to the original wide format, with a count of zero for species not recorded at a site, just add `%>% spread(Species, Sp.Count, fill = 0)` to the pipeline.

```
#   Site   LOC    TYPE    Spp1  Spp2
#   <fctr> <fctr> <fctr> <dbl> <dbl>
# 1 S01    A      FLOOD      2     4
# 2 S02    A      REG        4     0
# 3 S10    B      FLOOD      0     1
# 4 S11    B      REG        1     0
```
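`gather()` and `spread()` still work but have been superseded in tidyr 1.0.0 and later by `pivot_longer()` and `pivot_wider()`. A sketch of the same round trip with the newer functions, on made-up counts matching the output above:

```r
library(dplyr)
library(tidyr)  # pivot_longer()/pivot_wider() need tidyr >= 1.0.0

# Hypothetical counts; Spp3 is zero everywhere
ABflood <- data.frame(
  Site = c("S01", "S02", "S10", "S11"),
  Spp1 = c(2L, 4L, 0L, 1L),
  Spp2 = c(4L, 0L, 1L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "B", "B"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG"),
  stringsAsFactors = FALSE
)

# Long format, keeping only nonzero counts
long <- ABflood %>%
  pivot_longer(starts_with("Spp"), names_to = "Species", values_to = "Sp.Count") %>%
  filter(Sp.Count > 0)

# Back to wide; species with no nonzero count (Spp3) never reappear
wide <- long %>%
  pivot_wider(names_from = Species, values_from = Sp.Count, values_fill = 0L)
```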

There is an even simpler and faster way to do this, using base R indexing rather than dplyr:

```r
ABflood.subset <- ABflood[, colSums(ABflood != 0) > 0]
```

Or, as a minimal working example (MWE):

```r
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rep(0, 100))
df[, colSums(df != 0) > 0]  # z, the all-zero column, is dropped
```
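Strictly, `colSums(df != 0) > 0` keeps columns with *any* nonzero value, while the question asks about columns that *sum* to zero. For non-negative counts the two agree, but they can differ when a column has negative values or NAs. A small illustration with made-up columns:

```r
df <- data.frame(
  x = c(1, -1, 0),  # sums to 0 but is not all zeros
  y = c(0, 0, 0),   # all zeros
  z = c(2, 0, NA)   # contains an NA
)

# "Any nonzero value" test; na.rm keeps the NA from making z's test NA
nonzero_any <- colSums(df != 0, na.rm = TRUE) > 0
# "Sum is not zero" test, as worded in the question
nonzero_sum <- colSums(df, na.rm = TRUE) != 0

names(df)[nonzero_any]  # "x" "z"
names(df)[nonzero_sum]  # "z"
```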

Source: https://habr.com/ru/post/1237307/

