Skip value in RTF reports

I have a data frame df for a report in RTF format, like:

```
ATRSLBL POPUL CENTRE BAGE BAGEC1 SEX
Red     PPS   37201  75   3      1
Red     PPS   37201  71   2      2
Red     PPS   37201  73   2      1
Red     PPS   38201  66   2      2
Blue    PPS   37201  78   3      2
Blue    PPS   38201  71   2      2
Blue    PPS   38201  71   2      1
Blue    PPS   38201  64   1      2
```

I want to print it as:

```
ATRSLBL POPUL CENTRE BAGE BAGEC1 SEX
Red     PPS   37201  75   3      1
        PPS          71   2      2
        PPS          73   2      1
        PPS   38201  66   2      2
Blue    PPS   37201  78   3      2
        PPS   38201  71   2      2
        PPS          71   2      1
        PPS          64   1      2
```

Can anybody help me?

2 answers

Here is one way with dplyr. I'm not sure whether ATRSLBL is a character or a factor; my guess is that it is a factor, so I first converted ATRSLBL to character. Then I replaced the duplicated Red and Blue with "". In the same mutate() call I also created a group variable with cumsum(), which starts a new group wherever consecutive CENTRE values differ by more than 1. Using this group variable, I grouped the data and applied replace() to CENTRE: here I tell R that if the row number within a group is not 1, the value should be replaced with "". That way you keep the information only in the first row of each group. Finally, ungroup the data and drop the group variable with select(). Hope this helps.

```r
library(dplyr)

mutate(mydf,
       # keep each label only on its first row
       ATRSLBL = replace(as.character(ATRSLBL), duplicated(ATRSLBL), ""),
       # start a new group wherever consecutive CENTRE values differ by more than 1
       group = cumsum(c(TRUE, abs(diff(CENTRE)) > 1))) %>%
  group_by(group) %>%
  # keep CENTRE only on the first row of each group
  mutate(CENTRE = replace(as.character(CENTRE), row_number() != 1, "")) %>%
  ungroup() %>%
  select(-group)

#   ATRSLBL  POPUL CENTRE  BAGE BAGEC1   SEX
#     (chr) (fctr)  (chr) (int)  (int) (int)
# 1     Red    PPS  37201    75      3     1
# 2            PPS           71      2     2
# 3            PPS           73      2     1
# 4            PPS  38201    66      2     2
# 5    Blue    PPS  37201    78      3     2
# 6            PPS  38201    71      2     2
# 7            PPS           71      2     1
# 8            PPS           64      1     2
```
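To see what the grouping step produces, the sketch below evaluates the same cumsum(c(TRUE, abs(diff(CENTRE)) > 1)) expression in base R on the CENTRE values from the data, typed in by hand as a vector named centre (a name chosen only for this illustration). It then uses ave() as a base R stand-in for group_by() + row_number() to find the rows that get blanked.

```r
# CENTRE values from mydf, copied by hand for this illustration
centre <- c(37201L, 37201L, 37201L, 38201L, 37201L, 38201L, 38201L, 38201L)

# A new group starts at row 1 and wherever consecutive values differ by more than 1
group <- cumsum(c(TRUE, abs(diff(centre)) > 1))
print(group)  # 1 1 1 2 3 4 4 4

# Position of each row within its group (base R stand-in for row_number())
pos <- ave(seq_along(centre), group, FUN = seq_along)
print(pos)    # 1 2 3 1 1 1 2 3

# Keep CENTRE only on the first row of each group
centre_out <- replace(as.character(centre), pos != 1, "")
print(centre_out)
```

Rows 2, 3, 7 and 8 are the ones whose position within their group is not 1, so those are the CENTRE cells that end up empty.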

DATA

```r
mydf <- structure(list(
  ATRSLBL = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
                      .Label = c("Blue", "Red"), class = "factor"),
  POPUL   = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                      .Label = "PPS", class = "factor"),
  CENTRE  = c(37201L, 37201L, 37201L, 38201L, 37201L, 38201L, 38201L, 38201L),
  BAGE    = c(75L, 71L, 73L, 66L, 78L, 71L, 71L, 64L),
  BAGEC1  = c(3L, 2L, 2L, 2L, 3L, 2L, 2L, 1L),
  SEX     = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L)),
  .Names = c("ATRSLBL", "POPUL", "CENTRE", "BAGE", "BAGEC1", "SEX"),
  class = "data.frame", row.names = c(NA, -8L))
```
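If you would rather not depend on dplyr or data.table, the same result can be sketched in base R. The helper blank_repeats() below is a hypothetical name, not part of any package. It assumes the rows are already sorted so that repeats are consecutive and that there are no missing values, and it blanks a value when it equals the previous row's value, optionally only while a parent column (here ATRSLBL) also stays the same. The data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Hypothetical helper: blank `x` wherever it repeats the previous row.
# If `parent` is given, a repeat only counts while `parent` is also unchanged.
# Assumes sorted rows and no NAs.
blank_repeats <- function(x, parent = NULL) {
  x <- as.character(x)
  same_as_prev <- c(FALSE, x[-1] == x[-length(x)])
  if (!is.null(parent)) {
    parent <- as.character(parent)
    same_as_prev <- same_as_prev & c(FALSE, parent[-1] == parent[-length(parent)])
  }
  replace(x, same_as_prev, "")
}

mydf <- data.frame(
  ATRSLBL = c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue", "Blue"),
  POPUL   = "PPS",
  CENTRE  = c(37201L, 37201L, 37201L, 38201L, 37201L, 38201L, 38201L, 38201L),
  BAGE    = c(75L, 71L, 73L, 66L, 78L, 71L, 71L, 64L),
  BAGEC1  = c(3L, 2L, 2L, 2L, 3L, 2L, 2L, 1L),
  SEX     = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

report <- mydf
# CENTRE is computed from the original labels, before they are blanked
report$CENTRE  <- blank_repeats(mydf$CENTRE, parent = mydf$ATRSLBL)
report$ATRSLBL <- blank_repeats(mydf$ATRSLBL)
print(report, row.names = FALSE)
```

The order of the two assignments matters: if ATRSLBL were blanked first, the parent comparison for CENTRE would no longer see the Red/Blue boundary.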

We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), get the logical index of duplicated 'ATRSLBL' values and assign (:=) '' to those rows. We then create a grouping variable (cumsum(ATRSLBL != '')), get the row indices (.I) of duplicated 'CENTRE' values within each group, convert the column 'CENTRE' to character, and use that index to assign '' to 'CENTRE'.

```r
library(data.table)

setDT(df)[duplicated(ATRSLBL), ATRSLBL := '']
i1 <- df[, .I[duplicated(CENTRE)], cumsum(ATRSLBL != '')]$V1
df[, CENTRE := as.character(CENTRE)][i1, CENTRE := '']
df
#    ATRSLBL POPUL CENTRE BAGE BAGEC1 SEX
# 1:     Red   PPS  37201   75      3   1
# 2:           PPS          71      2   2
# 3:           PPS          73      2   1
# 4:           PPS  38201   66      2   2
# 5:    Blue   PPS  37201   78      3   2
# 6:           PPS  38201   71      2   2
# 7:           PPS          71      2   1
# 8:           PPS          64      1   2
```

NOTE: Here I assume that the column "ATRSLBL" is character.
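The row index i1 in this answer can be traced without data.table. The sketch below uses base R, with atrslbl and centre as hand-copied stand-ins (names chosen for this illustration) for the columns after the first := step. It rebuilds the grouping key cumsum(ATRSLBL != '') and uses ave() with duplicated() to reproduce which rows .I[duplicated(CENTRE)] selects.

```r
# ATRSLBL after setDT(df)[duplicated(ATRSLBL), ATRSLBL := '']
atrslbl <- c("Red", "", "", "", "Blue", "", "", "")
centre  <- c(37201L, 37201L, 37201L, 38201L, 37201L, 38201L, 38201L, 38201L)

# Grouping key: increments at every row that still carries a label
key <- cumsum(atrslbl != "")
print(key)  # 1 1 1 1 2 2 2 2

# duplicated() evaluated separately within each key group;
# ave() returns 0/1 here because `centre` is integer, so convert back to logical
dup <- as.logical(ave(centre, key, FUN = duplicated))

i1 <- which(dup)
print(i1)   # 2 3 7 8
```

Because duplicated() compares each value with all earlier rows in its group, not just the previous row, it would also blank a CENTRE value that reappears within the same label after a different centre. On this data it picks the same rows (2, 3, 7 and 8) as the dplyr answer.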


Source: https://habr.com/ru/post/1233689/

