Take every nth row from a file by group, where n is given in a column

I have seen answers here and here on how to return every nth line, but my problem is different. A separate column in the file specifies which nth row should be returned, and that value varies by group. Here is an example dataset in which the column Nth specifies the rows to return: every third row for Id group a, and every fourth row for Id group b. The actual data is fairly large, with several Id groups.

Id  TagNo   Nth
a   A-A-3   3
a   A-A-1   3
a   A-A-5   3
a   A-A-2   3
a   AX-45   3
a   AX-33   3
b   B-B-5   4
b   B-B-4   4
b   B-B-3   4
b   BX-B2   4 

Desired output:

Id  TagNo   Nth
 a  A-A-3   3
 a  A-A-2   3
 b  B-B-5   4

Thank you for your help.

Note: the first row of each group counts as its first selected row. So for a the 1st, 4th, 7th, ... rows are returned, and for b the 1st, 5th, 9th, ... rows.
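To make the selection rule concrete, here is a minimal sketch of which 1-based positions are kept inside a single group. The function name `selected_positions` is hypothetical and used only for illustration; it assumes Nth is a positive integer.

```python
def selected_positions(group_size, nth):
    """Return the 1-based positions kept within one group."""
    # Start at the first row of the group, then step forward by nth.
    return list(range(1, group_size + 1, nth))


print(selected_positions(6, 3))  # group a in the example: 6 rows, Nth = 3
print(selected_positions(4, 4))  # group b in the example: 4 rows, Nth = 4
```

For the example data this keeps rows 1 and 4 of group a and row 1 of group b, which matches the desired output above.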


R:

do.call(rbind, lapply(split(df, df$Id), function(x) x[seq(from = 1, to = nrow(x), by = unique(x$Nth)), ]))

    Id TagNo Nth
a.1  a A-A-3   3
a.4  a A-A-2   3
b    b B-B-5   4

Using awk:

awk '!a[$1]++{print; if(NR>1) n=NR+$3} NR==n{print; n=NR+$3}' file

Id  TagNo   Nth
a   A-A-3   3
a   A-A-2   3
b   B-B-5   4

Another awk version, as a script file:

$ cat awk-sc
{
  if(id==$1){
    nth--;
    if(nth==0){print; nth=$3}
  } else {
    id=$1;nth=$3;print
  }
}

$ awk -f awk-sc file
Id  TagNo   Nth
a   A-A-3   3
a   A-A-2   3
b   B-B-5   4

Using data.table:

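library(data.table)

df <- data.table(read.table(text = "Id  TagNo   Nth
a   A-A-3   3
a   A-A-1   3
a   A-A-5   3
a   A-A-2   3
a   AX-45   3
a   AX-33   3
b   B-B-5   4
b   B-B-4   4
b   B-B-3   4
b   BX-B2   4", header = TRUE))

# id is the 1-based row number within each Id group
df[, id := seq_len(.N), by = Id]
# keep rows 1, 1 + Nth, 1 + 2*Nth, ...; (id - 1) %% Nth also works when Nth is 1
df[(id - 1) %% Nth == 0, .(Id, TagNo, Nth)]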

  Id TagNo Nth
1:  a A-A-3   3
2:  a A-A-2   3
3:  b B-B-5   4
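The same idea, a running row number per Id combined with a modulo test, can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `every_nth_per_group` and the tuple layout `(id, tag, nth)` are hypothetical, and Nth is assumed to be a positive integer. Because it keeps one counter per Id, it does not require the rows of a group to be adjacent.

```python
from collections import defaultdict


def every_nth_per_group(rows):
    """Yield rows whose 0-based position within their Id is a multiple of Nth."""
    seen = defaultdict(int)  # number of rows seen so far for each Id
    for id_, tag, nth in rows:
        if seen[id_] % int(nth) == 0:
            yield (id_, tag, nth)
        seen[id_] += 1


rows = [
    ("a", "A-A-3", 3), ("a", "A-A-1", 3), ("a", "A-A-5", 3),
    ("a", "A-A-2", 3), ("a", "AX-45", 3), ("a", "AX-33", 3),
    ("b", "B-B-5", 4), ("b", "B-B-4", 4), ("b", "B-B-3", 4),
    ("b", "BX-B2", 4),
]
result = list(every_nth_per_group(rows))
for row in result:
    print(*row)
```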

Using Python:

from __future__ import print_function

with open('file.csv') as f:
    print(*next(f).split())    # header

    lastid = None
    lineno = 0
    for line in f:
        id_, tagno, nth = line.split()

        if lastid != id_:
            lineno = 0

        if lineno % int(nth) == 0:
            print(id_, tagno, nth)

        lastid = id_
        lineno += 1

A base R solution.
The data is given below in dput format; in your case, read it with something like dat <- read.csv("file.csv").

dat <-
structure(list(Id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), .Label = c("a", "b"), class = "factor"), TagNo = structure(c(3L, 
1L, 4L, 2L, 6L, 5L, 9L, 8L, 7L, 10L), .Label = c("A-A-1", "A-A-2", 
"A-A-3", "A-A-5", "AX-33", "AX-45", "B-B-3", "B-B-4", "B-B-5", 
"BX-B2"), class = "factor"), Nth = c(3L, 3L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L)), .Names = c("Id", "TagNo", "Nth"), class = "data.frame", row.names = c(NA, 
-10L))

Now the R code:

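# split by Id (not by Nth), so two groups that share the same Nth stay separate
dat2 <- do.call(rbind, lapply(split(dat, dat$Id), function(x)
            # rows 1, 1 + Nth, 1 + 2*Nth, ... within each group
            x[seq(1, nrow(x), by = x[1, "Nth"]), ]))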
row.names(dat2) <- NULL
dat2
#  Id TagNo Nth
#1  a A-A-3   3
#2  a A-A-2   3
#3  b B-B-5   4

Using awk:

$ awk 'a!=$1{a=$1; n=$3; k=-1} FNR>1 && ++k%n!=0{next} 1' f1
Id  TagNo   Nth
a   A-A-3   3
a   A-A-2   3
b   B-B-5   4

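a!=$1{a=$1; n=$3; k=-1}: a is a variable that tracks the first field (column). If a is not initialized, or the first column differs from the previous one, set a and n and reset k=-1.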

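FNR>1 && ++k%n!=0{next}: for every line after the first (header) line, increment k; if k is not a multiple of n, this is not an nth record, so skip it. Otherwise it is an nth record, and the trailing 1 prints it.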

Alternatively, in a more explicit form:

$ awk 'FNR==1{print; next;}  a!=$1{a=$1; n=$3; k=0; print; next} ++k%n==0{print}' f1
Id  TagNo   Nth
a   A-A-3   3
a   A-A-2   3
b   B-B-5   4

FNR==1{print; next;}: print the first (header) line and move on to the next record.

a!=$1{a=$1; n=$3; k=0; print; next}: a is a variable that tracks the first field (column). If a is not initialized, or the first column differs from the previous one, set a and n, reset k=0, print the record, and move on to the next one.

++k%n==0{print}: keep incrementing k with each new record; if the remainder of k divided by n is zero, this is an nth record, so print it.
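Both awk one-liners reset their counter whenever the first column changes, so they assume that rows with the same Id are adjacent. A rough Python sketch of the same streaming idea follows, using itertools.groupby to detect each contiguous Id block. The function name `every_nth_contiguous` is hypothetical, and the sketch assumes whitespace-separated columns with Nth as a positive integer in the third column.

```python
from itertools import groupby, islice


def every_nth_contiguous(lines):
    """Yield the header, then every nth line of each contiguous Id block."""
    lines = iter(lines)
    yield next(lines)  # header line is passed through unchanged
    records = (line for line in lines if line.strip())  # ignore blank lines
    for _, block in groupby(records, key=lambda line: line.split()[0]):
        first = next(block)
        n = int(first.split()[2])  # Nth is read from the block's first row
        yield first
        # 'first' is already consumed, so the next kept row is n - 1 rows ahead
        yield from islice(block, n - 1, None, n)


sample = """Id  TagNo   Nth
a   A-A-3   3
a   A-A-1   3
a   A-A-5   3
a   A-A-2   3
a   AX-45   3
a   AX-33   3
b   B-B-5   4
b   B-B-4   4
b   B-B-3   4
b   BX-B2   4
""".splitlines()

kept = list(every_nth_contiguous(sample))
print("\n".join(kept))
```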


Python solution:

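with open('YOURFILENAME', 'r') as f:
    print('Id  TagNo   Nth')         # the header is fixed, so print it separately
    next(f)                          # skip the header line in the file
    last_id = None
    i = 0                            # rows left to skip before the next print
    for line in f:
        fields = line.split()
        if not fields:               # ignore blank lines
            continue
        if fields[0] != last_id:     # new Id group: restart the countdown
            last_id, i = fields[0], 0
        if i == 0:
            print(line, end='')
            i = int(fields[-1])
        i -= 1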

You can replace print() with write() or any other function. Since the header is fixed, I originally did not include it in my code.

Update: the header is now printed separately.
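As an illustration of swapping print() for write(), here is a sketch that sends the selected rows to any file-like object. The function name `write_every_nth` and its parameters are hypothetical. It assumes whitespace-separated columns, rows of the same Id stored next to each other, and Nth as a positive integer in the last column. Unlike the code above, it copies the header from the input rather than hard-coding it.

```python
import io
import sys


def write_every_nth(src, dst=sys.stdout):
    """Copy the header and every nth row per Id group from src to dst."""
    dst.write(next(src))  # header line, copied as-is
    last_id, countdown = None, 0
    for line in src:
        fields = line.split()
        if not fields:  # ignore blank lines
            continue
        if fields[0] != last_id:  # new Id group: restart the countdown
            last_id, countdown = fields[0], 0
        if countdown == 0:
            dst.write(line)
            countdown = int(fields[-1])
        countdown -= 1


src = io.StringIO(
    "Id  TagNo   Nth\n"
    "a   A-A-3   3\na   A-A-1   3\na   A-A-5   3\n"
    "a   A-A-2   3\na   AX-45   3\na   AX-33   3\n"
    "b   B-B-5   4\nb   B-B-4   4\nb   B-B-3   4\nb   BX-B2   4\n"
)
out = io.StringIO()
write_every_nth(src, out)
print(out.getvalue(), end="")
```

Passing an open file as dst instead of the StringIO buffer writes the result to disk in the same way.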


Source: https://habr.com/ru/post/1687610/

