How to extract data from a text file using R or PowerShell?

I have a text file containing data like this:

    This is just text
    -------------------------------
    Username:          SOMETHI           C:                 [Text]
    Account:           DFAG              Finish time:       1-JAN-2011 00:31:58.91
    Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30
    This is just text
    -------------------------------
    Username:          SOMEGG            C:                 [Text]
    Account:           DFAG              Finish time:       1-JAN-2011 00:31:58.91
    Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30
    This is just text
    -------------------------------
    Username:          SOMEYY            C:                 [Text]
    Account:           DFAG              Finish time:       1-JAN-2011 00:31:58.91
    Process ID:        202223DB          Start time:        15-DEC-2010 20:27:15.30

Is there a way to extract the Username, Finish time, and Start time from data in this format? I am looking for a starting point using R or PowerShell.

+6
4 answers

R may not be the best tool for processing text files, but you can proceed as follows: read the file as fixed-width to identify the two columns, split each field from its value at the colon, add an id column, and reshape the result so that each record becomes one row.

    # Read the file
    d <- read.fwf("A.txt", c(37,100), stringsAsFactors=FALSE)
    # Separate fields and values
    d <- d[grep(":", d$V1),]
    d <- cbind(
      do.call( rbind, strsplit(d$V1, ":\\s+") ),
      do.call( rbind, strsplit(d$V2, ":\\s+") )
    )
    # Add an id column
    d <- cbind( d, cumsum( d[,1] == "Username" ) )
    # Stack the left and right parts
    d <- rbind( d[,c(5,1,2)], d[,c(5,3,4)] )
    colnames(d) <- c("id", "field", "value")
    d <- as.data.frame(d)
    d$value <- gsub("\\s+$", "", d$value)
    # Convert to a wide data.frame
    library(reshape2)
    d <- dcast( d, id ~ field )
+8

These are just pointers on how I would approach the problem. I am sure there is a fancier way to do this, perhaps involving plyr. :)

    rara <- readLines("test.txt")  # you could also use readLines(textConnection("text"))
    # find usernames
    usn <- rara[grepl("Username:", rara)]
    # you can find a fancy way to split or weed out spaces
    # I crudely do it like this (splitting on runs of whitespace,
    # so padded columns do not produce empty elements):
    unlist(lapply(strsplit(usn, "\\s+"), "[", 2))  # 2 means "extract the second element"
    # and accounts
    acc <- rara[grepl("Account:", rara)]
    unlist(lapply(strsplit(acc, "\\s+"), "[", 2))

You can use str_trim() from the stringr package to remove whitespace before and after a word. Hopefully that gives you enough pointers to get going.
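Building on the readLines() idea, here is a minimal base R sketch that pulls out the three fields the question asks for. The sample lines are copied from the question (the column padding is an assumption), `grab()` is a hypothetical helper name, and the output column names are my own choice, so treat this as one possible approach and not a definitive one.

```r
# Sample lines copied from the question; the exact padding is an assumption.
rara <- c(
  "This is just text",
  "-------------------------------",
  "Username:          SOMETHI           C:                 [Text]",
  "Account:           DFAG              Finish time:       1-JAN-2011 00:31:58.91",
  "Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30",
  "This is just text",
  "-------------------------------",
  "Username:          SOMEGG            C:                 [Text]",
  "Account:           DFAG              Finish time:       1-JAN-2011 00:31:58.91",
  "Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30"
)

# Hypothetical helper: for every line matching `regex`,
# return the text captured by its first parenthesised group.
grab <- function(lines, regex) {
  hits <- regmatches(lines, regexec(regex, lines))
  # non-matching lines yield character(0); drop them, then take element 2
  # (element 1 is the full match, element 2 is the first capture group)
  vapply(hits[lengths(hits) > 0], `[`, character(1), 2)
}

result <- data.frame(
  Username   = grab(rara, "^Username:\\s+(\\S+)"),
  FinishTime = grab(rara, "Finish time:\\s+(.*\\S)"),  # .*\\S drops trailing spaces
  StartTime  = grab(rara, "Start time:\\s+(.*\\S)"),
  stringsAsFactors = FALSE
)
print(result)
```

This relies on every record containing all three fields, so that the three vectors have the same length. With a real file, replace the hand-written `rara` vector with `readLines("test.txt")`.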

+2

Here is a PowerShell solution:

    $result = @()
    get-content c:\somedir\somefile.txt |
     foreach {
        if ($_ -match '^Username:\s+(\S+)'){
            $rec = ""|select UserName,FinishTime,StartTime
            $rec.UserName = $matches[1]
        }
        elseif ($_ -match '^Account.+Finish\stime:\s+(.+)'){
            $rec.FinishTime = $matches[1]
        }
        elseif ($_ -match '^Process\sID:\s+\S+\s+Start\stime:\s+(.+)'){
            $rec.StartTime = $matches[1]
            $result += $rec
        }
     }
    $result
+2

Do you already have the file loaded in a data frame, with column names such as Username, Process ID, Start time, and so on? If so, you can easily extract the values using

    df$Username    # all usernames (df is your data frame)
    df$FinishTime  # all finish times

If you want to know everything about a user with a specific name, use this:

    df[df$Username == "SOMETHI",]

The same kind of subsetting works if you want to find the user with a given finish time.
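As a rough sketch of filtering by finish time: the data frame below is hypothetical, typed in by hand from the values in the question, and the column names Username, FinishTime and StartTime are assumptions about how your data frame is laid out. The comparison is plain string equality, with no date parsing.

```r
# Hypothetical data frame built by hand from the values in the question
df <- data.frame(
  Username   = c("SOMETHI", "SOMEGG", "SOMEYY"),
  FinishTime = c("1-JAN-2011 00:31:58.91",
                 "1-JAN-2011 00:31:58.91",
                 "1-JAN-2011 00:31:58.91"),
  StartTime  = c("31-DEC-2010 20:27:15.30",
                 "12-DEC-2010 20:27:15.30",
                 "15-DEC-2010 20:27:15.30"),
  stringsAsFactors = FALSE
)

# Keep the rows whose finish time equals the given string,
# and show only the Username and StartTime columns
finished <- df[df$FinishTime == "1-JAN-2011 00:31:58.91", c("Username", "StartTime")]
print(finished)
```

In the sample data all three records share the same finish time, so every row is returned. With real data the same expression narrows the result to the matching users only.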

Hope this can serve as a starting point. Let me know if anything is unclear.

0

Source: https://habr.com/ru/post/906759/

