Automate tasks for multiple files using R to modify the source data files in the R working directory

I am new to R and am learning how to automate working with files in a directory. I am currently using 5 sample CSV files in a directory, each with hourly sample data in each column for 24 hours. I am trying to write some code that reorganizes the files into a suitable format so that I can easily read them into R later. My files are in a weird format, with 6 lines of unnecessary data at the top. I am trying to perform the tasks described below.

An example of my data file:

 "w", "Fri 1 Jan", "123", "42", "12", "21"
 "w", "Sat 2 Jan", "23", "54", "62", "31"
 "w", "Sun 3 Jan", "13", "32", "22", "32"
 "w", "Mon 4 Jan", "153", "42", "52", "31"
 "w", "Tue 5 Jan", "13", "14", "67", "35"

  • Task 1: ignore the first 6 lines and start reading from the 7th line

  • Task 2: set the column headings: "type", "date", "1", "2", "3", "Total"

  • Task 3: Each of my files has a file name similar to this: "605_E875071_N713451.csv" - I am trying to create 3 new separate columns with names: ID = "605", Easting = "875071", and Northing = "713451"

  • Task 4: create some kind of loop to perform all these steps and save directly in the source file

I tried to tackle each step individually, and so far I have been able to find code online for some of the tasks, as shown below:

Task 1:

 data = read.csv(file.choose(), skip = 6)

Task 2:

 colnames(data) = c("type", "date", "1", "2", "3", "Total") 

Task 3:

I am not sure how to split the file name into 3 different columns. So far, all I have is code that adds a single extra column holding the full file name, "605_E875071_N713451":

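 library(plyr)  # provides ldply()

 # Read one CSV and record which file each row came from
 read_csv_filename <- function(filename) {
   ret <- read.csv(filename)
   ret$Source <- filename
   ret
 }

 # filenames: a character vector of CSV file paths
 import.list <- ldply(filenames, read_csv_filename)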

What I'm finally trying to achieve is as follows:

 "type", "date", "ID", "Easting", "Northing", "1", "2", "3", "Total"
 "w", "Fri 1 Jan", "605", "875071", "713451", "123", "42", "12", "21"
 "w", "Sat 2 Jan", "605", "875071", "713451", "23", "54", "62", "31"
 "w", "Sun 3 Jan", "605", "875071", "713451", "13", "32", "22", "32"
 "w", "Mon 4 Jan", "605", "875071", "713451", "153", "42", "52", "31"
 "w", "Tue 5 Jan", "605", "875071", "713451", "13", "14", "67", "35"

Finally, I wonder whether there is a way to automate these steps so that they run for all 5 files in the directory and the results are saved back to the source files.

I would be very grateful for any advice. Thanks!

1 answer

I would say that you are on the right track for tasks 1 and 2. However, to automate the process, you will want to use list.files() rather than file.choose().
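As a minimal sketch of that idea: the snippet below lists every CSV in a directory and reads each one while skipping the 6 leading lines. The directory `data_dir` and the file contents are made up so that the example runs on its own. `header = FALSE` is my assumption, based on the question's sample, where the 7th line is already data rather than column names.

```r
# Build a throwaway directory with one made-up file (6 junk lines + 2 data rows)
data_dir <- file.path(tempdir(), "list_files_demo")
dir.create(data_dir, showWarnings = FALSE)
writeLines(
  c(paste("junk line", 1:6),
    '"w","Fri 1 Jan","123","42","12","21"',
    '"w","Sat 2 Jan","23","54","62","31"'),
  file.path(data_dir, "605_E875071_N713451.csv")
)

# list.files() returns the matching paths; pattern is a regular expression
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; header = FALSE because line 7 is assumed to be data.
# strip.white = TRUE trims spaces around fields, in case they are padded.
tables <- lapply(csv_files, read.csv, skip = 6, header = FALSE,
                 strip.white = TRUE)
names(tables) <- basename(csv_files)
```

With your real files you would point `data_dir` at the folder that holds them and drop the `writeLines()` setup.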

Also, I would suggest avoiding column names that start with, or consist only of, a number. Call them 'one', 'two', 'three' or 'V1', etc., so that you can use $ to access them later.
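To illustrate why, here is a small sketch on a made-up data frame (`df` and its values are purely illustrative). A name that starts with a digit is still allowed, but it has to be wrapped in backticks every time you use it with $.

```r
# Made-up two-column data frame
df <- data.frame(a = 1:2, b = 3:4)
colnames(df) <- c("1", "two")

second <- df$two   # a syntactic name works directly with $
first  <- df$`1`   # a name starting with a digit needs backticks

# make.names() converts arbitrary strings into syntactic names
safe_names <- make.names(c("1", "2", "3"))
```

Here `safe_names` comes out as "X1", "X2", "X3", which is another option if you would rather not invent names by hand.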

For task 3, take a look at strsplit:

 out <- strsplit(filename,'_') 

then you can grab the pieces and do what you will with them:

 gsub('E', '', sapply(out, '[', 2)) # should get your Easting column 
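Putting those pieces together, a sketch for one file name could look like this. `parse_filename` is a hypothetical helper name of my own, and it assumes every file follows the "ID_Eeasting_Nnorthing.csv" pattern from the question. Note that the ".csv" extension has to be removed first, otherwise it stays attached to the last piece.

```r
# Hypothetical helper: split "605_E875071_N713451.csv" into its three parts
parse_filename <- function(path) {
  # drop any directory and the file extension before splitting
  stem  <- tools::file_path_sans_ext(basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  list(
    ID       = parts[1],
    Easting  = sub("^E", "", parts[2]),  # remove the leading "E"
    Northing = sub("^N", "", parts[3])   # remove the leading "N"
  )
}

info <- parse_filename("some/folder/605_E875071_N713451.csv")
```

The values come back as character strings; wrap them in as.numeric() if you need numbers.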

As for your last question, the simple answer is no. The more complicated answer is that it is complicated! Unless the files are very large (1e7 lines or more), or your computer has very little RAM, you should be fine reading each file into R (and therefore into memory) and writing it back.
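A rough sketch of that read-then-write-back loop is below. It is not a definitive implementation: `clean_file` is a hypothetical helper, the column names 'one', 'two', 'three' follow my suggestion above, `header = FALSE` assumes line 7 is data, and the demo runs on a made-up file in a temporary directory rather than on your real data.

```r
# Hypothetical helper: clean one file and overwrite it in place
clean_file <- function(path) {
  dat <- read.csv(path, skip = 6, header = FALSE, strip.white = TRUE)
  colnames(dat) <- c("type", "date", "one", "two", "three", "Total")

  # ID, Easting and Northing come from the file name
  stem  <- tools::file_path_sans_ext(basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  dat$ID       <- parts[1]
  dat$Easting  <- sub("^E", "", parts[2])
  dat$Northing <- sub("^N", "", parts[3])

  # reorder the columns to match the layout asked for in the question
  dat <- dat[, c("type", "date", "ID", "Easting", "Northing",
                 "one", "two", "three", "Total")]
  write.csv(dat, path, row.names = FALSE)  # overwrites the source file
  invisible(dat)
}

# Demo on a throwaway file (6 junk lines + 2 data rows)
demo_dir <- file.path(tempdir(), "clean_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "605_E875071_N713451.csv")
writeLines(c(paste("junk line", 1:6),
             '"w","Fri 1 Jan","123","42","12","21"',
             '"w","Sat 2 Jan","23","54","62","31"'), demo_file)

for (f in list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)) {
  clean_file(f)
}
result <- read.csv(demo_file)  # the cleaned file now has a header row
```

Because this overwrites the originals, keep a backup and run it only once per file: a second run would skip 6 lines of already cleaned data.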

One note: while you are working on this, do not hesitate to ask specific questions (ideally with some examples of your data, so that we can reproduce what you are working on), and you will get more, and more accurate, answers.


Source: https://habr.com/ru/post/1403541/

