Reading subfolder and file names into a nested list

I'm trying to read the names of all folders and files in a directory into a nested list. The top-level list should have one element per top-level folder; each of those elements should in turn have one element per entry in that folder (if it is a folder), and so on down to the level where there are only files and no more folders.

My use case with my iTunes Music folder:

    m <- "/Users/User/Music/iTunes/iTunes Media/Music"  # set the path to the library folder
    x <- list.files(m, recursive = FALSE)  # get all artist names (folder names on top level)
    # read all albums and the title of each song per album
    lst <- setNames(lapply(paste(m, x, sep = "/"), list.files, recursive = TRUE), x)

The structure of each element in lst now:

    #$`The Kooks`                                    # artist name "The Kooks"
    # [1] "Inside In Inside Out/01 Seaside.mp3"      # album name "Inside In Inside Out", title "01 Seaside.mp3"
    # [2] "Inside In Inside Out/02 See The World.mp3"
    #...
    #[16] "Konk/01 See The Sun.mp3"                  # second album of The Kooks
    #[17] "Konk/02 Always Where I Need To Be.mp3"

What I'm trying to do is turn each artist's element into a nested list. In the example there would be a list $`The Kooks` containing 2 sublists (one per album), $`Inside In Inside Out` and $Konk, and each album list would hold a vector of title names (without the album names).

I couldn't find a suitable answer on SO (so far) and tried, without success, among other things:

 list.files(m, recursive = TRUE) 

and

 lapply(lst, function(l) { strsplit(l, "/") }) 

What is the right way to do this?

PS:

  • You can think of the desired result as a list structure in which each file/folder name appears only as often as it does in the actual files/folders.
  • Ideally, I hope to find a solution that is flexible enough to handle different folder depths and doesn't require as many explicit lapply calls as there are folder levels.
+6
3 answers

The following function identifies the files and folders in a directory. It then calls itself for each folder it finds, building a list of the files and subfolders found.

    fileFun <- function(theDir) {
        ## Look for files (directories included for now)
        allFiles <- list.files(theDir, no.. = TRUE)
        ## Look for directory names
        allDirs <- list.dirs(theDir, full.names = FALSE, recursive = FALSE)
        ## If there are any directories,
        if(length(allDirs)) {
            ## then call this function again
            moreFiles <- lapply(file.path(theDir, allDirs), fileFun)
            ## Set names for the new list
            names(moreFiles) <- allDirs
            ## Determine files found, excluding directory names
            outFiles <- allFiles[!allFiles %in% allDirs]
            ## Combine appropriate results for current list
            if(length(outFiles)) {
                allFiles <- c(outFiles, moreFiles)
            } else {
                allFiles <- moreFiles
            }
        }
        return(allFiles)
    }
    ## Try with your directory?
    fileFun(m)
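To see the shape of the result without touching a real music library, the function can be run on a small throwaway tree. The sketch below is only an illustration: the folder and file names are made up, and fileFun is repeated in condensed form so that the block runs on its own.

```r
# Condensed copy of fileFun from above so this block is self-contained
fileFun <- function(theDir) {
    allFiles <- list.files(theDir, no.. = TRUE)
    allDirs <- list.dirs(theDir, full.names = FALSE, recursive = FALSE)
    if (length(allDirs)) {
        moreFiles <- lapply(file.path(theDir, allDirs), fileFun)
        names(moreFiles) <- allDirs
        outFiles <- allFiles[!allFiles %in% allDirs]
        allFiles <- if (length(outFiles)) c(outFiles, moreFiles) else moreFiles
    }
    allFiles
}

# Hypothetical demo tree in a temporary directory (all names are made up)
demo <- file.path(tempdir(), "fileFun_demo")
unlink(demo, recursive = TRUE)
dir.create(file.path(demo, "The Kooks", "Konk"), recursive = TRUE)
file.create(file.path(demo, "The Kooks", "Konk", c("01 a.mp3", "02 b.mp3")))
file.create(file.path(demo, "The Kooks", "notes.txt"))

res <- fileFun(demo)
str(res)  # inspect the nested structure
```

Files that sit next to subfolders (notes.txt here) end up as unnamed elements of the list, while each subfolder becomes a named element.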
+3

This solution should work, assuming your directory structure is always artist/album/songs. If some directories are deeper (or shallower), you will not get what you want.

First I get a list of directories (i.e. a list of artists):

 artists <- list.dirs(path=m,recursive=FALSE,full.names=FALSE) 

Then I create a nested list:

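    music.list <- lapply(artists, function(dir) {
        ## Albums are the subdirectories of each artist directory
        albums <- list.dirs(path = paste0(m, "/", dir), recursive = FALSE, full.names = FALSE)
        ## Songs are the files inside each album directory
        album.list <- lapply(albums, function(dir2) {
            list.files(path = paste0(m, "/", dir, "/", dir2))
        })
        names(album.list) <- albums
        album.list
    })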

And finally, I name the top level of the list:

 names(music.list) <- artists 

The album level works identically to the artist level: I get the directories (corresponding to the albums), then I list the files inside them (corresponding to the songs) and, finally, I name the list items after the albums.

EDIT: As docendo discimus points out, the above solution is not general. The following recursive solution should do the job more elegantly:

    rfl <- function(path) {
        folders <- list.dirs(path, recursive = FALSE, full.names = FALSE)
        if (length(folders) == 0) {
            list.files(path)
        } else {
            sublist <- lapply(paste0(path, "/", folders), rfl)
            setNames(sublist, folders)
        }
    }
    rfl(m)

This is still not completely general: as soon as a folder contains subfolders, the algorithm descends into them without recording any files that sit alongside those subfolders at the same depth.
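One possible way around that limitation is to collect the plain files at every level and append them to the list of subfolders. The following is only a sketch, not part of the original answer: rfl_keep is a made-up name, and the demo tree uses invented folder and file names.

```r
# Sketch of a variant of rfl that also keeps files sitting next to subfolders.
# rfl_keep is a hypothetical name used for illustration only.
rfl_keep <- function(path) {
    folders <- list.dirs(path, recursive = FALSE, full.names = FALSE)
    # list.files() returns files and folders; drop the folders to get plain files
    files <- setdiff(list.files(path), folders)
    if (length(folders) == 0) return(files)
    sublist <- setNames(lapply(file.path(path, folders), rfl_keep), folders)
    # Plain files are appended as unnamed elements after the named subfolders
    c(sublist, as.list(files))
}

# Hypothetical demo tree in a temporary directory (all names are made up)
demo <- file.path(tempdir(), "rfl_keep_demo")
unlink(demo, recursive = TRUE)
dir.create(file.path(demo, "Artist", "Album"), recursive = TRUE)
file.create(file.path(demo, "Artist", "Album", "01 song.mp3"))
file.create(file.path(demo, "Artist", "cover.jpg"))

res <- rfl_keep(demo)
str(res)  # inspect the nested structure
```

Leaf folders still come back as plain character vectors, so the result matches rfl wherever a folder contains only files.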

+3
    files <- list.files(m, recursive = TRUE)  # paths relative to m: artist/album/song
    parts <- strsplit(files, '/')             # split each path once and reuse it
    music.df <- data.frame(
        artist = sapply(parts, '[[', 1),
        song = paste(sapply(parts, '[[', 2), sapply(parts, '[[', 3), sep = '/')
    )
    out <- split(music.df[, 2], f = music.df$artist)

I put the artist and the album/title in a data frame, and then used split to break it into a list by artist.

Alternatively, you can build a data frame from the strsplit output and then call split on that data frame (ncol will vary depending on the depth of the folders).

    files <- list.files(m, recursive = TRUE)  # paths relative to m: artist/album/song
    music.df <- data.frame(matrix(unlist(strsplit(files, '/')), ncol = 3, byrow = TRUE))
    out <- split(music.df[, 3], f = music.df[1:2])
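If the layout is exactly artist/album/song, the same split idea can also produce the nested list from the question without hard-coded column positions, by using dirname and basename. This is only a sketch under that assumption: the files vector is made-up sample data standing in for list.files(m, recursive = TRUE), and nested is an illustrative name.

```r
# Made-up sample standing in for list.files(m, recursive = TRUE)
files <- c("The Kooks/Inside In Inside Out/01 Seaside.mp3",
           "The Kooks/Inside In Inside Out/02 See The World.mp3",
           "The Kooks/Konk/01 See The Sun.mp3",
           "Other Artist/Some Album/01 Track.mp3")

album_path <- dirname(files)   # "artist/album"
artist <- dirname(album_path)  # "artist"

# Split by artist first, then split each artist's files by album name
nested <- lapply(split(files, artist), function(f) {
    split(basename(f), basename(dirname(f)))
})

str(nested)  # inspect the nested structure
```

Unlike the recursive answers, this assumes every file sits exactly two folders deep; files at other depths would be grouped under misleading names.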
0

Source: https://habr.com/ru/post/980531/

