Cartesian list of files from individual folders

I have a directory with numerous folders in it, and I want to create a Cartesian list of all the files in each folder separately. Thus, each folder will receive its own Cartesian list.

I can do this for one folder as follows:

import pandas as pd
import os, glob, itertools

path =(r'C:\pathway')
allfiles = glob.glob(path + "/*.csv")
result = list(itertools.product(allfiles,allfiles))

I can view all files in all folders, for example:

path =(r'C:\pathway') 

for subdir, dirs, files in os.walk(path):
    for file in files: 
        df=pd.read_csv(os.path.join(subdir,file))

but I'm not sure how to make separate Cartesian lists for files in each separate folder.

3 answers

If you want to apply your method to all subfolders in your directory, you can use the following code:

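import os, glob, itertools

directory = "/Users/bla/asd"
# next(os.walk(directory)) yields (dirpath, dirnames, filenames) for the top
# level only; index 1 is the list of immediate subfolder names.
folders_arr = next(os.walk(directory))[1]

results = []
for folder_name in folders_arr:
    path = os.path.join(directory, folder_name)
    # Match the CSV files inside the folder, not the folder itself.
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    results.append(list(itertools.product(allfiles, allfiles)))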
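
The loop above collects the products in a plain list, so it is not obvious which folder each one came from. Here is a minimal sketch that keys each product by folder name instead, assuming only the immediate subfolders matter and that the files of interest are CSVs as in the question. The function name products_by_folder and its pattern parameter are hypothetical, not part of the answer.

```python
import glob
import itertools
import os


def products_by_folder(directory, pattern="*.csv"):
    """Map each immediate subfolder name to the Cartesian product of its files."""
    results = {}
    for entry in sorted(os.listdir(directory)):
        folder = os.path.join(directory, entry)
        if not os.path.isdir(folder):
            continue  # skip plain files sitting in the top-level directory
        files = sorted(glob.glob(os.path.join(folder, pattern)))
        results[entry] = list(itertools.product(files, files))
    return results
```

Calling products_by_folder(r'C:\pathway') would then return something like {'folder1': [...], 'folder2': [...]}, with one list of pairs per subfolder.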

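glob can do this by itself, because wildcards are allowed in the directory part of the pattern, not only in the file name: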

from glob import glob
from os.path import join
from itertools import product

BASE_PATH = r'C:\pathway'  # raw string, so the backslash is not read as an escape
all_files = glob(join(BASE_PATH, '*', '*.csv')) # C:\pathway\*\*.csv
result = list(product(all_files, all_files))

From the glob docs:

pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif)
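
Note that a single `*/*.csv` pattern puts the files of every subfolder into one list, so a product over that list also pairs files from different folders. If one product per folder is wanted, as the question asks, the matches can be grouped by parent directory first. This is a rough sketch under that assumption; the function name products_grouped_by_folder is hypothetical.

```python
from collections import defaultdict
from glob import glob
from itertools import product
from os.path import dirname, join


def products_grouped_by_folder(base_path):
    """Glob once across all subfolders, then build one product per folder."""
    files_by_folder = defaultdict(list)
    for path in sorted(glob(join(base_path, '*', '*.csv'))):
        # dirname(path) is the subfolder the matched file lives in
        files_by_folder[dirname(path)].append(path)
    return {folder: list(product(files, files))
            for folder, files in files_by_folder.items()}
```

The keys of the returned dictionary are the subfolder paths, and each value holds only pairs of files from that one folder.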


Breaking the task down into modular components will make similar code easier to understand in the future:

import os

def flatFolders(rootPath):
    '''
        Given a root path (folder) containing deeply nested folders,
        returns a dictionary {folder1:[files of folder1],
                              folder2:[files of folder2], ...}
    '''
    foldersToFiles = {}
    # os.walk visits rootPath and every folder nested below it
    for path, _dirs, fileNames in os.walk(rootPath):
        files = [os.path.join(path, name) for name in fileNames]
        foldersToFiles[path] = files
    return foldersToFiles

def cartesianSelfProduct(lst):
    '''
        cartesianSelfProduct([1,2,3]) ->
          [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
    '''
    return [(x,y) for x in lst for y in lst]

def flatFolderPairs(rootPath):
    foldersToFiles = flatFolders(rootPath)
    return {folder:cartesianSelfProduct(files) for folder,files in foldersToFiles.items()}
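
The list comprehension in cartesianSelfProduct produces the same pairs, in the same order, as the itertools.product call used in the question. A small check, using hypothetical file names:

```python
from itertools import product

files = ['a.csv', 'b.csv']  # hypothetical file names, for illustration only

pairs_from_comprehension = [(x, y) for x in files for y in files]
pairs_from_product = list(product(files, repeat=2))  # same as product(files, files)

print(pairs_from_comprehension == pairs_from_product)  # prints True
```

Either form can therefore be used inside the per-folder step; product avoids writing the nested loop by hand.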

Source: https://habr.com/ru/post/1623885/

