os.walk is very slow; is there any way to optimize it?

I use os.walk to build a map of a data warehouse (the map is used later in the tool I am building).

This is the code I'm currently using:

import os

def find_children(tickstore):
    # Collect the path of every directory under tickstore.
    children = []
    dir_list = os.walk(tickstore)
    for i in dir_list:
        children.append(i[0])  # i[0] is the directory path
    return children

I did some analysis:

dir_list = os.walk(tickstore) runs instantly; if I do nothing with dir_list, the function finishes instantly.

Iterating over dir_list is what takes a long time. Even if I don't append anything and just loop over it, it is slow.
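This observation matches how os.walk works: it returns a generator, so no directory is read until the loop starts pulling items from it. A minimal sketch of that behaviour follows; the temporary tree and the count_dirs helper are made up for illustration and are not part of the original code.

```python
import os
import tempfile
import types

def count_dirs(root):
    """Count directories under root by consuming the os.walk generator."""
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))

# Build a tiny throwaway tree: root/a/b and root/c (hypothetical layout).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "b"))
os.makedirs(os.path.join(demo_root, "c"))

walker = os.walk(demo_root)  # returns at once; nothing has been read yet
is_lazy = isinstance(walker, types.GeneratorType)

total = count_dirs(demo_root)  # the filesystem work happens during iteration
```

So timing the os.walk(...) call on its own says nothing about the cost of the traversal; only the loop is worth measuring.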

Tickstore is a large data warehouse with ~ 10,000 directories.

It currently takes about 35 minutes to complete this function.

Is there any way to speed it up?

I have looked at alternatives to os.walk, but none of them seemed to offer much of a speed advantage.

+4
3

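Yes: use Python 3.5 (still a release candidate at the time of writing, but due out shortly). In Python 3.5, os.walk was rewritten to be more efficient.

This work was done as part of PEP 471.

Extracted from the PEP: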

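Python's built-in os.walk() is significantly slower than it needs to be, because - in addition to calling os.listdir() on each directory - it executes the stat() system call or GetFileAttributes() on each file to determine whether the entry is a directory or not.

But the underlying system calls - FindFirstFile/FindNextFile on Windows and readdir on POSIX systems - already tell you whether the files returned are directories or not, so no further system calls are needed. Further, the Windows system calls return all the information for a stat_result object on the directory entry, such as file size and last modification time.

In short, you can reduce the number of system calls required for a tree function like os.walk() from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually wider than they are deep, it is often much better than this.)

In practice, removing all those extra system calls makes os.walk() about 8-9 times as fast on Windows, and about 2-3 times as fast on POSIX systems. So we are not talking about micro-optimizations.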
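To make the idea in the PEP concrete, here is a rough sketch of a directory-only traversal built directly on os.scandir (available from Python 3.5). It is not the implementation of os.walk; the function name find_children_scandir and the small temporary tree are assumptions made for illustration.

```python
import os
import tempfile

def find_children_scandir(root):
    """Return root and every directory below it, using os.scandir."""
    children = [root]
    stack = [root]
    while stack:
        current = stack.pop()
        for entry in os.scandir(current):
            # The entry type usually comes from the directory listing itself,
            # so is_dir() normally needs no extra stat() call per entry.
            if entry.is_dir(follow_symlinks=False):
                children.append(entry.path)
                stack.append(entry.path)
    return children

# Hypothetical demo tree: root/a/b, root/c and one file that must be skipped.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "b"))
os.makedirs(os.path.join(demo_root, "c"))
open(os.path.join(demo_root, "a", "data.txt"), "w").close()

found_dirs = find_children_scandir(demo_root)
walked_dirs = [dirpath for dirpath, _dirs, _files in os.walk(demo_root)]
```

On this small tree, both approaches return the same set of directories, though not necessarily in the same order. The sketch skips symlinked directories entirely, which is a simplification.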

+9

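os.walk is currently quite slow because it first lists the directory and then calls stat on each entry to see whether it is a directory or a file.

An improvement is proposed in PEP 471 and should arrive in Python 3.5. In the meantime, you can use the scandir package to get the same benefits in Python 2.7.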

+3

To optimize it in Python 2.7, use scandir.walk() instead of os.walk(); the parameters are exactly the same.

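import scandir  # third-party package, install it first

directory = "/tmp"
res = scandir.walk(directory)
for item in res:
    # Each item is a (dirpath, dirnames, filenames) tuple, as with os.walk.
    print(item)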

PS: As @recoup mentioned in the comments, scandir must be installed before it can be used in Python 2.7.
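If the same code has to run on both old and new interpreters, one possible approach is to prefer the scandir backport when it is installed and fall back to the standard library otherwise. This is only a sketch; the temporary tree is invented for the demonstration.

```python
import os
import tempfile

try:
    # Third-party backport; has to be installed separately.
    from scandir import walk
except ImportError:
    # Python 3.5+ already ships the faster implementation as os.walk.
    from os import walk

def find_children(tickstore):
    """Return the path of every directory under tickstore, including itself."""
    return [dirpath for dirpath, _dirnames, _filenames in walk(tickstore)]

# Hypothetical demo tree: root/x/y and root/z.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "x", "y"))
os.makedirs(os.path.join(demo_root, "z"))

children = find_children(demo_root)
```

Because both functions take the same arguments and yield the same tuples, the rest of the code does not need to know which one was imported.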

+1

Source: https://habr.com/ru/post/1606390/

