os.walk() does not find my files

I am trying to use a Python script to edit a large directory of .html files in a loop. I'm having problems iterating over the files with os.walk(). This piece of code simply reads the html files into strings that I can work with, but the script never even enters the loop, as if the files did not exist. It prints point1, but never reaches point2. The script ends without an error message. The directory is a folder called "amazon" that contains 20 subfolders, each with 20 html files.

Oddly enough, the code works fine on a neighboring directory that contains only .txt files, but for some reason it does not pick up my .html files. Is there something I don't understand about the structure of the loop for root, dirs, filenames in os.walk()? This is my first time using os.walk, and I looked at a number of other pages on this site to try to get it working.

import os

rootdir = 'C:\filepath\amazon'
print "point1"
for root, dirs, filenames in os.walk(rootdir):
    print "point2"
    for file in filenames:
        with open (os.path.join(root, file), 'r') as myfile:
             g = myfile.read()
        print g

Any help is greatly appreciated.

4 answers

The backslash is an escape character in Python string literals. Either double each backslash, or use a "raw string" by prefixing the literal with "r".

Example:

>>> 'C:\filepath\amazon'
'C:\x0cilepath\x07mazon'
>>> r'\x'
'\\x'
>>> '\x'
ValueError: invalid \x escape

Explanation: see the question "In Python, what does preceding a string literal with "r" mean?"
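
Applied to the script in the question, the fix might look like the sketch below. It is written for Python 3 (the question uses Python 2 print statements). The helper name read_html_files and the .html filter are assumptions added for illustration, not part of the original script, and the demo builds a throwaway directory tree so that it can run anywhere.

```python
import os
import tempfile


def read_html_files(rootdir):
    """Return {path: contents} for every .html file under rootdir."""
    pages = {}
    for root, dirs, filenames in os.walk(rootdir):
        for name in filenames:
            if name.lower().endswith('.html'):
                path = os.path.join(root, name)
                with open(path, 'r') as myfile:
                    pages[path] = myfile.read()
    return pages


# On Windows the real call would pass a raw string, e.g.
# read_html_files(r'C:\filepath\amazon')

# Demo on a temporary tree: one subfolder, one .html file, one .txt file.
demo_root = tempfile.mkdtemp()
sub = os.path.join(demo_root, 'sub1')
os.mkdir(sub)
with open(os.path.join(sub, 'page.html'), 'w') as f:
    f.write('<html></html>')
with open(os.path.join(sub, 'notes.txt'), 'w') as f:
    f.write('ignored')

pages = read_html_files(demo_root)
print(len(pages))  # only the .html file is collected
```

Returning a dictionary rather than printing inside the loop is only a design choice for the sketch; it makes it easy to check how many files were actually found.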


To see what is going wrong, inspect the string:

>>> rootdir = 'C:\filepath\amazon'
>>> rootdir
'C:\x0cilepath\x07mazon'
>>> print(rootdir)
C:
  ilepathmazon

As far as Python is concerned, rootdir contains \f, the ASCII Form Feed character, and \a, the ASCII Bell character.
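
This also explains why the script ends silently. The mangled path does not exist, and by default os.walk ignores the error raised when it fails to list a directory, so the loop body never runs. A minimal sketch, using a made-up missing directory inside a temporary folder, shows the default behaviour and how the documented onerror argument of os.walk exposes the problem:

```python
import os
import tempfile

# A path that is guaranteed not to exist (hypothetical name for the demo).
missing = os.path.join(tempfile.mkdtemp(), 'does_not_exist')

# Default behaviour: the listing error is swallowed and nothing is yielded.
silent = list(os.walk(missing))
print(silent)

# With onerror, os.walk hands the OSError to the callback instead.
errors = []
for root, dirs, filenames in os.walk(missing, onerror=errors.append):
    pass  # never reached for a missing directory
print(errors)
```

Checking os.path.isdir(rootdir) before walking is another simple way to catch a bad path early.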

If you use a raw string (the r prefix), the backslashes are kept as they are:

>>> rootdir = r'C:\filepath\amazon'
>>> rootdir
'C:\\filepath\\amazon'
>>> print(rootdir)
C:\filepath\amazon

... or, since Windows also accepts forward slashes as path separators, use / instead:

>>> rootdir = 'C:/filepath/amazon'
>>> rootdir
'C:/filepath/amazon'
>>> print(rootdir)
C:/filepath/amazon

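Or let os.path.join() build the path for you. Note that on Windows the separator after the drive letter has to be passed explicitly; os.path.join('C:', 'filepath', 'amazon') would give the drive-relative path 'C:filepath\\amazon':

>>> rootdir = os.path.join('C:', os.sep, 'filepath', 'amazon')
>>> rootdir
'C:\\filepath\\amazon'
>>> print(rootdir)
C:\filepath\amazon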
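
The behaviour of os.path.join with a bare drive letter can be checked without a Windows machine. The sketch below uses the standard ntpath module, which is the Windows implementation of os.path and can be imported on any platform. It compares joining with and without an explicit separator after 'C:'.

```python
import ntpath

# No separator after the drive: the result is relative to the
# current directory on drive C:, not rooted at C:\.
relative = ntpath.join('C:', 'filepath', 'amazon')

# Passing the separator explicitly makes the path absolute.
absolute = ntpath.join('C:', ntpath.sep, 'filepath', 'amazon')

print(repr(relative))
print(repr(absolute))
print(ntpath.isabs(relative), ntpath.isabs(absolute))
```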

Use os.path.join, and include the separator after the drive letter so that the path is absolute:

rootdir = os.path.join('C:', os.sep, 'filepath', 'amazon')

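I had a similar problem with os.walk. The escape character (\) that the Mac terminal puts before spaces in a path was causing it.

For example, the path: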

/Volumes/MacHD/My Folder/MyFiles/...

when accessed through the terminal it is displayed as:

/Volumes/MacHD/My\ Folder/MyFiles/...

The solution was to read the path into a string and then create a new string with the escape characters removed, for example:

import os

# Ask user for directory tree to scan for master files
masterpathraw = raw_input("Specify directory of master files:")
# Clear escape characters from the path
masterpath = masterpathraw.replace('\\', '')
# Provide this path to os.walk
for fullpath, _, filenames in os.walk(masterpath):
    pass  # Do stuff
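
Note that replace('\\', '') removes every backslash, including any that are genuinely part of a file name. As a hedged alternative, the standard shlex module can undo shell-style escaping the way a POSIX shell would. The helper name unescape_shell_path is hypothetical, and the sketch assumes the input holds exactly one shell-escaped path:

```python
import shlex


def unescape_shell_path(raw):
    """Undo POSIX shell escaping in a single path typed or pasted by the user."""
    parts = shlex.split(raw)
    if len(parts) != 1:
        raise ValueError('expected exactly one path, got %d' % len(parts))
    return parts[0]


# The escaped form as it appears in the terminal.
masterpath = unescape_shell_path('/Volumes/MacHD/My\\ Folder/MyFiles')
print(masterpath)
```

The resulting string can then be passed to os.walk in the same way as masterpath above.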

Source: https://habr.com/ru/post/1542722/

