Renaming Cyrillic Names

I want to iterate through a folder, check whether the file names contain any Cyrillic characters and, if they do, rename those files to something else.

How can I do this?

3 answers

Python 3
This checks each character of the passed string to see whether it lies in the Cyrillic block, and returns True if a Cyrillic character is present in the string. Strings in Python 3 are Unicode by default. The function encodes each character in UTF-8 and checks whether the result is two bytes that fall in the range of the table containing Cyrillic characters.

def isCyrillic(filename):
    for char in filename:
        char_utf8 = char.encode('utf-8')      # encode to utf-8

        # Parentheses let the condition span several commented lines;
        # a backslash continuation followed by a comment is a syntax error.
        if (len(char_utf8) == 2                     # check if we have 2 bytes and if the
                and 0xd0 <= char_utf8[0] <= 0xd3    # first and second byte point to
                and 0x80 <= char_utf8[1] <= 0xbf):  # Cyrillic block (unicode U+0400-U+04FF)
            return True

    return False
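
To see why the two-byte test matches the Cyrillic block, it can help to print the UTF-8 bytes at the edges of that block. A small sketch (the helper name utf8_hex is made up for illustration and is not part of the answer):

```python
def utf8_hex(char):
    """Return the UTF-8 encoding of a single character as a hex string."""
    return char.encode('utf-8').hex()

# First and last code points of the Cyrillic block (U+0400 and U+04FF),
# plus an ordinary lowercase letter from the middle of it (U+044F).
for code_point in (0x0400, 0x044F, 0x04FF):
    print(hex(code_point), utf8_hex(chr(code_point)))
```

The three encodings come out as d080, d18f and d3bf, so the first byte stays within 0xd0-0xd3 and the second within 0x80-0xbf.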

Alternatively, use ord() to check the code point directly:

def isCyrillicOrd(filename):
    for char in filename:                  
        if 0x0400 <= ord(char) <= 0x04FF:    # directly checking unicode code point
            return True

    return False
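
Both functions only cover U+0400-U+04FF. Cyrillic letters also live in other blocks, for example Cyrillic Supplement (U+0500-U+052F). If those matter, one possible variant (not from the original answer, and with a hypothetical function name) is to ask the standard unicodedata module for each character's name:

```python
import unicodedata

def has_cyrillic_by_name(filename):
    """Return True if any character's Unicode name mentions CYRILLIC."""
    for char in filename:
        # unicodedata.name() raises ValueError for unnamed characters
        # unless a default is supplied, so pass an empty string.
        if 'CYRILLIC' in unicodedata.name(char, ''):
            return True
    return False
```

This is slower than a range comparison and also matches combining Cyrillic marks, so whether it is preferable depends on which names you expect to meet.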

cycont
   |---- asciifile.txt
   |---- .txt
   |---- ї́.txt
   |---- संस्कृत.txt

Test

import os
for (dirpath, dirnames, filenames) in os.walk('G:/cycont'):
    for filename in filenames:
        print(filename, isCyrillic(filename), isCyrillicOrd(filename))

asciifile.txt False False
.txt True True
ї́.txt True True
संस्कृत.txt False False

Python 2:

# -*- coding: utf-8 -*-
def check_value(value):
    try:
        value.decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

Python 3:

In Python 3, 'str' has no 'decode' method, so encode the string instead.

# -*- coding: utf-8 -*-
def check_value(value):
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True

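check_value returns True when the string is pure ASCII, so a False result means the name contains non-ASCII characters.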
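
An ASCII test is broader than a Cyrillic test: it also flags names written in any other non-Latin script. A short sketch comparing the two on a few names (is_ascii and has_cyrillic are illustrative names; str.isascii() requires Python 3.7 or later and gives the same result as the try/except version):

```python
def is_ascii(value):
    """True if every character is ASCII (needs Python 3.7+)."""
    return value.isascii()

def has_cyrillic(value):
    """True if any character lies in the Cyrillic block U+0400-U+04FF."""
    return any(0x0400 <= ord(char) <= 0x04FF for char in value)

for name in ('asciifile.txt', 'тест.txt', 'संस्कृत.txt'):
    print(name, is_ascii(name), has_cyrillic(name))
```

Only the second name is reported as Cyrillic, while both the second and third fail the ASCII test.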


There is a library written for this: the Python transliterate library.

So, first you need to get the file names. To do this, use os.listdir():

from os import listdir
from os.path import isfile, join
files = [f for f in listdir(dir) if isfile(join(dir, f))]  # dir: path of the folder to scan

Now you can go through each name in files and substitute characters as necessary:

from transliterate import translit
newName = translit(filename, 'ru', reversed=True)

Then just rename the files with os.rename:

import os
os.rename(join(dir, filename), join(dir, newName))  # join with the folder so both paths resolve
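
Putting the steps together, here is a minimal sketch of the whole loop. It is an assumption-laden outline rather than the answerer's code: rename_cyrillic_files and has_cyrillic are hypothetical names, and the new name comes from whatever callable you pass in, so the sketch itself needs only the standard library.

```python
import os

def has_cyrillic(name):
    """True if any character lies in the Cyrillic block U+0400-U+04FF."""
    return any(0x0400 <= ord(char) <= 0x04FF for char in name)

def rename_cyrillic_files(directory, make_name):
    """Rename every file in `directory` whose name contains Cyrillic.

    make_name is a callable mapping an old file name to a new one.
    Returns a list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for old_name in sorted(os.listdir(directory)):
        old_path = os.path.join(directory, old_name)
        # Only plain files with Cyrillic in the name are candidates.
        if not os.path.isfile(old_path) or not has_cyrillic(old_name):
            continue
        new_name = make_name(old_name)
        new_path = os.path.join(directory, new_name)
        # Skip rather than overwrite if nothing changed or the target exists.
        if new_name == old_name or os.path.exists(new_path):
            continue
        os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed
```

With transliterate installed, the callable could be lambda name: translit(name, 'ru', reversed=True). Skipping existing targets is a deliberate choice here, because two different Cyrillic names can transliterate to the same Latin name.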

Source: https://habr.com/ru/post/1569344/

