Cannot remove French accented letters from strings returned by Python glob

I would like to rename files whose names contain French accented letters. I use glob to list the files and a function I found on the Internet to remove the accents. supprime_accent seems to work fine on its own. However, it does not change the file names returned by glob.

Does anyone know what the reason is? Is it related to glob?

import glob

def supprime_accent(ligne):
    """ supprime les accents du texte source """
    accents = { 'a': ['à', 'ã', 'á', 'â'],
                'e': ['é', 'è', 'ê', 'ë'],
                'i': ['î', 'ï'],
                'u': ['ù', 'ü', 'û'],
                'o': ['ô', 'ö'] }
    for (char, accented_chars) in accents.iteritems():
        for accented_char in accented_chars:
            ligne = ligne.replace(accented_char, char)
    return ligne

for file_name in glob.glob("attachments/*.jpg"):
    print supprime_accent(file_name)
+3
3 answers

Here I see two potential problems.

First, you need to use Unicode strings in the source code, and you need to tell Python what the source code's encoding is. Unfortunately, this makes the accent table more tedious to type correctly... :-\

# -*- coding: UTF-8 -*-
...
accents = { u'a': [u'à', u'ã', u'á', u'â'],
            u'e': [u'é', u'è', u'ê', u'ë'],
            u'i': [u'î', u'ï'],
            u'u': [u'ù', u'ü', u'û'],
            u'o': [u'ô', u'ö'] }

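Second, the file names returned by glob are byte strings, so you need to decode them to Unicode before replacing anything.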

import sys
file_name = file_name.decode(sys.getfilesystemencoding())

In Python 3.0 this is simpler: all strings are Unicode, so the u prefix is not needed.
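
To make the mismatch concrete, here is a small Python 3 sketch (not part of the original answer). It assumes the source file was saved as UTF-8 and that the file system reports names in cp1252; the file name is made up for illustration.

```python
# Python 3 sketch. The cp1252 encoding and the file name are assumptions.
E_ACUTE = '\u00e9'  # the letter e with an acute accent, as one code point

utf8_bytes = E_ACUTE.encode('utf-8')      # two bytes: b'\xc3\xa9'
cp1252_bytes = E_ACUTE.encode('cp1252')   # one byte:  b'\xe9'

# Stand-in for a byte-string file name as a cp1252 file system would report it.
raw_name = '\u00e9t\u00e9.jpg'.encode('cp1252')

# A byte-level replace using the UTF-8 form of the accent finds no match,
# so the name comes back untouched.
unchanged = raw_name.replace(utf8_bytes, b'e')

# After decoding, both sides are Unicode text and the replace works.
cleaned = raw_name.decode('cp1252').replace(E_ACUTE, 'e')

print(utf8_bytes, cp1252_bytes)
print(unchanged == raw_name)
print(cleaned)
```

This is the situation the Python 2 code in the question is likely in: the accent table and the file names are both byte strings, but not necessarily in the same encoding.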

+2

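For the conversion itself, a Latin-1 to ASCII mapping can be used.

If you pass a unicode pattern to glob, the file names it returns are Unicode as well, so no separate decoding step is needed.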

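# Note: 'latin2ascii' is not a built-in error handler; it has to be
# registered with codecs.register_error() before this call.
for file_name in glob.glob(u"attachments/*.jpg"):
    print file_name.encode('ascii', 'latin2ascii')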
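
The 'latin2ascii' handler is not shown in this answer. As a rough Python 3 sketch of what such a handler could look like: the implementation below is an assumption, not the handler the answer refers to, and the helper _to_ascii is hypothetical. Only codecs.register_error and unicodedata come from the standard library.

```python
import codecs
import unicodedata

def _to_ascii(char):
    """Hypothetical helper: map one character to an ASCII stand-in."""
    if unicodedata.combining(char):
        return ''  # a bare combining accent is simply dropped
    decomposed = unicodedata.normalize('NFKD', char)
    kept = ''.join(c for c in decomposed if ord(c) < 128)
    return kept or '?'  # no ASCII base letter available

def latin2ascii(error):
    """Error handler: replace the unencodable span with unaccented text."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    replacement = ''.join(_to_ascii(c) for c in bad)
    # Return the replacement text and the position at which to resume.
    return (replacement, error.end)

# Make the handler available under the name used in the answer.
codecs.register_error('latin2ascii', latin2ascii)

print('café_été.jpg'.encode('ascii', 'latin2ascii'))
```

Once registered, the name can be passed as the second argument of encode, as in the loop above.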
+1

I was able to fix the problem by converting file_name to unicode with the cp1252 encoding.

import glob
import sys
import unicodedata
for file_name in glob.glob("attachments/*.jpg"):
    file_name = file_name.decode(sys.getfilesystemencoding())
    print unicodedata.normalize('NFKD', file_name).encode('ascii', 'ignore')

Edit: Jason gave a better solution, replacing unicode(file_name, 'cp1252') with file_name.decode(sys.getfilesystemencoding()).
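
Since the original goal was to rename the files, here is a Python 3 sketch that applies the same NFKD approach and then calls os.rename. The names strip_accents and rename_without_accents are made up for this example, and skipping files whose new name already exists is an assumption about the desired behaviour. In Python 3, glob already returns Unicode strings, so the decode step is not needed.

```python
import glob
import os
import tempfile
import unicodedata

def strip_accents(name):
    """Decompose accented letters, then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

def rename_without_accents(directory, pattern='*.jpg'):
    """Rename matching files; return a list of (old_name, new_name) pairs."""
    renamed = []
    for old_path in sorted(glob.glob(os.path.join(directory, pattern))):
        folder, old_name = os.path.split(old_path)
        new_name = strip_accents(old_name)
        new_path = os.path.join(folder, new_name)
        # Skip names that need no change, and never overwrite an existing file.
        if new_name == old_name or os.path.exists(new_path):
            continue
        os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, 'été.jpg'), 'w').close()
    print(rename_without_accents(folder))
```

Note that 'ignore' silently drops characters that have no ASCII base letter, so two different names can collapse to the same result; that is why the sketch checks for an existing file before renaming.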

+1

Source: https://habr.com/ru/post/1727787/

