Python, windows: parsing command lines with shlex

When you need to split a command line, for example to pass it to Popen, the usual best practice looks like this:

subprocess.Popen(shlex.split(cmd), ...)

But RTFM for shlex:

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell ...

So what is the correct way on win32? And what about quote parsing, and POSIX mode versus non-POSIX mode?


So far (March 2016), the Python stdlib has no valid command-line splitting function for Windows or multi-platform use.

subprocess

In short, for subprocess.Popen, .call, etc., it is best to do it like this:

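import shlex
import subprocess
import sys

if sys.platform == 'win32':
    args = cmd          # on Windows, Popen hands the command-line string to the OS as-is
else:
    args = shlex.split(cmd)
subprocess.Popen(args, ...)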

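On Windows, splitting is not necessary for either value of the shell option: internally, Popen simply uses subprocess.list2cmdline to re-join the split arguments :-).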
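To see what Popen actually hands to Windows, you can call subprocess.list2cmdline directly. It is importable on every platform, but it is not listed in the module's public `__all__`, so treat it as an implementation detail. The argument values below are arbitrary examples.

```python
import subprocess

# Arbitrary example arguments: one with a space, one with an embedded
# double quote, and one empty string.
args = ['a b', 'c"d', '']
cmdline = subprocess.list2cmdline(args)
print(cmdline)   # "a b" c\"d ""

# A trailing backslash inside a quoted argument is doubled, so that it
# is not mistaken for an escape of the closing quote.
trailing = subprocess.list2cmdline(['C:\\my dir\\'])
print(trailing)  # "C:\my dir\\"
```

Empty arguments and arguments containing spaces or tabs get wrapped in quotes; embedded quotes are backslash-escaped.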

With shell=True, shlex.split is not necessary on Unix either.

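Split or not, on Windows you must include the file extension explicitly when starting .bat or .cmd scripts (unlike .exe and .com), unless shell=True.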

Notes on command-line splitting nonetheless:

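shlex.split(cmd, posix=0) retains backslashes in Windows paths, but it does not understand quoting and escaping correctly. It is not very clear what the posix=0 mode of shlex is good for at all, but 99% of the time it lures Windows/cross-platform programmers into using it ...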
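A quick illustration of both shortcomings with the standard shlex module (the command strings are just examples): POSIX mode strips the quotes but eats the backslashes of an unquoted Windows path, while posix=False keeps the backslashes but leaves the quote characters inside the tokens and does not join quoted pieces of a word.

```python
import shlex

# Example Windows-style command line: a quoted path plus an unquoted path.
cmd = r'"C:\Program Files\app.exe" -o C:\temp\out.txt'

# Default POSIX mode: quotes are stripped, but outside quotes a backslash
# is an escape character, so the unquoted path loses its backslashes.
posix_args = shlex.split(cmd)
print(posix_args)      # ['C:\\Program Files\\app.exe', '-o', 'C:tempout.txt']

# Non-POSIX mode: backslashes survive, but the quote characters stay
# inside the token.
nonposix_args = shlex.split(cmd, posix=False)
print(nonposix_args)   # ['"C:\\Program Files\\app.exe"', '-o', 'C:\\temp\\out.txt']

# Quotes embedded in a word: POSIX mode joins the pieces into one argument,
# non-POSIX mode treats the quote as an ordinary character.
posix_embedded = shlex.split('a"b c"d')
nonposix_embedded = shlex.split('a"b c"d', posix=False)
print(posix_embedded)      # ['ab cd']
print(nonposix_embedded)   # ['a"b', 'c"d']
```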

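The Windows API offers CommandLineToArgvW, reachable via ctypes.windll.shell32.CommandLineToArgvW:

Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.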

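def win_CommandLineToArgvW(cmd):
    """Split cmd with the Win32 CommandLineToArgvW() API (Windows only)."""
    import ctypes
    from ctypes import wintypes
    shell32, kernel32 = ctypes.windll.shell32, ctypes.windll.kernel32
    # Declare the signatures so 64-bit pointers are not truncated to int
    shell32.CommandLineToArgvW.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)
    kernel32.LocalFree.argtypes = [wintypes.HLOCAL]
    kernel32.LocalFree.restype = wintypes.HLOCAL
    nargs = ctypes.c_int()
    lpargs = shell32.CommandLineToArgvW(str(cmd), ctypes.byref(nargs))
    if not lpargs:
        raise ctypes.WinError()
    try:
        return [lpargs[i] for i in range(nargs.value)]
    finally:
        kernel32.LocalFree(lpargs)   # the returned array must be freed with LocalFree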

However, CommandLineToArgvW is bogus, or at best only loosely similar to the mandatory standard C run-time argv & argc parsing:

>>> win_CommandLineToArgvW('aaa"bbb""" ccc')
[u'aaa"bbb"""', u'ccc']
>>> win_CommandLineToArgvW('""  aaa"bbb""" ccc')
[u'', u'aaabbb" ccc']
>>> 
C:\scratch>python -c "import sys;print(sys.argv)" aaa"bbb""" ccc
['-c', 'aaabbb"', 'ccc']

C:\scratch>python -c "import sys;print(sys.argv)" ""  aaa"bbb""" ccc
['-c', '', 'aaabbb"', 'ccc']
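For comparison, here is a hypothetical pure-Python sketch of the C run-time rules (msvc_argv_split is my own name, not a library function). It assumes the behavior visible in the python.exe output above, where "" inside a quoted section yields a literal quote and quoting continues. It does not model the special treatment of argv[0] and is not a verified reimplementation of any MSVC runtime; it only reproduces the two python.exe results shown above.

```python
def msvc_argv_split(cmdline):
    """Hypothetical sketch of MSVC C run-time argv splitting (not an official API).

    Assumed rules:
    * spaces and tabs separate arguments outside of quotes;
    * 2n backslashes + '"'  -> n backslashes, and the quote toggles quoted mode;
    * 2n+1 backslashes + '"' -> n backslashes plus a literal '"';
    * backslashes not followed by '"' are literal;
    * '""' inside quoted mode -> a literal '"', and quoted mode continues.
    The special parsing of argv[0] (the program name) is not modeled.
    """
    args = []
    buf = []          # characters of the current argument
    in_arg = False    # True once an argument has started (it may still be empty)
    in_quotes = False
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == '\\':
            j = i
            while j < n and cmdline[j] == '\\':
                j += 1
            nbs = j - i
            if j < n and cmdline[j] == '"':
                buf.append('\\' * (nbs // 2))
                if nbs % 2:
                    buf.append('"')   # odd count: the quote is escaped
                    j += 1
                # even count: leave the quote for the next iteration
            else:
                buf.append('\\' * nbs)  # not before a quote: literal
            i = j
            in_arg = True
        elif c == '"':
            if in_quotes and i + 1 < n and cmdline[i + 1] == '"':
                buf.append('"')   # "" inside quotes -> literal quote
                i += 2
            else:
                in_quotes = not in_quotes
                i += 1
            in_arg = True
        elif c in ' \t' and not in_quotes:
            if in_arg:
                args.append(''.join(buf))
                buf, in_arg = [], False
            i += 1
        else:
            buf.append(c)
            in_arg = True
            i += 1
    if in_arg:
        args.append(''.join(buf))
    return args


print(msvc_argv_split('aaa"bbb""" ccc'))      # ['aaabbb"', 'ccc']
print(msvc_argv_split('""  aaa"bbb""" ccc'))  # ['', 'aaabbb"', 'ccc']
```

The in_arg flag is what lets a bare "" produce an empty argument, as in the second call.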

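Watch http://bugs.python.org/issue1724822 for possible future additions to the Python lib. (The function mentioned there on "fisheye3" does not really work correctly either.)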


Valid Windows command-line splitting is rather crazy. For example, try \ \\ \" \\"" \\\"aaa """"...

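My current candidate for cross-platform command-line splitting is the following function, which I am considering for a Python lib proposal. It is multi-platform; it is roughly 10x faster than shlex, which does single-char stepping and streaming; and it respects pipe-related characters (unlike shlex). It already passes a list of tough real-shell tests on Windows and Linux bash, plus the legacy posix test patterns of test_shlex. Feedback about remaining bugs is welcome.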

import re
import sys

def cmdline_split(s, platform='this'):
    """Multi-platform variant of shlex.split() for command-line splitting.
    For use with subprocess, for argv injection etc. Using fast REGEX.

    platform: 'this' = auto from current platform;
              1 = POSIX; 
              0 = Windows/CMD
              (other values reserved)
    """
    if platform == 'this':
        platform = (sys.platform != 'win32')
    if platform == 1:
        RE_CMD_LEX = r'''"((?:\\["\\]|[^"])*)"|'([^']*)'|(\\.)|(&&?|\|\|?|\d?\>|[<])|([^\s'"\\&|<>]+)|(\s+)|(.)'''
    elif platform == 0:
        RE_CMD_LEX = r'''"((?:""|\\["\\]|[^"])*)"?()|(\\\\(?=\\*")|\\")|(&&?|\|\|?|\d?>|[<])|([^\s"&|<>]+)|(\s+)|(.)'''
    else:
        raise AssertionError('unknown platform %r' % platform)

    args = []
    accu = None   # collects pieces of one arg
    for qs, qss, esc, pipe, word, white, fail in re.findall(RE_CMD_LEX, s):
        if word:
            pass   # most frequent
        elif esc:
            word = esc[1]
        elif white or pipe:
            if accu is not None:
                args.append(accu)
            if pipe:
                args.append(pipe)
            accu = None
            continue
        elif fail:
            raise ValueError("invalid or incomplete shell string")
        elif qs:
            word = qs.replace('\\"', '"').replace('\\\\', '\\')
            if platform == 0:
                word = word.replace('""', '"')
        else:
            word = qss   # may be even empty; must be last

        accu = (accu or '') + word

    if accu is not None:
        args.append(accu)

    return args

Source: https://habr.com/ru/post/1614684/

