Retrieving key-value pairs from a quoted string

I am having trouble writing an "elegant" parser for this requirement (one that doesn't look like a piece of C breakfast). The input is a string of key-value pairs, separated by "," and joined by "=".

key1=value1,key2=value2

The part that trips me up is that values can be quoted ("), and inside the quotes a ',' does not end the value.

key1=value1,key2="value2,still_value2"

This last part makes it tricky to use split or re.split, and I ended up resorting to for i in range loops :(.

Can anyone demonstrate a clean way to do this?

It is OK to assume that quotes appear only in values, and that there is no whitespace or non-alphanumeric characters.

+4
5 answers

I would recommend against using regular expressions for this task, because the language you want to parse is not regular.

You have a string made up of several key-value pairs. The best way to parse it is not to match patterns against it, but to tokenize it properly.

The Python standard library has a module called shlex that mimics the parsing done by POSIX shells and provides a lexer implementation that can easily be adapted to your needs.

from shlex import shlex

def parse_kv_pairs(text, item_sep=",", value_sep="="):
    """Parse key-value pairs from a shell-like text."""
    # initialize a lexer, in POSIX mode (to properly handle escaping)
    lexer = shlex(text, posix=True)
    # set ',' as whitespace for the lexer
    # (the lexer will use this character to separate words)
    lexer.whitespace = item_sep
    # include '=' as a word character 
    # (this is done so that the lexer returns a list of key-value pairs)
    # (if your option key or value contains any unquoted special character, you will need to add it here)
    lexer.wordchars += value_sep
    # then we separate option keys and values to build the resulting dictionary
    # (maxsplit is required to make sure that '=' in value will not be a problem)
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)

Example run:

parse_kv_pairs(
  'key1=value1,key2=\'value2,still_value2,not_key1="not_value1"\''
)

Output:

{'key1': 'value1', 'key2': 'value2,still_value2,not_key1="not_value1"'}

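EDIT: I forgot to mention why I usually prefer shlex over regular expressions (which are probably faster in this case): it gives fewer surprises, especially if you later need to accept more kinds of input. I have never found a way to parse such key-value pairs properly with regular expressions; there will always be some input (for example: A="B=\"1,2,3\"") that trips up the pattern.

If you do not care about such inputs (or, put another way, if you can guarantee that your input follows a regular language), regular expressions are perfectly fine.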

EDIT2: split has a maxsplit argument, which is much cleaner to use than splitting/slicing/joining. Thanks to @cdlane for the input!
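
As a rough check of the escaped-quote input mentioned in the first EDIT, the sketch below feeds it to the same shlex-based function. The function is repeated only so that the snippet runs on its own; the variable names `tricky` and `result` are just for illustration.

```python
from shlex import shlex

def parse_kv_pairs(text, item_sep=",", value_sep="="):
    """Same function as in the answer, repeated so this snippet is standalone."""
    lexer = shlex(text, posix=True)
    lexer.whitespace = item_sep
    lexer.wordchars += value_sep
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)

# Raw string, so the backslashes reach the lexer unchanged.
tricky = r'A="B=\"1,2,3\""'
result = parse_kv_pairs(tricky)
print(result)  # {'A': 'B="1,2,3"'}
```

In POSIX mode the lexer resolves the backslash-escaped quotes inside the double-quoted section, and maxsplit=1 keeps the second '=' as part of the value.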

+3

Using a regular expression that respects and preserves quotes, we can do:

import re

string = 'key1=value1,key2="value2,still_value2"'

key_value_pairs = re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', string)

for key_value_pair in key_value_pairs:
    # split on the first "=" only, so an "=" inside a value does not break unpacking
    key, value = key_value_pair.split("=", 1)

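Per BioGeek, here is my interpretation of the regex Janne Karila used: the pattern breaks the string on commas but respects double-quoted sections (which may contain commas). It has two alternatives: runs of characters that involve no quotes; and double-quoted runs of characters, where a double quote ends the run unless it is escaped (with a backslash):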

(?:              # parenthesis for alternation (|), not memory
[^\s,"]          # any 1 character except white space, comma or quote
|                # or
"(?:\\.|[^"])*"  # a quoted string containing 0 or more characters
                 # other than quotes (unless escaped)
)+               # one or more of the above
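
A minimal sketch of how the matches could be turned into a dictionary, assuming you want the surrounding double quotes removed from values. The helper name `pairs_to_dict` and the constant `PAIR_RE` are made up for this example; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer above: runs of unquoted characters or
# double-quoted sections (with backslash escapes), repeated.
PAIR_RE = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')

def pairs_to_dict(text):
    """Hypothetical helper: build a dict from the regex matches."""
    result = {}
    for pair in PAIR_RE.findall(text):
        # Split on the first '=' only, so '=' inside a value is kept.
        key, value = pair.split("=", 1)
        # Drop one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        result[key] = value
    return result

parsed = pairs_to_dict('key1=value1,key2="value2,still_value2"')
print(parsed)  # {'key1': 'value1', 'key2': 'value2,still_value2'}
```

Note that this sketch does not unescape backslash sequences inside quoted values; they are kept as written.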
+5

You could do it like this:

import re
match = re.findall(r'([^=]+)=(("[^"]+")|([^,]+)),?', 'key1=value1,key2=value2,key3="value3,stillvalue3",key4=value4')

This leaves the following in "match":

[('key1', 'value1', '', 'value1'), ('key2', 'value2', '', 'value2'), ('key3', '"value3,stillvalue3"', '"value3,stillvalue3"', ''), ('key4', 'value4', '', 'value4')]

Then loop over it to get the keys and values:

for m in match:
    key = m[0]
    value = m[1]
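
A possible variation, offered as a sketch rather than as part of the original answer: if the inner alternatives are not captured separately, findall returns plain (key, value) tuples that can be fed straight into a dict. Stripping the double quotes from the values is an assumption about what you want in the result.

```python
import re

text = 'key1=value1,key2=value2,key3="value3,stillvalue3",key4=value4'

# Same idea as the pattern above, but the two value alternatives share one
# capturing group, so findall yields 2-tuples instead of 4-tuples.
pairs = re.findall(r'([^=]+)=("[^"]+"|[^,]+),?', text)

# Strip the surrounding double quotes from quoted values.
data = {key: value.strip('"') for key, value in pairs}
print(data)
```

Like the original pattern, this does not handle escaped quotes inside a quoted value.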
+3

I'm not sure it doesn't look like a C breakfast, or that it's particularly elegant :)

data = {}
original = 'key1=value1,key2="value2,still_value2"'
converted = ''

is_open = False
for c in original:
    if c == ',' and not is_open:
        c = '\n'
    elif c in ('"',"'"):
        is_open = not is_open
    converted += c

for item in converted.split('\n'):
    # split on the first '=' only, so '=' inside a value is kept
    k, v = item.split('=', 1)
    data[k] = v
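
The loop above toggles is_open on either quote character, so a single quote inside a double-quoted value would close it early. The sketch below is one way to tighten that, wrapped in a hypothetical function `parse_pairs` that remembers which quote character opened the section. Unlike the code above, it drops the quote characters from the values, which is an assumption about the desired output.

```python
def parse_pairs(text):
    """Hypothetical helper: split on commas outside quotes, then on '='."""
    items = []
    current = []
    open_quote = None  # the quote character currently open, if any
    for c in text:
        if open_quote:
            if c == open_quote:
                open_quote = None  # the matching quote closes the section
            else:
                current.append(c)
        elif c in ('"', "'"):
            open_quote = c
        elif c == ',':
            items.append(''.join(current))
            current = []
        else:
            current.append(c)
    items.append(''.join(current))
    # partition splits on the first '=' only
    return {k: v for k, _, v in (item.partition('=') for item in items)}

print(parse_pairs('key1=value1,key2="value2,still_value2"'))
```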
+2

Based on several other answers, I came up with the following solution:

import re
import itertools

data = 'key1=value1,key2="value2,still_value2"'

# Based on Alan Moore answer on http://stackoverflow.com/questions/2785755/how-to-split-but-ignore-separators-in-quoted-strings-in-python
def split_on_non_quoted_equals(string):
    return re.split('''=(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''', string)
def split_on_non_quoted_comma(string):
    return re.split(''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''', string)

split1 = split_on_non_quoted_equals(data)
split2 = map(lambda x: split_on_non_quoted_comma(x), split1)

# 'Unpack' the sublists in to a single list. Based on Alex Martelli answer on http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
flattened = [item for sublist in split2 for item in sublist]

# Convert alternating elements of a list into keys and values of a dictionary. Based on Sven Marnach answer on http://stackoverflow.com/questions/6900955/python-convert-list-to-dictionary
# zip_longest is the Python 3 name (it was izip_longest in Python 2)
d = dict(itertools.zip_longest(*[iter(flattened)] * 2, fillvalue=""))

The resulting d is the following dictionary:

{'key1': 'value1', 'key2': '"value2,still_value2"'}
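
To see what the lookahead in the two helper functions does, here is a small standalone sketch on a made-up input. The name `COMMA_OUTSIDE_QUOTES` is only for illustration; the pattern is the same as in split_on_non_quoted_comma.

```python
import re

# Split on a comma only when the rest of the string consists of unquoted
# characters and complete quoted sections, i.e. the comma is outside quotes.
COMMA_OUTSIDE_QUOTES = re.compile(r''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''')

parts = COMMA_OUTSIDE_QUOTES.split('a,"b,c",d')
print(parts)  # ['a', '"b,c"', 'd']
```

The comma between b and c is followed by an unbalanced quote, so the lookahead fails there and no split happens. Because the lookahead rescans to the end of the string at every candidate comma, this approach may be slow on very long inputs.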
+1

Source: https://habr.com/ru/post/1650003/

