Convert a TiddlyWiki list to a Python list

Internally, TiddlyWiki stores a list of tags as a single string of space-separated tags, and it wraps any tag that contains spaces in [[ and ]]. For example, the tags foo, ram doo, bar and very cool become this single line:

 "foo [[ram doo]] bar [[very cool]]" 

How can I convert this to a Python list like this:

 ['foo', 'ram doo', 'bar', 'very cool'] 

"foo [[ram doo]] bar".split() does not work for me: it splits on every space and returns ['foo', '[[ram', 'doo]]', 'bar'].

4 answers

With regex:

    import re

    a = "foo [[ram doo]] bar [[very cool]] something else"
    pattern = re.compile(r'\[\[[^\]]+\]\]|[^\[\] ]+')
    print([i.strip(' []') for i in pattern.findall(a)])

This prints ['foo', 'ram doo', 'bar', 'very cool', 'something', 'else'].

The regex tokenizes the string: each match is either a complete [[...]] tag (tried first) or a run of characters that contains no brackets or spaces. The list comprehension then strips the brackets from each token.
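A variant of the same tokenizing idea, sketched under the assumption that a tag never contains ]] itself: putting the tag text in a capture group lets each match report the tag name directly, so no strip(' []') is needed (strip would also remove brackets that legitimately start or end a tag name). The function name parse_tiddlywiki_list is only for illustration.

```python
import re

# Either a [[bracketed tag]] (group 1, non-greedy so adjacent tags stay separate)
# or a run of non-whitespace characters (group 2).
TAG_RE = re.compile(r'\[\[(.*?)\]\]|(\S+)')


def parse_tiddlywiki_list(text):
    """Split a TiddlyWiki-style tag string into a Python list (hypothetical helper)."""
    tags = []
    for match in TAG_RE.finditer(text):
        bracketed, plain = match.groups()
        # Exactly one of the two groups participates in each match.
        tags.append(bracketed if bracketed is not None else plain)
    return tags


print(parse_tiddlywiki_list("foo [[ram doo]] bar [[very cool]] something else"))
# ['foo', 'ram doo', 'bar', 'very cool', 'something', 'else']
```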


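A simple re.split on the brackets works for this example, although consecutive unbracketed tags (such as foo bar) come back as a single item: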

    >>> import re
    >>> [x.strip() for x in re.split(r'\[\[|\]\]', "foo [[ram doo]] bar [[very cool]]") if x.strip()]
    ['foo', 'ram doo', 'bar', 'very cool']
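Splitting only on the brackets keeps consecutive unbracketed tags together: "foo bar [[x]]" yields 'foo bar' as one item. One possible fix, sketched here assuming well-formed [[...]] pairs, is to put a capture group in the pattern. re.split then returns the plain text between bracketed tags at even indices and the captured tag names at odd indices, so only the even chunks need a further whitespace split. The helper name split_tags is hypothetical.

```python
import re


def split_tags(text):
    # With a capture group, re.split keeps the captured text in the result:
    # even indices hold the plain text between bracketed tags,
    # odd indices hold the contents of each [[...]] pair.
    parts = re.split(r'\[\[(.*?)\]\]', text)
    tags = []
    for i, part in enumerate(parts):
        if i % 2:
            tags.append(part)          # a bracketed tag, kept whole
        else:
            tags.extend(part.split())  # plain tags, separated by whitespace
    return tags


print(split_tags("foo bar [[ram doo]] baz [[very cool]]"))
# ['foo', 'bar', 'ram doo', 'baz', 'very cool']
```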

A two-line version without regex. Note that, as the output shows, it also splits the multi-word tags, so it does not produce the requested list:

    >>> s = "foo [[ram doo]] bar [[very cool]]"
    >>> [x.strip() for x in " ".join(s.replace('[[', '*').replace(']]', '*').split("*")).split(" ") if x]
    ['foo', 'ram', 'doo', 'bar', 'very', 'cool']

You can also do it without re, as follows.

Using re is obviously the better choice; this answer only demonstrates that it can be done with split().

[EDITED based on comment]

    my_string = "foo [[ram doo]] bar [[very cool]]"
    # also works for the following strings
    #my_string = "foo [[ram doo]] bar [[very cool]] something else"
    #my_string = "something else"
    #my_string = "foo bar [[ram doo]]"  ##<-- this is the border case
    #my_string = "[[ram doo]] foo bar"
    #my_string = "foo [[ram doo]] bar "

    # set "splitting string"
    s1 = ']]'
    s2 = '[['
    if my_string[-2::] == ']]' and my_string.count(']]') == 1:
        # reverse splitting string for border case
        s1 = '[['
        s2 = ']]'

    # split on s1 only if s1 in string
    my_list1 = [a if s1 in my_string else my_string for a in my_string.split(s1)]
    # split each element on s2 or space
    my_list2 = [x.split(s2) if s2 in x else x.split(' ') for x in my_list1]
    # flatten lists in lists, and strip spaces
    my_list3 = [a.strip(' ') for b in my_list2 for a in (b if isinstance(b, list) else [b])]
    # get rid of empties
    my_list4 = [a for a in my_list3 if a != '']
    print(my_list4)
    # will output
    # ['foo', 'ram doo', 'bar', 'very cool']

So the conclusion is: use re.
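For comparison, a shorter split()-only sketch, assuming every [[ is matched by a later ]]: after replacing ]] with [[, a single split on [[ yields plain text at even indices and bracketed tag names at odd indices. It handles all the strings listed in the comments above without a special border case, and it also separates several plain tags that precede a bracketed one (the code above returns 'foo bar' as a single item for "foo bar [[ram doo]] baz [[very cool]]"). The function name split_tags_no_re is hypothetical.

```python
def split_tags_no_re(text):
    # Turn every closing ]] into [[ so one split() separates all chunks:
    # even indices are plain text, odd indices are [[bracketed]] tag names.
    parts = text.replace(']]', '[[').split('[[')
    tags = []
    for i, part in enumerate(parts):
        if i % 2:
            tags.append(part)          # bracketed tag, inner spaces preserved
        else:
            tags.extend(part.split())  # plain, space-separated tags
    return tags


for s in ["foo [[ram doo]] bar [[very cool]]",
          "foo bar [[ram doo]]",
          "[[ram doo]] foo bar",
          "foo bar [[ram doo]] baz [[very cool]]"]:
    print(split_tags_no_re(s))
```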


Source: https://habr.com/ru/post/1274659/

