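The first solution:
First, read the file line by line. Whenever a line holds a state name (a word followed by [edit]), append a "pos_flag" marker to the list just before the name; otherwise append the city: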
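import re

# A word immediately followed by "[edit]" marks a state line
pattern = r'\w+(?=\[edit\])'
track = []
with open('mon.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        if match:
            # Flag the start of a state, then store its name
            track.append('pos_flag')
            track.append(line.strip().split('[')[0])
        else:
            # City line: keep only the text before "("
            track.append(line.strip().split('(')[0])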
Output:
['pos_flag', 'Alabama', 'Auburn ', 'Florence ', 'Jacksonville ', 'Livingston ', 'Montevallo ', 'Troy ', 'Tuscaloosa ', 'Tuskegee ', 'pos_flag', 'Alaska', 'Fairbanks ', 'pos_flag', 'Arizona', 'Flagstaff ', 'Tempe ', 'Tucson ', 'pos_flag', 'Arkansas', 'Arkadelphia ', 'Conway ', 'Fayetteville ', 'Jonesboro ', 'Magnolia ', 'Monticello ', 'Russellville ', 'Searcy ', 'pos_flag',
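A note on why the code above stores line.split('[')[0] rather than the regex match itself: the lookahead pattern only captures the last word before "[edit]", so a multi-word state would be cut short. The sketch below illustrates this on made-up sample lines; the helper name state_name and the sample strings are assumptions for illustration, not part of the original file.

```python
import re

pattern = r'\w+(?=\[edit\])'

def state_name(line):
    """Return the full state name if the line is a state header, else None.

    Hypothetical helper: it mirrors the test used in the loop above.
    """
    if re.search(pattern, line):
        # Everything before the first "[" is the state name, spaces included
        return line.strip().split('[')[0]
    return None

# Assumed sample lines in the same shape as the input file
for sample in ['Alabama[edit]', 'New Hampshire[edit]', 'Auburn (Auburn University)[1]']:
    match = re.search(pattern, sample)
    print(repr(match.group() if match else None), '->', repr(state_name(sample)))
```

For a two-word state the match alone is just the second word, while the split keeps the whole name.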
Each "pos_flag" in the track list marks where a new state begins.
Next, collect the index of every pos_flag:
index_no = []
for index, value in enumerate(track):
    if value == 'pos_flag':
        index_no.append(index)
Output:
[0, 10, 13, 18, 28, 55, 66, 75, 79, 93, 111, 114, 119, 131, 146, 161, 169, 182, 192, 203, 215, 236, 258, 274, 281, 292, 297, 306, 310, 319, 331, 338, 371, 391, 395, 419, 432, 444, 489, 493, 506, 512, 527, 551, 559, 567, 581, 588, 599, 614]
Now slice the list at these index numbers, and use the first word of each slice as the dict key and the rest as the dict values:
city_dict = {}
# Pair each flag position with the next one; the last state runs to the end of the list
for start, end in zip(index_no, index_no[1:] + [len(track)]):
    chunk = track[start:end]  # ['pos_flag', state, city, city, ...]
    city_dict[chunk[1]] = chunk[2:]
print(city_dict)
Output:
Dicts are not ordered in Python 3.5, so the output order differs from the input file (since Python 3.7, dicts keep insertion order):
{'Kentucky': ['Bowling Green ', 'Columbia ', 'Georgetown ', 'Highland Heights ', 'Lexington ', 'Louisville ', 'Morehead ', 'Murray ', 'Richmond ', 'Williamsburg ', 'Wilmore '], 'Mississippi': ['Cleveland ', 'Hattiesburg ', 'Itta Bena ', 'Oxford ', 'Starkville '], 'Wisconsin': ['Appleton ', 'Eau Claire ', 'Green Bay ', 'La Crosse ', 'Madison ', 'Menomonie ', 'Milwaukee ',
Full code:
import re

# A word immediately followed by "[edit]" marks a state line
pattern = r'\w+(?=\[edit\])'
track = []
with open('mon.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        if match:
            # Flag the start of a state, then store its name
            track.append('pos_flag')
            track.append(line.strip().split('[')[0])
        else:
            # City line: keep only the text before "("
            track.append(line.strip().split('(')[0])

# Positions of every flag in the list
index_no = []
for index, value in enumerate(track):
    if value == 'pos_flag':
        index_no.append(index)

city_dict = {}
# Pair each flag position with the next one; the last state runs to the end of the list
for start, end in zip(index_no, index_no[1:] + [len(track)]):
    chunk = track[start:end]  # ['pos_flag', state, city, city, ...]
    city_dict[chunk[1]] = chunk[2:]
print(city_dict)
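The flag-and-slice steps can also be folded into a single pass that keeps track of the current state while reading. This is only a sketch of that alternative, not the approach used above: the function name group_cities and the sample lines are assumptions, and unlike the code above it strips the trailing space from each city name.

```python
def group_cities(lines):
    """Map each state header line to the city lines that follow it."""
    city_dict = {}
    current = None  # state whose cities are currently being collected
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if '[edit]' in line:
            # State header: start a new, empty city list
            current = line.split('[')[0]
            city_dict[current] = []
        elif current is not None:
            # City line: keep the text before "(" without trailing spaces
            city_dict[current].append(line.split('(')[0].strip())
    return city_dict

# Assumed sample lines in the same shape as the input file
sample = [
    'Alabama[edit]\n',
    'Auburn (Auburn University)[1]\n',
    'Florence (University of North Alabama)\n',
    'Alaska[edit]\n',
    'Fairbanks (University of Alaska Fairbanks)[2]\n',
]
print(group_cities(sample))
```

Because a file object yields its lines when iterated, the same function should also accept the open file directly, for example group_cities(f) inside the with block.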
The second solution:
If you want to use regex, try this small solution:
import re

# One match per state: the "[edit]" header plus everything up to the next header
pattern = r'((\w+\[edit\])(?:(?!^\w+\[edit\]).)*)'
with open('file.txt', 'r') as f:
    prt = re.finditer(pattern, f.read(), re.DOTALL | re.MULTILINE)
    for line in prt:
        block = line.group(1).split('\n')
        state = block[0].strip().split('[')[0]
        # Skip the header and the last piece after the split (empty when the block ends with a newline)
        cities = [i.split('(')[0].strip() for i in block[1:-1]]
        print({state: cities})
This gives:
{'Alabama': ['Auburn', 'Florence', 'Jacksonville', 'Livingston', 'Montevallo', 'Troy', 'Tuscaloosa', 'Tuskegee']}
{'Alaska': ['Fairbanks']}
{'Arizona': ['Flagstaff', 'Tempe', 'Tucson']}
{'Arkansas': ['Arkadelphia', 'Conway', 'Fayetteville', 'Jonesboro', 'Magnolia', 'Monticello', 'Russellville', 'Searcy']}
{'California': ['Angwin', 'Arcata', 'Berkeley', 'Chico', 'Claremont', 'Cotati', 'Davis', 'Irvine', 'Isla Vista', 'University Park, Los Angeles', 'Merced', 'Orange', 'Palo Alto', 'Pomona', 'Redlands', 'Riverside', 'Sacramento', 'University District, San Bernardino', 'San Diego', 'San Luis Obispo', 'Santa Barbara', 'Santa Cruz', 'Turlock', 'Westwood, Los Angeles', 'Whittier']}
{'Colorado': ['Alamosa', 'Boulder', 'Durango', 'Fort Collins', 'Golden', 'Grand Junction', 'Greeley', 'Gunnison', 'Pueblo, Colorado']}
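If a single dict is preferred over one dict per state, or if state names may contain spaces (the \w+ in the pattern above only matches one word), a variant based on re.split may be worth trying. This is a sketch under assumptions: the HEADER pattern, the function name parse_states and the sample text are illustrative and have not been checked against the full input file.

```python
import re

# A header is a whole line ending in "[edit]"; the name may contain spaces
HEADER = re.compile(r'^(.+?)\[edit\][ \t]*$', re.MULTILINE)

def parse_states(text):
    """Return {state: [cities]} for text in the '[edit]' header format."""
    # With one capture group, split() returns
    # [text_before, name1, body1, name2, body2, ...]
    parts = HEADER.split(text)
    result = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        # Keep the text before "(" on every non-empty line of the body
        cities = [ln.split('(')[0].strip() for ln in body.splitlines() if ln.strip()]
        result[name.strip()] = cities
    return result

# Assumed sample text in the same shape as the input file
text = (
    'Alabama[edit]\n'
    'Auburn (Auburn University)[1]\n'
    'New Hampshire[edit]\n'
    'Durham (University of New Hampshire)\n'
)
print(parse_states(text))
```

Filtering out empty lines, rather than dropping the last element, means the final city is kept even when the file does not end with a newline.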