Python: split a string on quotes

I am a Python student. I have lines of text in a file that look like this:

"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Is it possible to split the lines around the double quotes (inverted commas)? The only constant is the position of the quotes relative to the data itself. The data can vary from 10 to 100+ characters (they will be network subfolders). I can't see any other way to separate the items than by these markers, but my limited knowledge of Python makes this difficult. I tried

optfile=line.split("") 

and other variations, but I keep getting ValueError: empty separator. I understand why it says that, I just don't know how to change it. Any help, as always, is greatly appreciated.

Thank you very much

9 answers

Searching for all regular expression matches will do this:

 import re
 input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
 re.findall('".+?"', input)  # or '"[^"]+"'

This returns a list of the file names, quotes included:

 ['"Y:\\DATA\\00001\\SERVER\\DATA.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']

To get the file names without the quotes, use:

 [f[1:-1] for f in re.findall('".+?"', input)] 

or use re.finditer with a capture group:

 [f.group(1) for f in re.finditer('"(.+?)"', input)] 
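A minimal sketch applying the same capture-group idea to every line of a file. The helper name `quoted_fields` is hypothetical, and `io.StringIO` stands in for an open file so the example is self-contained. It uses `[^"]*` instead of `.+?`, so an empty `""` pair also yields an (empty) field.

```python
import io
import re

# Captures whatever sits between each pair of double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def quoted_fields(line):
    """Return the contents of every "..." group in one line (hypothetical helper)."""
    return QUOTED.findall(line)


# io.StringIO stands in for open('somefile'); iterate it line by line.
sample = io.StringIO(
    r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"' + "\n"
)
for line in sample:
    print(quoted_fields(line))
```

With a single capture group, `re.findall` returns just the group's text, so no slicing with `[1:-1]` is needed.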

You need to escape the quote character:

 input.split("\"") 

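which gives (the \x00 appears because the input here was a normal, non-raw string literal, in which \000 is an octal escape for the NUL character):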

 ['\n', 'Y:\\DATA\x0001\\SERVER\\DATA.TXT', ' ', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT', '\n'] 

To drop the empty and whitespace-only pieces:

 [line for line in [line.strip() for line in input.split("\"")] if line] 

which gives:

 ['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT'] 
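When the quotes are balanced, the pieces produced by `split('"')` alternate between text outside the quotes and text inside them, so the quoted contents always sit at the odd indices. A minimal sketch of that observation, assuming every line contains an even number of quote characters (escaped quotes are not handled); `quoted_parts` is a hypothetical name:

```python
def quoted_parts(line):
    # Splitting on '"' gives: outside, inside, outside, inside, ..., outside.
    # The slice [1::2] keeps only the odd-indexed (inside-the-quotes) pieces.
    return line.split('"')[1::2]


line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
print(quoted_parts(line))
```

Unlike the strip-and-filter version, this keeps empty `""` fields and any leading or trailing spaces inside the quotes, and it ignores stray unquoted text between the quoted fields.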

No regex, no manual splitting; just use csv.reader:

 import csv

 sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'

 def main():
     # delimiter=' ' splits on spaces; quotechar='"' keeps quoted spaces together
     for row in csv.reader([sample_line], delimiter=' ', quotechar='"'):
         print(row)

 main()

Output:

 ['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-'] 
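The same reader works on data in the question's format. Since `csv.reader` accepts any iterable of lines, an open file can be passed directly; in this sketch `io.StringIO` stands in for the file, and the second sample line is made up to show a path containing spaces. With the default `escapechar=None`, backslashes are kept as ordinary characters.

```python
import csv
import io

# Two sample lines in the question's format; io.StringIO stands in for
# open('file.txt', newline=''), which csv.reader accepts the same way.
lines = [
    r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"',
    r'"C:\PATH WITH SPACES\A.TXT" "D:\B.TXT"',
]
data = io.StringIO("\n".join(lines) + "\n")

# delimiter=' ' splits fields on spaces; quotechar='"' keeps the spaces
# inside a quoted field from splitting it.
rows = list(csv.reader(data, delimiter=' ', quotechar='"'))
for row in rows:
    print(row)
```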

I'll just add that if you are dealing with strings that look like command-line arguments, you can use the shlex module:

 import shlex

 with open('somefile') as fin:
     for line in fin:
         print(shlex.split(line))

This gives:

 ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT'] 

shlex can help here:

 import shlex

 my_string = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
 shlex.split(my_string)

It returns:

 ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

Link: https://docs.python.org/2/library/shlex.html
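One caveat, sketched below: in the default POSIX mode, shlex treats a backslash outside quotes as an escape character, so an unquoted Windows path loses its backslashes. Passing `posix=False` keeps the backslashes, but the surrounding quote characters then stay in the tokens and have to be stripped by hand. The sample strings are made up for illustration.

```python
import shlex

quoted = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
unquoted = r'Y:\DATA\00001\SERVER\DATA.TXT'

# Inside double quotes, POSIX mode keeps a backslash unless it precedes
# another backslash or a double quote, so these paths survive intact.
posix_quoted = shlex.split(quoted)

# Outside quotes, POSIX mode treats the backslash as an escape and drops it.
posix_unquoted = shlex.split(unquoted)

# Non-POSIX mode keeps backslashes but also keeps the surrounding quotes.
raw_tokens = shlex.split(quoted, posix=False)
stripped = [t[1:-1] if t.startswith('"') and t.endswith('"') else t for t in raw_tokens]

print(posix_quoted)
print(posix_unquoted)
print(stripped)
```

So shlex is a good fit as long as every path in the file is quoted, as in the question.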


I think you want to extract file paths that are separated by spaces, i.e. split the string into the elements enclosed in quotes. That is, given a line

 "FILE PATH" "FILE PATH 2" 

you want

 ["FILE PATH","FILE PATH 2"] 

In this case:

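 import re

 with open('file.txt') as f:
     for line in f:
         # split on a whitespace character that has a quote on both sides;
         # strip() removes the trailing newline from each line
         print(re.split(r'(?<=")\s(?=")', line.strip()))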

With file.txt containing:

 "Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT" 

Outputs:

 ['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
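The lookarounds split only where exactly one whitespace character sits between a closing and an opening quote. A slightly more tolerant variant, assuming fields may be separated by runs of spaces or tabs, uses `\s+`; `split_quoted` is a hypothetical name:

```python
import re

# Split on any run of whitespace that has a quote on both sides.
SEP = re.compile(r'(?<=")\s+(?=")')


def split_quoted(line):
    # strip() removes the trailing newline that file iteration leaves on each line.
    return SEP.split(line.strip())


print(split_quoted(r'"Y:\DATA\00001\SERVER\DATA MINER.TXT"   "V:\DATA2\00002\SERVER2\DATA2.TXT"' + "\n"))
```

Spaces inside a path (as in `DATA MINER.TXT`) are not preceded by a quote, so they never trigger a split.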

This was my solution. It parses most ordinary input the same way it would be parsed if passed directly on the command line.

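 import re

 def simpleParse(input_):
     def reduce_(quotes):
         # a lone quote is dropped, a doubled quote "" becomes a literal "
         return '' if quotes.group(0) == '"' else '"'
     # a quoted token followed by whitespace or end of string, or any run of non-whitespace
     rex = r'("[^"]*"(?:\s|$)|[^\s]+)'
     return [re.sub(r'"{1,2}', reduce_, z.strip()) for z in re.findall(rex, input_)]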

Use case: collecting a bunch of one-off scripts into a launcher utility without having to retype the commands.

Edit: I got a bit obsessive about the odd way the command line handles messy quoting and wrote the following:

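 import re

 def tokenize(trial):
     # function wrapper so the snippet runs; trial is the command-line string
     tokens = list()
     reading = False  # currently inside a token?
     qc = 0           # number of quotes seen in the current token
     lq = 0           # index of the last quote seen
     begin = 0        # index just before the token's first kept character
     for z in range(len(trial)):
         char = trial[z]
         if re.match(r'[^\s]', char):
             if not reading:
                 reading = True
                 if re.match(r'"', char):
                     begin = z      # skip the opening quote
                     qc = 1
                 else:
                     begin = z - 1
                     qc = 0
                 lq = begin         # reset the last-quote index for this token
             elif re.match(r'"', char):
                 qc = qc + 1
                 lq = z
         elif reading and qc % 2 == 0:
             # whitespace outside quotes ends the token
             reading = False
             if lq == z - 1:
                 tokens.append(trial[begin + 1: z - 1])  # drop the closing quote
             else:
                 tokens.append(trial[begin + 1: z])
     if reading:
         tokens.append(trial[begin + 1: len(trial)])
     # a lone quote is removed, a doubled quote "" becomes a literal "
     return [re.sub(r'"{1,2}', lambda y: '' if y.group(0) == '"' else '"', z) for z in tokens]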

I know this was answered a million years ago, but this also works:

 input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
 input = input.replace('" "', '"').split('"')[1:-1]

This produces the list:

 ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

My Python question, "An error caused by a space in an argv argument", was flagged as a duplicate of this one. We have several Python books going back to Python 2.3. The oldest one mentioned using a list for argv, but gave no example, so I changed things to:

 repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
 SWCore.main(repoCmd)

and in SWCore:

 sys.argv = args 

The shlex module works, but I prefer this approach.
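A related pattern, sketched as an alternative rather than the poster's code: instead of overwriting sys.argv, the entry point can accept an optional argument list and hand it to argparse, whose `parse_args` reads `sys.argv[1:]` when given None. The function `main`, the argument names `action`, `task_id` and `path`, and the sample values are all hypothetical stand-ins for SWCore.main and its arguments.

```python
import argparse


def main(argv=None):
    # Hypothetical entry point: argv is a list of arguments without the
    # script name; None makes argparse read sys.argv[1:] instead.
    parser = argparse.ArgumentParser(prog='Purchaser.py')
    parser.add_argument('action')
    parser.add_argument('task_id')
    parser.add_argument('path')
    return parser.parse_args(argv)


# Each list element is one argument, so the space in the path survives.
args = main(['task', 'REPO-1', r'C:\Last Data\path'])
print(args.path)
```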


Source: https://habr.com/ru/post/945229/

