I'm using Python to read a file and strip any comments. A comment is defined as a hash and everything to the right of it, as long as the hash is not inside double quotes. I currently have a solution, but it seems suboptimal:
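    import re

    filelines = []
    r = re.compile('(".*?")')  # capture quoted strings so split() keeps them as tokens
    for line in f:
        m = r.split(line)
        nline = ''
        for token in m:
            # a hash in an unquoted token starts the comment
            if token.find('#') != -1 and token[0] != '"':
                nline += token[:token.find('#')]
                break
            else:
                nline += token
        filelines.append(nline)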
Is there a way to find the first hash that is not inside quotes without loops (i.e. with regular expressions)?
Examples:
    ' "Phone #":"555-1234" ' -> ' "Phone #":"555-1234" '
    ' "Phone "#:"555-1234" ' -> ' "Phone "'
    '#"Phone #":"555-1234" ' -> ''
    ' "Phone #":"555-1234" #Comment' -> ' "Phone #":"555-1234" '
Edit: Here is a clean regex solution created by user2357112. I tested it and it works great:
    filelines = []
    r = re.compile('(?:"[^"]*"|[^"#])*(#)')
    for line in f:
        m = r.match(line)
        if m is not None:
            filelines.append(line[:m.start(1)])
        else:
            filelines.append(line)
See his answer for more details on how this regular expression works.
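For anyone who wants the pattern unpacked in place, here is my own rough reading of it, written with `re.VERBOSE` so each piece can carry a comment. The name `COMMENT_START` is just something I made up for illustration, and the hash in the last group has to be escaped because verbose mode would otherwise treat it as the start of a comment.

```python
import re

# The same pattern as above, spread out with re.VERBOSE (my annotation, not
# part of the original answer). COMMENT_START is a hypothetical name.
COMMENT_START = re.compile(r'''
    (?:                 # repeat, without capturing:
        "[^"]*"         #   a complete double-quoted string, or
      | [^"#]           #   one character that is neither a quote nor a hash
    )*
    (\#)                # group 1: the first hash found outside quotes
''', re.VERBOSE)

m = COMMENT_START.match(' "Phone "#:"555-1234" ')
if m is not None:
    print(repr(m.string[:m.start(1)]))
```

Because a hash inside quotes can only be consumed as part of a whole quoted string, group 1 can never land on it, which is why a line with no comment produces no match at all.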
Edit2: Here is a version of user2357112's solution that I modified to account for escape characters (\). It also eliminates the "if" by having the pattern match the end of the line ($) as well:
    filelines = []
    r = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^"#])*(#|$)')
    for line in f:
        m = r.match(line)
        filelines.append(line[:m.start(1)])
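As a quick sanity check, the escape-aware pattern can be wrapped in a small function and run against the examples from the question. This is only a sketch: `strip_comment` is a name I picked for illustration, and the fallback for a non-matching line is my own assumption (a line with an unterminated quote does not match the pattern, so `m` would be `None`).

```python
import re

# Escape-aware pattern from the edit above: quoted strings may contain
# backslash escapes; group 1 is the first unquoted hash or the end of the line.
_COMMENT = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^"#])*(#|$)')

def strip_comment(line):
    """Return line with any trailing comment removed (hypothetical helper)."""
    m = _COMMENT.match(line)
    # Assumption: keep the line unchanged if the pattern does not match,
    # which happens when a double quote is never closed.
    return line if m is None else line[:m.start(1)]

examples = [
    ' "Phone #":"555-1234" ',
    ' "Phone "#:"555-1234" ',
    '#"Phone #":"555-1234" ',
    ' "Phone #":"555-1234" #Comment',
    r'"a \" # b" # c',          # escaped quote inside a quoted string
]
for line in examples:
    print(repr(line), '->', repr(strip_comment(line)))
```

Note that when a comment is stripped from a line read from a file, the trailing newline goes with it, since the newline sits to the right of the hash.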