I am trying to parse the following text file:
prefix1 prefix2 name1( type1 name1, type2 name2 );
with the following regular expression:
\\s*prefix1\\s*prefix2\\s*(\\w[\\w\\d_]*).*\\(\\s*([^\\)]*\\))\\s*;\\s*
As a result, I get the following two groups (registers):
"name1( "
and
"( type1 name1, type2 name2 )"
(the quotes only mark the string boundaries; \n characters are included)
I cannot understand why the first group, (\w[\w\d_]*), also captures the text that should be matched by the following part, .* . Moreover, I cannot get rid of the unnecessary tail!
What's my mistake?
ADD: Regular expression parsed:
(cl-ppcre::parse-string "\\s*prefix1\\s*prefix2\\s*(\\w[\\w\\d_]*).*\\(\\s*([^\\)]*\\))\\s*;\\s*")

(:SEQUENCE
 (:GREEDY-REPETITION 0 NIL :WHITESPACE-CHAR-CLASS)
 "prefix1"
 (:GREEDY-REPETITION 0 NIL :WHITESPACE-CHAR-CLASS)
 "prefix2"
 (:GREEDY-REPETITION 0 NIL :WHITESPACE-CHAR-CLASS)
 (:REGISTER
  (:SEQUENCE
   :WORD-CHAR-CLASS
   (:GREEDY-REPETITION 0 NIL
    (:CHAR-CLASS :WORD-CHAR-CLASS :DIGIT-CLASS
ADD 2: Full source:
;; Requirements:
;;   cl-ppcre

(defparameter *name-and-parameters-list*
  (cl-ppcre::create-scanner
   "\\s*prefix1\\s*prefix2\\s*(\\w[\\w\\d_]*)\\s*\\(\\s*([^\\)]*\\))\\s*;\\s*"))

(defparameter *filename* "c:/pva/home/test.txt")

(defun read-txt-without-comments (file-name)
  "Would epically fail in case the file format changes, because currently it expects the \"/*\" and \"*/\" sequences to be on a separate line."
  (let ((fstr (make-array '(0) :element-type 'base-char
                               :fill-pointer 0 :adjustable t)))
    (with-output-to-string (s fstr)
      (let ((comment nil))
        (with-open-file (input-stream file-name :direction :input)
          (do ((line (read-line input-stream nil 'eof)
                     (read-line input-stream nil 'eof)))
              ((eql line 'eof))
            (multiple-value-bind (start-comment-from)
                (cl-ppcre:scan ".*/\\*" line)
              (multiple-value-bind (end-comment-from)
                  (cl-ppcre:scan ".*\\*/" line)
                (if start-comment-from (setf comment t))
                (if (not comment) (format s "~A~%" line))
                (if end-comment-from (setf comment nil))))))))
    fstr))

(let* ((string (read-txt-without-comments "c:/pva/home/test.txt")))
  (multiple-value-bind (a b c d)
      (cl-ppcre::scan *name-and-parameters-list* string)
    (format t "~a ~a ~a ~a~%|~a|~%|~a|~%"
            a b c d
            (subseq string (svref c 0) (svref c 1))
            (subseq string (svref d 0) (svref d 1)))))
ADD 3: Full input:
prefix1 prefix2 name1( type1 name1, type2 name2 );
prefix1 prefix2 name2( type3 name1, type2 name2 );