How to split a string into a list of words in Tcl, ignoring multiple spaces?

Basically, I have a line consisting of several words separated by spaces. The catch is that words may be separated by several spaces instead of one, which is why [split] does not do what I want:

 split "a    b"

gives me the following:

 {a {} {} {} b} 

instead of this:

 {a b}

Searching on Google, I found a page on the wiki where a user asked more or less the same question.

One proposed solution would look like this:

 split [regsub -all {\s+} "a    b" " "]

which seems to work for a simple string. But a test string such as [string repeat " " 4] (I used string repeat because StackOverflow collapses multiple spaces) makes regsub return " ", which split then turns into {{} {}} instead of an empty list.
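The failure described above can be reproduced with a few lines. This is only a sketch of the behavior as stated in the question; the variable names collapsed and parts are made up for illustration.

```tcl
# Collapse every run of whitespace in a four-space string to one space.
set collapsed [regsub -all {\s+} [string repeat " " 4] " "]

# split still sees one separator, so it yields an empty string on each side of it.
set parts [split $collapsed]

puts [llength $parts]   ;# 2, not 0
```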

Another suggested solution was to force a reinterpretation of the given string as a list:

 lreplace "a list   with  many    spaces" 0 -1

But if I have learned anything about Tcl, it is that you should never use list functions (those starting with l) on strings. Indeed, this one chokes on strings containing special characters (namely { and }):

 lreplace "test \{a b\}" 0 -1

returns test {a b} instead of test \{a b\} (which is what I want: each space-separated word becomes one element of the resulting list).

Another solution was to use a "filter":

 proc filter {cond list} {
     set res {}
     foreach element $list {
         if [$cond $element] {lappend res $element}
     }
     set res
 }

Then you will use it as follows:

 filter llength [split "a list   with  many    spaces"]

Again, the same problem: this calls llength on a string that may contain special characters (again, { and }). Passing it "\{a b\}" makes Tcl complain about an "unmatched open brace in list".

I managed to get it working by changing this filter function, adding {*} before $cond in the if, so that I could use it with string length instead of llength. That seemed to work on all the inputs I have tried so far.
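The modified proc itself is not shown in the question, so the following is only a reconstruction of what it presumably looks like. The braced if condition and the sample input are my own choices, not the asker's.

```tcl
# Reconstruction (an assumption): the asker's filter with {*} added before $cond.
# {*} expands $cond into separate words, so a multi-word command prefix
# such as {string length} can serve as the condition.
proc filter {cond list} {
    set res {}
    foreach element $list {
        # string length returns 0 for the empty strings that split produces
        if {[{*}$cond $element]} {lappend res $element}
    }
    return $res
}

# The braces are never treated as list syntax because only string commands see them.
set words [filter {string length} [split "a   \{b  c\}"]]
puts [llength $words]   ;# 3 elements: a, {b and c}
```

Because string length accepts any string, this version does not raise the "unmatched open brace in list" error that llength does.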

Is this solution safe to use as it is now? Will it choke on some special input that I have not tested yet? Or is there a simpler way to do this?

1 answer

The easiest way is to use regexp -all -inline to select and return all words. For instance:

 # The RE matches any non-empty sequence of non-whitespace characters
 set theWords [regexp -all -inline {\S+} $theString]

If you instead define words as sequences of alphanumeric characters (and underscores), use this regular expression: {\w+}
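As a quick check, the regexp approach can be run against the inputs from the question. This is a minimal sketch; the helper name splitWords is hypothetical and introduced only for illustration.

```tcl
# Hypothetical helper wrapping the regexp call from the answer.
proc splitWords {str} {
    return [regexp -all -inline {\S+} $str]
}

puts [splitWords "a    b"]                          ;# two elements: a and b
puts [llength [splitWords [string repeat " " 4]]]  ;# 0: no matches, so an empty list
puts [llength [splitWords "test \{a b\}"]]         ;# 3: test, {a and b}
```

Since regexp only scans the string and never parses it as a list, braces in the input are treated as ordinary characters.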


Source: https://habr.com/ru/post/1446014/

