How to split a string into a list of words in TCL, ignoring multiple spaces?

Question

Basically, I have a line consisting of several words separated by spaces. There can be several spaces instead of one between words, which is why [split] does not do what I want:

 split "a    b"

gives me the following:

 {a {} {} {} b}

instead of this:

 {a b}

Searching on Google, I found a page on the wiki where the user asked more or less the same question.

One proposed solution would look like this:

 split [regsub -all {\s+} "a    b" " "]

which seems to work for a simple string. But a test string such as [string repeat " " 4] (I used string repeat because StackOverflow collapses multiple spaces) makes regsub return " ", which split then splits into {{} {}} instead of an empty list.
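A minimal sketch of that failure mode, assuming Tcl 8.5 or later. The variable names are illustrative, and the string trim variant at the end is only one possible workaround, not something taken from the wiki page:

```tcl
# A whitespace-only input, built with string repeat so the spaces are explicit
set blank [string repeat " " 4]

# regsub collapses the run to a single space, and split then produces
# two empty elements, one on each side of that space
set collapsed [regsub -all {\s+} $blank " "]
set naive [split $collapsed]
puts [llength $naive]      ;# prints 2

# Assumption: trimming before split removes leading and trailing whitespace,
# so a whitespace-only input becomes the empty string and then an empty list
set trimmed [split [string trim $collapsed]]
puts [llength $trimmed]    ;# prints 0
```

The same effect shows up with ordinary input that merely starts or ends with whitespace, since each edge space yields one empty element.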

Another suggested solution was to force a reinterpretation of the given string as a list:

 lreplace "a list with many spaces" 0 -1

But if I have learned one thing about TCL, it is that you should never use list functions (those starting with l) on strings. Indeed, this one chokes on strings containing special characters (namely { and }):

 lreplace "test \{a b\}" 0 -1

returns test {a b} instead of test \{a b\} (which is what I want: each space-separated word becomes one element of the resulting list).

Another solution was to use a "filter":

 proc filter {cond list} {
     set res {}
     foreach element $list {if [$cond $element] {lappend res $element}}
     set res
 }

Then you will use it as follows:

 filter llength [split "a list with many spaces"]

Again, the same problem. This calls llength on a string, which may contain special characters (again, { and }). Passing it "\{a b\}" makes TCL complain about an "unmatched open brace in list".
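The complaint can be reproduced in isolation. This sketch assumes an input with one brace on either side of a space, and the variable names are illustrative:

```tcl
# split yields the two pieces "{a" and "b}"; neither is a well-formed list
set pieces [split "\{a b\}"]

# llength has to parse its argument as a list, so the lone "{" is an error.
# catch returns 1 on failure and stores the error message in msg.
set failed [catch {llength [lindex $pieces 0]} msg]
puts $failed   ;# prints 1
puts $msg      ;# prints: unmatched open brace in list
```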

I managed to get it working by changing this filter function: I added {*} before $cond in the if, so that I could use it with string length instead of llength. That seemed to work on every input I have tried so far.
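The change described above might look like the following sketch, assuming Tcl 8.5 or later for {*}. The braces around the if condition and the explicit return are additions of this sketch rather than part of the description; bracing the condition avoids a second round of substitution on the command's result.

```tcl
# Sketch of the modified filter: cond may now be a multi-word command prefix
proc filter {cond list} {
    set res {}
    foreach element $list {
        # {*}$cond expands "string length" into two separate words
        if {[{*}$cond $element]} {
            lappend res $element
        }
    }
    return $res
}

# Keep only the non-empty pieces produced by split
set words [filter {string length} [split "a    b \{c d\}"]]
puts [llength $words]   ;# prints 4
```

Because string length treats each piece as a plain string, the unbalanced pieces "{c" and "d}" pass through without being parsed as lists.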

Is this solution safe to use as it is now? Will it choke on some special input that I have not tested yet? Or is there a simpler way to do this?

string split tcl

Jerry Nov 14 '12 at 14:38

1 answer

Donal Fellows · Accepted Answer · 2012-11-14T15:49:18+0000

The easiest way is to use regexp -all -inline to select and return all words. For instance:

 # The RE matches any non-empty sequence of non-whitespace characters
 set theWords [regexp -all -inline {\S+} $theString]

If you instead define words as sequences of alphanumeric characters, use this regular expression: {\w+} (note that \w also matches the underscore).
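As a quick check of the {\S+} approach against the inputs that tripped up the other attempts, here is a small sketch. The helper name splitWords and the sample strings are hypothetical, chosen only for illustration:

```tcl
# Hypothetical wrapper around the regexp call shown above
proc splitWords {theString} {
    return [regexp -all -inline {\S+} $theString]
}

set plain  [splitWords "a    b"]                 ;# several spaces between words
set blank  [splitWords [string repeat " " 4]]    ;# whitespace only
set braces [splitWords "test \{a b\}"]           ;# unbalanced pieces "{a" and "b}"

puts [llength $plain]    ;# prints 2
puts [llength $blank]    ;# prints 0
puts [llength $braces]   ;# prints 3
```

With -all -inline, regexp returns the matches as a proper list, so the input is never parsed as a list itself and braces in it are harmless.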
