Why is split(' ') trying to be too smart?
I just discovered the following odd behavior with String#split:
"a\tb c\nd".split => ["a", "b", "c", "d"] "a\tb c\nd".split(' ') => ["a", "b", "c", "d"] "a\tb c\nd".split(/ /) => ["a\tb", "c\nd"] The source (string.c from 2.0.0) has a length of more than 200 lines and contains the following fragment:
    /* L 5909 */
    else if (rb_enc_asciicompat(enc2) == 1) {
        if (RSTRING_LEN(spat) == 1 && RSTRING_PTR(spat)[0] == ' '){
            split_type = awk;
        }
    }

Later, in the code for the awk split type, the actual argument is no longer used at all, and the method does the same as a plain split with no argument.
- Does anyone else feel that this is somehow broken?
- Are there any good reasons for this?
- Does "magic" like this happen more often in Ruby than most people might think?
This is consistent with the behavior of Perl's split(). That, in turn, is based on GNU awk's split(). So this is a long-standing tradition with origins in Unix.
From the perldoc for split:
As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string " ", thereby allowing only a single space character to be a separator.
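A minimal Ruby sketch of the rule quoted above, assuming Ruby's awk mode follows the same description (strip leading whitespace, then split on runs of whitespace). The helper name `awk_like_split` is made up for illustration and is not part of Ruby:

```ruby
# Hypothetical helper: emulate the awk-style split by hand.
def awk_like_split(str)
  str.lstrip.split(/\s+/)
end

s = "  a\tb  c\nd"

p s.split(' ')       # awk mode: leading whitespace dropped, runs merged
p awk_like_split(s)  # same fields as the line above
p s.split(/\s+/)     # no lstrip: an empty leading field appears
p s.split(/ /)       # single spaces only: tabs and newlines stay in the fields
```

The third call shows why the leading-whitespace step matters: a plain `/\s+/` split keeps an empty first field when the string starts with whitespace.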
Check out the documentation, in particular this part:
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
You can use a regular expression such as / / to split the string on single literal spaces instead.
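For example, a short sketch using the string from the question. The regular expression matches exactly one space, so tabs and newlines are left inside the fields:

```ruby
s = "a\tb c\nd"
fields = s.split(/ /)
p fields  # => ["a\tb", "c\nd"]

# Adjacent spaces are no longer merged; they produce empty fields.
doubled = "a  b".split(/ /)
p doubled  # => ["a", "", "b"]
```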