
Why is split(' ') trying to be too smart?

I just discovered the following odd behavior with String#split:

    "a\tb c\nd".split       => ["a", "b", "c", "d"]
    "a\tb c\nd".split(' ')  => ["a", "b", "c", "d"]
    "a\tb c\nd".split(/ /)  => ["a\tb", "c\nd"]

The source (string.c from 2.0.0) is more than 200 lines long and contains the following fragment:

    /* L 5909 */
    else if (rb_enc_asciicompat(enc2) == 1) {
        if (RSTRING_LEN(spat) == 1 && RSTRING_PTR(spat)[0] == ' ') {
            split_type = awk;
        }
    }

Later, in the code for the awk split type, the actual argument is no longer used at all, and the code does the same as a plain split with no argument.

  • Does anyone else feel that this is somehow broken?
  • Are there any good reasons for this?
  • Does "magic" like this happen more often in Ruby than most people might think?
2 answers

This is consistent with the behavior of Perl's split(), which in turn is based on GNU awk's split(). So this is a long-standing tradition with origins in Unix.

From the perldoc for split:

As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string " ", thereby allowing only a single space character to be a separator.
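
The rule quoted above can be sketched in Ruby for ordinary ASCII whitespace. This is only an illustration, not how string.c implements it: `awk_like_split` is a hypothetical helper introduced here, and the sample string is made up.

```ruby
# Hypothetical helper, not part of Ruby's API: mimics the awk-style rule
# quoted above by dropping leading whitespace, then splitting on /\s+/.
def awk_like_split(str)
  str.lstrip.split(/\s+/)
end

sample = "  a\tb  c\nd "

awk_fields    = awk_like_split(sample)  # => ["a", "b", "c", "d"]
string_fields = sample.split(' ')       # => ["a", "b", "c", "d"]
regex_fields  = sample.split(/ /)       # => ["", "", "a\tb", "", "c\nd"]
```

Without the `lstrip` step, splitting on /\s+/ would leave an empty first field, which is why the quoted text mentions removing leading whitespace separately.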


Check out the documentation, in particular this part:

If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.

If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
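
A small sketch of what that passage implies: only the exact one-character string ' ' gets the whitespace treatment, while any other string is used as a literal delimiter. The sample string is made up, and the last line assumes $; has been left at its default of nil.

```ruby
text = " a  b\t c"

one_space  = text.split(' ')    # => ["a", "b", "c"]      special whitespace mode
two_spaces = text.split('  ')   # => [" a", "b\t c"]      literal two-space delimiter
tab_only   = text.split("\t")   # => [" a  b", " c"]      literal tab delimiter
no_pattern = text.split         # => ["a", "b", "c"]      same as ' ' while $; is nil
```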

To split on exactly one space character, pass a regular expression such as / / instead of the string ' '.
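
For illustration, here is how a few regular expressions behave on a made-up sample string with a doubled space. Which one is appropriate depends on whether empty fields between adjacent separators should be kept.

```ruby
text = "a\tb  c\nd"

single_space = text.split(/ /)    # => ["a\tb", "", "c\nd"]        only a literal space separates
any_one_ws   = text.split(/\s/)   # => ["a", "b", "", "c", "d"]    each whitespace character separates
ws_runs      = text.split(/\s+/)  # => ["a", "b", "c", "d"]        runs of whitespace separate
```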


Source: https://habr.com/ru/post/943972/

