The most reliable separator character


If you were forced to use one char for the split method, which char would be the most reliable?

Definition of reliability: a split character that does not occur in any of the individual substrings being split.

+44
string
Dec 10 '09 at 9:40
11 answers

We are currently using

public const char Separator = ((char)007); 

I think that is the beep (BEL) character, if I am not mistaken.

+48
Dec 10 '09 at 9:48

Aside from 0x0, which may not be available (because of null-terminated strings, for example), the ASCII control characters between 0x1 and 0x1f are good candidates. The ASCII characters 0x1c-0x1f are even designed for this purpose and are named "File Separator", "Group Separator", "Record Separator" and "Unit Separator". However, they are forbidden in transport formats such as XML.

In that case, characters from the Unicode private use code points can be used.

A last option is to use an escaping strategy, so that the separator character can still appear in the data in escaped form. However, this complicates the task quite a lot, and you can no longer use String.Split on its own.
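As a minimal sketch of the control-character option described above, the following joins and splits fields on the ASCII Unit Separator (0x1F). The class and method names are illustrative only, and the approach assumes the data itself never contains 0x1F.

```csharp
using System;

public static class UnitSeparatorDemo
{
    // ASCII 0x1F, "Unit Separator". The constant name is illustrative.
    public const char UnitSeparator = '\u001F';

    // Joins the fields with the unit separator between them.
    public static string Join(params string[] fields)
    {
        return string.Join(UnitSeparator.ToString(), fields);
    }

    // Splits a record back into its fields.
    public static string[] Split(string record)
    {
        return record.Split(UnitSeparator);
    }

    public static void Main()
    {
        // Commas, pipes and semicolons in the data do not interfere.
        string record = Join("a,b", "c|d", "e;f");
        foreach (string field in Split(record))
        {
            Console.WriteLine(field);
        }
    }
}
```

If the data may contain arbitrary characters, this is not sufficient by itself and an escaping step is still needed.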

+19
Dec 10 '09 at 9:59

You can safely use any character you like as a separator if you escape the strings so that you know they do not contain that character.

Let's, for example, pick the character "a" as the separator. (I intentionally picked an ordinary character to show that any character can be used.)

Use the character "b" as the escape code. We replace any occurrence of "a" with "b1" and any occurrence of "b" with "b2":

 private static string Escape(string s)
 {
     // Escape the escape character first, then the separator.
     return s.Replace("b", "b2").Replace("a", "b1");
 }

Now the escaped string does not contain any "a" characters, so you can join several such strings together:

 string msg = Escape("banana") + "a" + Escape("aardvark") + "a" + Escape("bark"); 

The string now looks like this:

 b2b1nb1nb1ab1b1rdvb1rkab2b1rk 

Now you can split the string on "a" and get the individual parts:

 b2b1nb1nb1 b1b1rdvb1rk b2b1rk 

To decode the parts, you do the replacements in reverse:

 private static string Unescape(string s)
 {
     // Restore the separator first, then the escape character.
     return s.Replace("b1", "a").Replace("b2", "b");
 }

So splitting the string and unescaping the parts is done like this:

 string[] parts = msg.Split('a');
 for (int i = 0; i < parts.Length; i++)
 {
     parts[i] = Unescape(parts[i]);
 }

Or using LINQ:

 string[] parts = msg.Split('a').Select<string,string>(Unescape).ToArray(); 

If you pick a less common character as the separator, there are of course fewer occurrences that need to be escaped. The point is that this method guarantees the character is safe to use as a separator, without making any assumptions about which characters occur in the data you want to put in the string.
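The pieces above can be combined into a pair of helpers and checked with a round trip. This is only a sketch: the EscapedList class and the Pack and Unpack names are made up for illustration, while the "a" separator and "b" escape character follow the example in this answer.

```csharp
using System;
using System.Linq;

public static class EscapedList
{
    // Same scheme as above: "b" is the escape character, "a" the separator.
    private static string Escape(string s)
    {
        return s.Replace("b", "b2").Replace("a", "b1");
    }

    private static string Unescape(string s)
    {
        return s.Replace("b1", "a").Replace("b2", "b");
    }

    // Escapes every item and joins the results with the separator.
    public static string Pack(params string[] items)
    {
        return string.Join("a", items.Select(s => Escape(s)).ToArray());
    }

    // Splits on the separator and unescapes every part.
    public static string[] Unpack(string packed)
    {
        return packed.Split('a').Select(s => Unescape(s)).ToArray();
    }

    public static void Main()
    {
        string msg = Pack("banana", "aardvark", "bark");
        Console.WriteLine(msg);
        foreach (string part in Unpack(msg))
        {
            Console.WriteLine(part);
        }
    }
}
```

One limitation of this sketch: packing zero items yields an empty string, which unpacks to a single empty item, so an empty list needs separate handling if it matters.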

+15
Dec 10 '09 at 10:32

I usually prefer the '|' symbol as the separator character. If you are not sure what the user will enter in the text, you can prevent the user from entering certain special characters and then pick your separator from among those characters.

+8
Dec 10 '09 at 9:44

\0 is a good split character. It is pretty hard (impossible?) to enter from the keyboard, and it makes logical sense.

\n is another good candidate in some contexts.

And of course, .NET strings are Unicode, so you do not have to limit yourself to the first 255 characters. You can always use a rare Mongolian letter or some reserved or unused Unicode character.

+5
Dec 10 '09 at 9:48

It depends on what you are splitting.

In most cases it is best to use split characters that are in common use, for example:

value, value, value

value | value | value

key = value; key = value;

key: value; key: value;

You can use quoted identifiers with commas:

"value", "value", "value with, inside", "value"
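A plain String.Split on ',' would cut the quoted value in two, so quoted fields need a small parser. The following is a minimal sketch, not a full CSV implementation: the QuotedSplitter name is made up, a doubled quote ("") inside a quoted field is assumed to stand for a literal quote, and whitespace outside the quotes is kept as part of the field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitter
{
    // Splits on commas, except for commas inside double-quoted fields.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        // A doubled quote is a literal quote character.
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        // Closing quote.
                        inQuotes = false;
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                // Opening quote.
                inQuotes = true;
            }
            else if (c == ',')
            {
                // An unquoted comma ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // The last field has no trailing comma.
        fields.Add(current.ToString());
        return fields;
    }

    public static void Main()
    {
        foreach (string field in Split("\"value\",\"value with, inside\",\"value\""))
        {
            Console.WriteLine(field);
        }
    }
}
```

For real CSV data a dedicated parser is usually the safer choice, since this sketch ignores details such as line breaks inside fields.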

I use , first, then |, and if I cannot use either of them, I use the section character §.

Note that you can type any ASCII character with ALT+number (on the numeric keypad only), so § is ALT+21.

+5
Dec 10 '09 at 9:57

There are overloads of String.Split that take string separators...
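For illustration, here is a small sketch using the String.Split(String[], StringSplitOptions) overload with a multi-character separator. The "<#>" separator and the helper names are arbitrary choices for this example.

```csharp
using System;

public static class StringSeparatorDemo
{
    // Splits text on a whole string rather than on a single character.
    public static string[] SplitOn(string text, string separator)
    {
        return text.Split(new[] { separator }, StringSplitOptions.None);
    }

    public static void Main()
    {
        // Single characters such as '|' and ',' stay inside the parts.
        foreach (string part in SplitOn("a|b<#>c,d<#>e", "<#>"))
        {
            Console.WriteLine(part);
        }
    }
}
```

A longer separator is less likely to collide with the data, but it is still only unlikely, not impossible, unless the data is escaped.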

+4
Dec 10 '09 at 9:42

I would say it depends entirely on the situation; if you are writing a simple TCP/IP chat system, you obviously should not use "\n" as the split character. But "\0" is a good character to use, because users can never have typed it!

+2
Dec 10 '09 at 9:51

First of all, in C# (or .NET) you can use several split characters in a single split operation.

String.Split Method (Char[]), separator parameter:
An array of Unicode characters that delimit the substrings in this instance, an empty array that contains no delimiters, or a null reference (Nothing in Visual Basic).
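A short sketch of splitting on several characters at once follows. The separator set and the names are illustrative, and StringSplitOptions.RemoveEmptyEntries is an optional extra that drops the empty parts produced by adjacent separators.

```csharp
using System;

public static class MultiSeparatorDemo
{
    // Illustrative separator set; pick whatever suits your data.
    private static readonly char[] Separators = { ',', ';', '|' };

    // Splits on any of the separators and skips empty parts.
    public static string[] SplitAny(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        foreach (string part in SplitAny("one,two;three||four"))
        {
            Console.WriteLine(part);
        }
    }
}
```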

In my opinion, there is no single MOST reliable split character; however, some are more suitable than others.

Popular split characters such as tab, comma and pipe are good when someone needs to read the unsplit line/string.

If it is only for storing/processing, the safer characters are probably those that are seldom used or that are not easily entered from the keyboard.

It also depends on the context of use. For example, if you expect the data to contain email addresses, "@" is a no-go.

Suppose we had to pick one from the ASCII set. There are quite a few to choose from, for example "^" and some of the non-printable characters. Be careful, though: not all of them are suitable. For example, 0x00 may have an adverse effect on some systems.

+2
Dec 10 '09 at 10:03

It depends on the context in which it is used. If you are talking about a very general delimiter character, I do not think there is a one-size-fits-all answer.

I find that the ASCII null character '\0' is often a good candidate, or you can go with nitzmahone's idea and use more than one character, in which case it can be as crazy as you want.

Alternatively, you can parse the input and escape any instances of your delimiter character.

+1
Dec 10 '09 at 9:46

The pipe sign "|" is mostly used when you pass several arguments to a method that accepts only a single string parameter. This is widely used with SQL Server stored procedures, where you need to pass an array as a parameter. In the end, it depends on the situation you are in.

0
Dec 10 '09 at 10:20