Trying to understand strtok

Question


Consider the following snippet, which uses strtok to split the string Madddy.

    char* str = (char*) malloc(sizeof("Madddy"));
    strcpy(str, "Madddy");
    char* tmp = strtok(str, "d");
    std::cout << tmp;
    do {
        std::cout << tmp;
        tmp = strtok(NULL, "dddy");
    } while (tmp != NULL);

It works fine; the output is Ma. But if I change the strtok call inside the loop to the following,

 tmp=strtok(NULL, "ay");

The output becomes Madd. How does strtok work? I ask because I expected strtok to treat every character in the delimiter string as a separator. In some cases it behaves that way, but in others it gives unexpected results. Can someone help me figure this out?

+4

c++ strtok

Lavanya narayanaswamy Jan 14 '11 at 2:26


6 answers

"Trying to understand strtok" Good luck!

In any case, this is 2011. Tokenize properly:

    std::string str("abc:def");
    char split_char = ':';
    std::istringstream split(str);
    std::vector<std::string> token;
    for (std::string each; std::getline(split, each, split_char); token.push_back(each));

:D

+10

Lightness races in orbit Jan 14 '11 at 2:34


Fred Flintstone probably used strtok(). It predates multi-threaded environments and clobbers (modifies) the original string.

When called with NULL as the first parameter, it continues parsing the previous string. This feature was convenient, but a bit unusual even at the time.

+3

wallyk Jan 14 '11 at 2:37


Actually, your code is wrong, so it is not surprising that you get unexpected results:

 char* str = (char*) malloc(sizeof("Madddy"));

it should be

 char* str = (char*) malloc(strlen("Madddy") + 1);
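
As a quick check on the two expressions, here is a minimal sketch (the helper names are made up for illustration). sizeof applied to a string literal counts the terminating '\0', so for the literal "Madddy" both forms come out at 7 bytes. The strlen form is the one that stays correct when the text is only reachable through a char* pointer, where sizeof would measure the pointer itself.

```cpp
#include <cstddef>
#include <cstring>

// Bytes occupied by the literal itself, including the terminating '\0'.
constexpr std::size_t literal_size = sizeof("Madddy");

// sizeof applied to a pointer measures the pointer, not the text behind it.
inline std::size_t pointer_size(const char* s) { return sizeof(s); }

// Buffer size needed to copy a string that is only reachable via a pointer.
inline std::size_t buffer_size_for(const char* s) { return std::strlen(s) + 1; }
```

Calling buffer_size_for("Madddy") and comparing it with literal_size shows the two agree for a literal, while pointer_size("Madddy") reports the platform's pointer width instead.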

+2

Anders K. Jan 14 '11 at 2:40


I asked a question, inspired by another question, about functions in the C standard library that cause security problems or encourage bad practice.

To quote the answer I gave there:

A common pitfall with strtok() is to assume that the parsed string is left unchanged, when in fact strtok() overwrites the delimiter that ends each token with '\0'.
Also, strtok() is used by calling it repeatedly until the whole string has been tokenized. Some library implementations store strtok()'s internal state in a global variable, which can cause nasty surprises if strtok() is called from multiple threads at the same time.
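
To make the in-place modification visible, here is a small sketch. The helper name buffer_after_strtok is hypothetical; it runs strtok over a private copy of the input and returns the raw bytes of that copy afterwards, embedded '\0' characters included.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenizes a copy of `input` with strtok and returns the buffer's raw bytes
// afterwards, so the '\0' characters written by strtok can be inspected.
std::string buffer_after_strtok(const std::string& input, const char* delims) {
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');  // strtok needs a writable, null-terminated buffer

    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        // The tokens are ignored; only the side effect on buf matters here.
    }
    // Construct with an explicit length so embedded '\0' bytes are kept.
    return std::string(buf.data(), input.size());
}
```

For the input "a,b,c" with the delimiter ",", the returned bytes are 'a', '\0', 'b', '\0', 'c': both commas have been overwritten.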

Since you tagged your question C++, use something else! If you want to stay with C, I would suggest implementing your own tokenizer that works in a safe manner.
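
One possible shape for such a tokenizer in C++, offered as a sketch rather than a definitive implementation: the function name split_any is made up here, and the choice to skip runs of delimiters is an assumption made to mimic strtok. It leaves the input untouched and keeps no hidden state between calls.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character found in `delims`.
// Runs of delimiters are skipped, as strtok does, and `text` is not modified.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // end is npos for the last token; substr then takes the rest.
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Because everything lives in local variables, two threads can call split_any at the same time, and the caller's string is still intact afterwards.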

0

user257111 Jan 14 '11 at 2:48


Since you changed your tag to C, not C++, I rewrote your code to use printf so you can see what is happening. Hoang is right. You are seeing the correct result, but I think you are printing everything on one line, which makes the output confusing. See Hoang's answer for an explanation of what is going on. Also, as others have noted, strtok clobbers the input string, so you have to be careful about that, and it is not thread-safe. But if you need a quick and dirty tokenizer, it works. I also changed the code to use strlen rather than sizeof, as Anders suggested.

Here is your code, modified to be more C-like:

    char* str = (char*) malloc(strlen("Madddy") + 1);
    strcpy(str, "Madddy");
    char* tmp = strtok(str, "d");
    printf("first token: %s\n", tmp);
    do {
        tmp = strtok(NULL, "ay");
        if (tmp != NULL) {
            printf("next token: %s\n", tmp);
        }
    } while (tmp != NULL);

0

Mark Jan 14 '11 at 19:52


Hoàng long · Accepted Answer · 2011-01-14T03:02:47+0000

It seems you forgot that your first call to strtok (outside the loop) uses the delimiter "d".

strtok is working fine. You should have a look at the reference here.

For the second example (strtok(NULL, "ay")):

First you call strtok(str, "d"). It searches for the first "d" and splits your string there. Specifically, it returns tmp = "Ma", and the part still to be parsed is "ddy" (the first "d" is overwritten with '\0').

Then you call strtok(NULL, "ay"). It looks for "a" in the remaining string, but since that is now only "ddy", there is no match. It then finds "y" and overwrites it. So tmp = "dd", and nothing is left to parse.

It prints "Madd", as you saw.
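
The two cases from the question can be reproduced with a small sketch. The helper name tokens_with is hypothetical; it works on a private copy of the input, uses one delimiter set for the first strtok call and another for every later call, and collects the tokens instead of printing them.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: the first strtok call uses `first_delims`, every later
// call uses `rest_delims`, mirroring the structure of the question's loop.
std::vector<std::string> tokens_with(const std::string& input,
                                     const char* first_delims,
                                     const char* rest_delims) {
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');  // writable, null-terminated copy for strtok

    std::vector<std::string> tokens;
    char* tok = std::strtok(buf.data(), first_delims);
    while (tok != nullptr) {
        tokens.push_back(tok);
        tok = std::strtok(nullptr, rest_delims);  // continue in the same buffer
    }
    return tokens;
}
```

With "Madddy", "d" and "ay" this yields the tokens "Ma" and "dd". With "dddy" as the second delimiter set, the leftover "ddy" consists only of delimiters, so "Ma" is the only token.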
