Parsing empty tokens from a string with strtok

My application produces lines like the one below. I need to split the values between the separators into individual values.

2342|2sd45|dswer|2342||5523|||3654|Pswt 

I use strtok for this in a loop. For the fifth token, I get 5523. However, I need to account for the empty value between the two delimiters ||, so 5523 should be the sixth token.

    token = (char *)strtok(strAccInfo, "|");
    for (iLoop = 1; iLoop <= 106; iLoop++) {
        token = (char *)strtok(NULL, "|");
    }

Any suggestions?

+5
8 answers

In this case, I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It is fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even to embedded systems).

It is also reentrant; strtok isn't. (BTW: reentrancy has nothing to do with multi-threading. strtok already breaks with nested loops. You can use strtok_r, but it is not as portable.)
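A minimal sketch of the strchr/memcpy loop described above, assuming fixed-size output buffers. The names split_fields and MAX_FIELD_LEN are made up for this illustration and are not from the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD_LEN 64  /* hypothetical limit for one field, including '\0' */

/*
 * Copy each field of `line` (fields separated by `sep`) into `out`.
 * Empty fields are stored as empty strings. Fields longer than
 * MAX_FIELD_LEN - 1 characters are truncated. The input is never modified.
 * Returns the number of fields found, which may exceed max_fields;
 * only the first max_fields fields are stored.
 */
size_t split_fields(const char *line, char sep,
                    char out[][MAX_FIELD_LEN], size_t max_fields)
{
    const char *p1 = line;
    size_t count = 0;

    assert(line != NULL && sep != '\0');

    for (;;) {
        const char *p2 = strchr(p1, sep);              /* next separator, if any */
        size_t len = p2 ? (size_t)(p2 - p1) : strlen(p1);

        if (count < max_fields) {
            if (len >= MAX_FIELD_LEN)
                len = MAX_FIELD_LEN - 1;               /* truncate oversized field */
            memcpy(out[count], p1, len);
            out[count][len] = '\0';
        }
        count++;

        if (p2 == NULL)                                /* that was the last field */
            break;
        p1 = p2 + 1;                                   /* continue after the separator */
    }
    return count;
}
```

Called on the line from the question with '|' as the separator, this reports 10 fields, with an empty string at index 4 and 5523 at index 5.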

+6

On a first call, the function expects a C string as the argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of the last token as the new starting location for scanning.

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). It then scans, starting from this beginning of the token, for the first character contained in delimiters, which becomes the end of the token.

What this says is that it will skip any '|' characters at the beginning of a token, which makes 5523 the fifth token, as you already knew. I just thought I'd explain why (I had to look it up myself). This also means that you will never get empty tokens.

Since your data is laid out this way, you have a couple of possible solutions:
1) find all occurrences of || and replace them with | | (put a space in between)
2) do a strstr 5 times and find the beginning of the 5th element.
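To make the skipping behaviour concrete, here is a small sketch that counts what strtok returns and compares it with the number of fields implied by the separators. Both helper names (count_strtok_tokens, count_fields) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count tokens the way strtok sees them. Note: strtok modifies buf. */
size_t count_strtok_tokens(char *buf, const char *delims)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && delims != NULL);

    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Count fields including empty ones: number of separators plus one. */
size_t count_fields(const char *s, char sep)
{
    size_t n = 1;

    assert(s != NULL && sep != '\0');

    for (; *s != '\0'; s++)
        if (*s == sep)
            n++;
    return n;
}
```

For the line in the question, count_fields gives 10 while count_strtok_tokens gives 7, because the three empty fields are silently skipped.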

+2

This is a limitation of strtok. Its designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ contains an example.

+2
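    char *mystrtok(char **m, char *s, char c)
    {
        char *p = s ? s : *m;       /* start at s on the first call, else resume from *m */
        if (!*p)
            return 0;               /* nothing left to scan */
        *m = strchr(p, c);          /* find the next delimiter */
        if (*m)
            *(*m)++ = 0;            /* terminate the token, step past the delimiter */
        else
            *m = p + strlen(p);     /* last token: leave *m on the terminating '\0' */
        return p;
    }

  • reentrant
  • thread-safe
  • strictly ANSI conforming
  • needs an otherwise unused helper pointer from the calling context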

eg.

    char *p, *t, s[] = "2342|2sd45|dswer|2342||5523|||3654|Pswt";

    for (t = mystrtok(&p, s, '|'); t; t = mystrtok(&p, 0, '|'))
        puts(t);

eg.

    char *p, *t, s[] = "2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";

    for (t = mystrtok(&p, s, '|'); t; t = mystrtok(&p, 0, '|')) {
        char *p1, *t1;
        for (t1 = mystrtok(&p1, t, ','); t1; t1 = mystrtok(&p1, 0, ','))
            puts(t1);
    }

Your homework :) implement char *c (a set of delimiter characters) as parameter 3.
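One possible take on that exercise, as a sketch only: it swaps strchr for strcspn so that the third parameter can be a set of delimiter characters. The name mystrtok_set is hypothetical. Like mystrtok above, it stops as soon as the remaining input is empty, so an empty field after a trailing delimiter is not reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Variant of mystrtok that accepts a set of delimiter characters.
 * m      : caller-owned helper pointer that carries state between calls
 * s      : string to split on the first call, NULL on later calls
 * delims : every character in this string acts as a delimiter
 * Modifies the input string; returns NULL when nothing is left.
 */
char *mystrtok_set(char **m, char *s, const char *delims)
{
    char *p;
    size_t len;

    assert(m != NULL && delims != NULL);

    p = s ? s : *m;
    if (p == NULL || *p == '\0')
        return NULL;

    len = strcspn(p, delims);      /* length of the run without delimiters */
    if (p[len] != '\0') {
        p[len] = '\0';             /* terminate the token */
        *m = p + len + 1;          /* resume after the delimiter */
    } else {
        *m = p + len;              /* last token: park on the terminating '\0' */
    }
    return p;
}
```

With the delimiter set ",|;" the string "a,b|;c" splits into a, b, an empty token, and c.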

+2

Look at using strsep instead; unlike strtok, it returns empty tokens.
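A short sketch of how strsep could be used here. strsep is a BSD/glibc extension rather than ISO C, so the _DEFAULT_SOURCE line below assumes a glibc-style toolchain; the wrapper name split_with_strsep is made up for this example.

```c
#define _DEFAULT_SOURCE  /* glibc feature-test macro so <string.h> declares strsep */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split buf in place on any character in delims, using strsep.
 * Pointers to the fields (including empty ones) are stored in out.
 * Returns the number of fields found, which may exceed max_fields;
 * only the first max_fields pointers are stored.
 */
size_t split_with_strsep(char *buf, const char *delims,
                         char *out[], size_t max_fields)
{
    char *rest = buf;   /* strsep advances this pointer, sets it to NULL at the end */
    char *field;
    size_t count = 0;

    assert(buf != NULL && delims != NULL);

    while ((field = strsep(&rest, delims)) != NULL) {
        if (count < max_fields)
            out[count] = field;
        count++;
    }
    return count;
}
```

Like strtok, strsep overwrites the delimiters in the buffer, so it needs a writable copy of the line. On the line from the question it yields 10 fields, with 5523 as the sixth.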

+1

Use something other than strtok. It simply was not intended to do what you are asking for. When I have needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind the input string being modified, as strtok does, this should be pretty simple. At least offhand, something like this seems like it should work:

    // Warning: untested code. Should really use something with a less-ugly interface.
    char *tokenize(char *input, char const *delim) {
        static char *current; // just as ugly as strtok!
        char *pos, *ret;
        if (input != NULL)
            current = input;

        if (current == NULL)
            return current;

        ret = current;
        pos = strpbrk(current, delim);
        if (pos == NULL)
            current = NULL;
        else {
            *pos = '\0';
            current = pos+1;
        }
        return ret;
    }
+1

Below is a solution that works for me now. Thanks to all who responded.

I am using LoadRunner, hence some unfamiliar commands, but I believe the flow is easy enough to follow.

    char strAccInfo[1024], *p2;
    int iLoop;

    Action()
    {
        //This value would come from the wrsp call in the actual script.
        lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");

        //Store the parameter into a string - saves memory.
        strcpy(strAccInfo,lr_eval_string("{test_Param}"));

        //Get the first instance of the separator "|" in the string
        p2 = (char *) strchr(strAccInfo,'|');

        //Start a loop - Set the max loop value to more than max expected.
        for (iLoop = 1;iLoop<200;iLoop++) {

            //Save parameter names in sequence.
            lr_param_sprintf("Param_Name","Parameter_%d",iLoop);

            //Get the first instance of the separator "|" in the string (within the loop).
            p2 = (char *) strchr(strAccInfo,'|');

            //Save the value for the parameters in sequence.
            lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));

            //Save string after the first instance of p2, as strAccInfo - for looping.
            strcpy(strAccInfo,p2+1);

            //Start conditional loop for checking for last value in the string.
            if (strchr(strAccInfo,'|')==NULL) {
                lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
                lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
                iLoop = 200;
            }
        }
    }
0

Inspired by Patrick Schluter's answer, I wrote this function. It should be thread-safe, it supports empty tokens, and it does not modify the original string.

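    #include <stdlib.h>  /* malloc */
    #include <string.h>  /* strstr, strlen, memcpy */

    /* Returns a malloc'ed copy of the next token; the caller must free it. */
    char* strTok(char** newString, char* delimiter)
    {
        char* string = *newString;
        char* delimiterFound = (char*) 0;
        int tokLenght = 0;
        char* tok = (char*) 0;

        if(!string) return (char*) 0;

        delimiterFound = strstr(string, delimiter);

        if(delimiterFound){
            tokLenght = delimiterFound-string;
        }else{
            tokLenght = strlen(string);
        }

        tok = malloc(tokLenght + 1);
        if(!tok) return (char*) 0;  /* out of memory */
        memcpy(tok, string, tokLenght);
        tok[tokLenght] = '\0';

        /* advance past the delimiter, or signal the end with a null pointer */
        *newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;

        return tok;
    }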

You can use it like this:

    char* input = "1,2,3,,5,";
    char** inputP = &input;
    char* tok;
    while( (tok=strTok(inputP, ",")) ){
        printf("%s\n", tok);
        free(tok);  /* each token is allocated by strTok */
    }

This produces the following output. The fourth and sixth tokens are empty, so they print as blank lines:

    1
    2
    3

    5


I tested it on simple strings but haven't used it in production yet. I also posted it on Code Review, so you can see what others think about it.

0

Source: https://habr.com/ru/post/904949/

