How to tell when there is no data between two delimiters using strtok()

I'm trying to tokenize a string, but I need to know exactly when there is no data between two delimiters. For example, when tokenizing the line "a,b,c,,,d,e" I need to know about the two empty slots between "c" and "d", which I cannot detect simply by using strtok(). My attempt is shown below:

    char arr_fields[num_of_fields][MAX_FIELD_LEN]; /* each element must be a string buffer; MAX_FIELD_LEN is assumed */
    char delim[] = ",\n";
    char *tok;

    tok = strtok(line, delim); /* line contains the data */
    for (i = 0; i < num_of_fields; i++, tok = strtok(NULL, delim)) {
        if (tok)
            sprintf(arr_fields[i], "%s", tok);
        else
            sprintf(arr_fields[i], "%s", "-");
    }

When I run the code above on the example line, the fields a, b, c, d, e end up in the first five elements of arr_fields, which is not what I want. I need each field to keep its position so that it lands at the matching index of the array: i.e., if a field is missing between two delimiters, that empty slot should be recorded as such.


7.21.5.8 The strtok function

The standard says the following about strtok:

[#3] The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

From the quote above we can see that strtok cannot solve your specific problem: it skips over any run of consecutive characters found in delims, effectively treating them as a single delimiter, so empty fields are never returned.
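To make the effect concrete, here is a minimal sketch that simply counts what strtok returns. The helper name count_strtok_tokens and its 256-byte working buffer are illustrative assumptions, not part of the question. For "a,b,c,,,d,e" it yields 5 tokens, even though the line holds 7 fields (two of them empty).

```c
#include <string.h>

/* Hypothetical helper for illustration: counts the tokens strtok()
 * returns for s. The input is copied first because strtok() writes
 * '\0' bytes into the string it scans. */
int count_strtok_tokens(const char *s, const char *delims)
{
    char buf[256];   /* assumed size; longer inputs are truncated */
    int count = 0;

    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    /* Every token strtok() hands back is non-empty: runs of delimiters
     * are skipped, so empty fields simply disappear. */
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    return count;
}
```

Leading and trailing delimiters are skipped the same way, so ",,a,," also produces just one token.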


Am I doomed to cry in silence, or can someone help me?

You can easily implement your own version of strtok that does what you want; see the snippets at the end of this post.

strtok_single uses strpbrk(char const *src, const char *delims), which returns a pointer to the first character in the null-terminated string src that matches any character in delims.

If no matching character is found, the function will return NULL.
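The property that makes strpbrk useful here is that it does not skip anything: if a delimiter sits right at the current position, it returns that position, which corresponds to an empty field. A minimal sketch (first_delim_offset is a hypothetical helper name) that reports the offset strpbrk finds, or -1 when it returns NULL:

```c
#include <string.h>

/* Hypothetical helper for illustration: returns the offset of the first
 * character of s that also appears in delims, or -1 if strpbrk() finds
 * none. An offset of 0 means the field starting at s is empty. */
long first_delim_offset(const char *s, const char *delims)
{
    const char *p = strpbrk(s, delims);
    return p ? (long)(p - s) : -1;
}
```

For example, ",,x" gives 0 (an empty field comes first), "foo,bar" gives 3, and "foo" gives -1.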


strtok_single

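    #include <string.h>

    char *strtok_single(char *str, char const *delims)
    {
        static char *src = NULL;   /* position in the string between calls */
        char *p, *ret = 0;

        if (str != NULL)
            src = str;             /* a new string starts a new sequence */

        if (src == NULL)
            return NULL;

        if ((p = strpbrk(src, delims)) != NULL) {
            *p = 0;                /* terminate the current token */
            ret = src;
            src = ++p;             /* continue right after the delimiter */
        } else if (*src) {
            ret = src;             /* last token, no delimiter follows */
            src = NULL;
        }

        return ret;
    }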

Example usage

    char delims[] = ",";
    char data[] = "foo,bar,,baz,biz";

    char *p = strtok_single(data, delims);

    while (p) {
        printf("%s\n", *p ? p : "<empty>");
        p = strtok_single(NULL, delims);
    }

Output

    foo
    bar
    <empty>
    baz
    biz
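Two caveats about strtok_single as written: like strtok, it keeps its position in a static variable, so it cannot be used on two strings at once, and for input such as "a,b," it returns "a" and "b" but not the trailing empty field. A sketch of a variant that addresses both, assuming the caller passes the position explicitly (strtok_single_r and its saveptr parameter are hypothetical names, modeled loosely on strtok_r):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant variant of strtok_single. The scan position is
 * stored in *saveptr instead of a static variable; *saveptr == NULL
 * means the input is exhausted. Empty fields come back as "". */
char *strtok_single_r(char *str, const char *delims, char **saveptr)
{
    char *start = (str != NULL) ? str : *saveptr;
    char *p;

    if (start == NULL)          /* no more fields */
        return NULL;

    p = strpbrk(start, delims);
    if (p != NULL) {
        *p = '\0';              /* terminate this field */
        *saveptr = p + 1;       /* next field starts after the delimiter */
    } else {
        *saveptr = NULL;        /* last field: nothing follows it */
    }
    return start;               /* may be "", i.e. an empty field */
}
```

With "a,b," this returns "a", "b", then "", and only after that NULL, so a trailing empty field is reported too.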

You cannot use strtok() if you want that behavior. From the man page:

A sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter. Delimiter bytes at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always nonempty strings.

Therefore, in your example, it will jump from c to d .

You will have to parse the string manually, or perhaps look for a CSV parsing library that will make your life easier.
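If you go the manual route, one pass over the characters is enough. A minimal sketch, assuming a single delimiter character and that you only need each field's position and length (scan_fields and its parameters are hypothetical names):

```c
#include <stddef.h>

/* Hypothetical helper: scans line once and records, for each field
 * separated by delim, its start offset and length (0 for an empty field).
 * At most max_fields entries are recorded; the return value is the total
 * number of fields in line, which may exceed max_fields. */
size_t scan_fields(const char *line, char delim,
                   size_t starts[], size_t lengths[], size_t max_fields)
{
    size_t n = 0, start = 0, i;

    for (i = 0; ; i++) {
        if (line[i] == delim || line[i] == '\0') {
            if (n < max_fields) {
                starts[n] = start;
                lengths[n] = i - start;
            }
            n++;
            if (line[i] == '\0')
                break;          /* end of the string closes the last field */
            start = i + 1;      /* next field begins after the delimiter */
        }
    }
    return n;
}
```

For "a,b,c,,,d,e" this reports 7 fields, with fields 3 and 4 (zero-based) having length 0, which is exactly the positional information the question asks for.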


I was recently looking for a solution to this problem and came across this thread.

You can use strsep() . From the manual:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.
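Note that strsep() is not part of ISO C; it comes from BSD and is also provided by glibc, among others. A minimal sketch of using it for the question's input (split_strsep is a hypothetical helper name; the _DEFAULT_SOURCE define is an assumption aimed at glibc, where it exposes strsep):

```c
#define _DEFAULT_SOURCE   /* exposes strsep() in glibc's <string.h> */
#include <string.h>

/* Hypothetical helper: splits line in place with strsep() and stores a
 * pointer to each field in fields[]. Unlike strtok(), strsep() returns
 * empty fields as "" instead of skipping them. Returns the number of
 * fields stored, at most max_fields. */
size_t split_strsep(char *line, const char *delims,
                    char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *tok;

    /* strsep() advances line past each delimiter and sets it to NULL
     * after the last field, at which point it returns NULL. */
    while (n < max_fields && (tok = strsep(&line, delims)) != NULL)
        fields[n++] = tok;

    return n;
}
```

On "a,b,c,,,d,e" this stores 7 pointers, the fourth and fifth of which point to empty strings.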


As mentioned in this answer, you will want to implement something like strtok yourself. I prefer strcspn (as opposed to strpbrk), since it needs fewer if statements:

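    char arr_fields[num_of_fields][MAX_FIELD_LEN]; /* MAX_FIELD_LEN is assumed */
    char delim[] = ",\n";
    size_t current_token = 0;
    size_t token_length;

    for (i = 0; i < num_of_fields; i++) {
        /* length of the field starting at current_token (0 if empty) */
        token_length = strcspn(line + current_token, delim);
        if (token_length)
            sprintf(arr_fields[i], "%.*s", (int)token_length, line + current_token);
        else
            sprintf(arr_fields[i], "%s", "-");
        current_token += token_length;
        if (line[current_token] != '\0')
            current_token++; /* skip the delimiter itself */
    }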
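Packaged as a self-contained function, the same strcspn idea might look like the sketch below. The names fill_fields and MAX_FIELD_LEN, the 32-byte field size, and the choice to stop at the end of the input are assumptions for illustration; snprintf is used so that overlong fields are truncated rather than overflowing.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELD_LEN 32   /* assumed maximum field size, including '\0' */

/* Hypothetical helper: copies up to num_fields fields of line into
 * fields[], writing "-" for each empty field. strcspn() measures each
 * field; the delimiter after it is then skipped. Returns the number of
 * fields filled (stops early when line runs out). */
size_t fill_fields(const char *line, const char *delims,
                   char fields[][MAX_FIELD_LEN], size_t num_fields)
{
    size_t pos = 0, i;

    for (i = 0; i < num_fields; i++) {
        size_t len = strcspn(line + pos, delims);   /* length of this field */

        if (len > 0)
            snprintf(fields[i], MAX_FIELD_LEN, "%.*s", (int)len, line + pos);
        else
            snprintf(fields[i], MAX_FIELD_LEN, "%s", "-");

        pos += len;
        if (line[pos] == '\0')   /* no delimiter follows: last field */
            return i + 1;
        pos++;                   /* skip the delimiter */
    }
    return i;
}
```

For "a,b,c,,,d,e" this fills a, b, c, -, -, d, e and returns 7, so each field keeps its original index.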
  • Parse (e.g. with strtok)
  • Sort
  • Insert
  • Rinse and repeat as needed :)

You can try using strchr to find the locations of the delimiter characters. Manually tokenize your string up to the delimiter you found (using memcpy or strncpy), and then call strchr again. This way you can detect when two or more commas are adjacent (the pointers strchr returns will differ by 1), and you can write an if to handle that case.
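A minimal sketch of the adjacency check described above, assuming the delimiter is always ',' (count_adjacent_commas is a hypothetical helper name). It only counts empty fields between two commas; empty fields at the very start or end of the line would need separate handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: hops from one ',' to the next with strchr() and
 * counts the places where two commas are adjacent (pointer difference
 * of 1), i.e. empty fields in the middle of the line. */
size_t count_adjacent_commas(const char *line)
{
    size_t empties = 0;
    const char *prev = strchr(line, ',');

    while (prev != NULL) {
        const char *next = strchr(prev + 1, ',');
        if (next != NULL && next - prev == 1)
            empties++;          /* ",," : nothing between the two commas */
        prev = next;
    }
    return empties;
}
```

For "a,b,c,,,d,e" this finds the two empty slots between "c" and "d".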


Source: https://habr.com/ru/post/904944/

