Reading a file in C

I have an input file from which I need to extract words. Words may contain only letters and digits, so everything else counts as a separator. I tried fscanf, fgets + sscanf, and strtok, but nothing works.

    while (!feof(file)) {
        fscanf(file, "%s", string);
        printf("%s\n", string);
    }

The loop above obviously does not work because it does not use my delimiters, so I replaced the fscanf line with this:

    fscanf(file, "%[A-z]", string);

It reads the first word fine, but the file pointer seems to keep rewinding, so it reads the first word over and over again.

So, I used fgets to read the first line and used sscanf:

    sscanf(line, "%[A-z]%n", word, len);
    line += len;

This does not work either: whatever I try, I cannot move the pointer to the right place. I also tried strtok, but I can't find out how to set the delimiters:

    while (p != NULL) {
        printf("%s\n", p);
        p = strtok(NULL, " ");
    }

This one obviously takes a space as the separator, but I have literally 100 different separator characters.

Am I missing something? Extracting words from a file seemed like a simple task at first, but nothing I try really works.

4 answers

I would use:

    FILE *file;        /* assumed to be opened elsewhere, e.g. with fopen */
    char string[200];

    while (fscanf(file, "%*[^A-Za-z]"), fscanf(file, "%199[a-zA-Z]", string) > 0) {
        /* do something with string... */
    }

This skips non-letters and then reads a string of up to 199 letters. The only oddity is that "words" longer than 199 letters will be split into several words, but you need some limit to avoid buffer overflows.


Consider building a minimal lexer. While in the word state, it stays there as long as it sees letters and digits, and it switches to the separator state when it encounters anything else. In the separator state it does the exact opposite.

Here is an example of a simple finite state machine that might be useful. For brevity, it only handles digits. echo "2341,452(42 555" | ./main will print each number on a separate line. This is not a lexer, but the idea of switching between states is very similar.

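    #include <stdio.h>
    #include <string.h>

    enum { WORD = 1, DELIM = 2, BUFLEN = 1024 };

    int main(void)
    {
        int state = DELIM, ptr = 0, c;
        char buffer[BUFLEN];
        const char *digits = "1234567890";

        while ((c = getchar()) != EOF) {
            if (c != '\0' && strchr(digits, c)) {
                if (state == DELIM)
                    ptr = 0;                  /* a new word starts here */
                if (ptr < BUFLEN - 1)
                    buffer[ptr++] = (char)c;  /* excess digits of over-long words are dropped */
                state = WORD;
            } else {
                if (state == WORD) {          /* a word just ended: print it */
                    buffer[ptr] = '\0';
                    printf("%s\n", buffer);
                }
                state = DELIM;
            }
        }
        if (state == WORD) {                  /* input ended in the middle of a word */
            buffer[ptr] = '\0';
            printf("%s\n", buffer);
        }
        return 0;
    }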

If the number of states grows, consider replacing the if statements that check the current state with switch blocks. Performance can be improved by replacing getchar with reading an entire block of input into a temporary buffer and iterating over it.

If you need to deal with a more complex input file format, you can use a lexical analyzer generator such as flex. It can work out the state transitions and take care of other parts of building the lexer for you.


A few points:

First of all, do not use feof(file) as a loop condition; feof will not return true until after you have tried to read past the end of the file, so your loop will run one time too many.

Secondly, you mentioned the following:

fscanf(file, "%[A-z]", string);

It reads the first word fine, but the file pointer seems to keep rewinding, so it reads the first word over and over again.

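This is not quite what is happening; if the next character in the stream does not match the format specifier, scanf returns without reading anything, and string is not changed. The stream stays stuck at the first separator, so every later call fails in the same way and string still holds the first word.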

Here's a simple, if inelegant, method: it reads one character at a time from the input file, checks whether it is a letter or a digit, and if so, appends it to the word.

    #include <stdio.h>
    #include <ctype.h>

    int get_next_word(FILE *file, char *word, size_t wordSize)
    {
        size_t i = 0;
        int c;

        /* Skip over any non-alphanumeric characters. */
        while ((c = fgetc(file)) != EOF && !isalnum(c))
            ; /* empty loop */

        /* Store characters until a separator, EOF, or a full buffer. */
        while (c != EOF && isalnum(c) && i < wordSize - 1) {
            word[i++] = (char)c;
            c = fgetc(file);
        }

        /* If the buffer filled up mid-word, push the unread character back
           so that it starts the next word instead of being lost. */
        if (c != EOF && isalnum(c))
            ungetc(c, file);

        word[i] = '\0';
        return i > 0; /* nonzero if a word was read, even one ending at EOF */
    }

    int main(void)
    {
        char word[SIZE]; // where SIZE is large enough to handle expected inputs
        FILE *file;
        ...
        while (get_next_word(file, word, sizeof word))
            // do something with word
        ...
    }

What are your delimiters? The second argument to strtok should be a string containing your separators. The first should be a pointer to your string on the first call and NULL on every call after that:

    char *p = strtok(line, ",");   /* assuming ',' is the delimiter */
    while (p) {
        printf("%s\n", p);
        p = strtok(NULL, ",");
    }

Source: https://habr.com/ru/post/1386596/

