Reading a text file of unknown size

I am trying to read a text file of unknown size into an array of characters. This is what I have so far.

    #include<stdio.h>
    #include<string.h>

    int main()
    {
        FILE *ptr_file;
        char buf[1000];
        char output[];
        ptr_file =fopen("CodeSV.txt","r");
        if (!ptr_file)
            return 1;

        while (fgets(buf,1000, ptr_file)!=NULL)
            strcat(output, buf);
        printf("%s",output);

        fclose(ptr_file);

        printf("%s",output);
        return 0;
    }

But I don't know how to choose a size for the output array when the file's size is unknown. Also, when I give output a size, say n = 1000, I get a segmentation fault. I am a very inexperienced programmer, so any guidance is appreciated :)

The text file itself is technically a .csv file, so its contents look like this: "0,0,0,1,0,1,0,1,1,0,1 ..."

7 answers

The standard way to do this is to use malloc to allocate an array of some size and start reading into it, and if you run out of array before you have finished reading characters (that is, if you haven't reached EOF by the time the array is full), pick a larger size and use realloc to make the array bigger.

Here's what the read-and-allocate loop looks like. I chose to read input one character at a time using getchar (rather than a line at a time using fgets).

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int c;
        int nch = 0;
        int size = 10;
        char *buf = malloc(size);
        if (buf == NULL) {
            fprintf(stderr, "out of memory\n");
            exit(1);
        }

        while ((c = getchar()) != EOF) {
            if (nch >= size - 1) {
                /* time to make it bigger */
                size += 10;
                /* realloc into a temporary so buf is not lost if it fails */
                char *tmp = realloc(buf, size);
                if (tmp == NULL) {
                    fprintf(stderr, "out of memory\n");
                    free(buf);
                    exit(1);
                }
                buf = tmp;
            }
            buf[nch++] = c;
        }

        buf[nch++] = '\0';

        printf("\"%s\"", buf);
        free(buf);
        return 0;
    }

Two notes about this code:

  • The numbers 10 for the initial size and the increment are too small; in real code you would want to use something much larger.
  • It's easy to forget to leave room for the terminating '\0'; in this code I tried to account for it with the -1 in if(nch >= size-1).
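Following the first note, here is a minimal sketch of the same malloc/realloc loop packaged as a reusable function that reads from any FILE* (for example the OP's CodeSV.txt opened with fopen) and doubles the buffer when it fills, instead of adding 10 bytes each time. The name read_all and the 4096-byte starting size are illustrative assumptions, not part of the answer above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in fp into a
   '\0'-terminated heap buffer, doubling the capacity whenever it fills.
   Returns NULL on allocation failure or read error; the caller must free()
   the result. If out_len is non-NULL it receives the number of bytes read. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096;              /* starting size: an assumption, tune as needed */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(fp)) != EOF) {
        if (len >= cap - 1) {       /* always keep one byte for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {      /* the old buf is still valid here */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (ferror(fp)) {               /* EOF can also mean a read error */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Doubling keeps the number of realloc calls logarithmic in the file size, which is why it is usually preferred over a fixed increment.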

I would be remiss if I didn't add to the answers what is probably one of the most standard ways of reading an unknown number of lines of unknown length from a text file. In C, you have two primary methods of character input: (1) character-oriented input (i.e. getchar, getc, etc.) and (2) line-oriented input (i.e. fgets, getline).

Of these functions, the POSIX getline function will, by default, allocate sufficient space to read a line of any length (up to the exhaustion of system memory). Also, when reading lines of input, line-oriented input is generally the right choice.
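A minimal sketch of getline's allocate-and-grow behavior, assuming a POSIX system (getline is not part of ISO C, hence the _POSIX_C_SOURCE define). Starting with a NULL pointer and a capacity of 0 lets getline malloc the buffer and realloc it whenever a longer line arrives; the caller frees it once at the end. The function name count_lines is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline (POSIX.1-2008) */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>              /* ssize_t */

/* Counts the lines in fp and stores the length of the longest one
   (including its '\n', as getline reports it) in *longest. */
size_t count_lines(FILE *fp, size_t *longest)
{
    char *line = NULL;              /* NULL + cap 0: getline allocates */
    size_t cap = 0;                 /* getline updates this as it grows */
    ssize_t nread;
    size_t count = 0, max = 0;

    while ((nread = getline(&line, &cap, fp)) != -1) {
        count++;
        if ((size_t)nread > max)
            max = (size_t)nread;
    }

    free(line);                     /* one buffer, reused for every line */
    if (longest != NULL)
        *longest = max;
    return count;
}
```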

To read an unknown number of lines, the general approach is to allocate some anticipated number of pointers (in an array of pointers-to-char) and then reallocate as needed if you end up needing more. If you want to deal with the complexities of stringing pointers-to-struct together in a linked list, that's fine, but it is much simpler to handle an array of strings. (A linked list is more appropriate when you have a struct with several members rather than a single line.)

The process is simple: (1) allocate memory for some initial number of pointers (LMAX below, at 255), and then, as each line is read, (2) allocate memory to hold the line and copy the line into the array. Below this is done with strdup, which both (a) allocates memory to hold the string and (b) copies the string into the new block of memory, returning a pointer to its address; you assign that returned pointer to your array of strings as array[x].
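The copy step matters because getline reuses the same buffer on every call: storing ln itself in array[x] would leave every entry pointing at one buffer that holds only the most recent line. strdup is POSIX (it was only added to ISO C in C23), so here is a sketch of a portable stand-in; the name dup_str is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Portable equivalent of strdup: allocate strlen(s) + 1 bytes and copy s,
   including its terminating '\0', into the new block.
   Returns NULL if malloc fails; the caller must free() the copy. */
char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

Each call returns an independent block, so overwriting the source buffer afterwards (as the next getline call does) leaves earlier copies intact.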

As with any dynamic memory allocation, you are responsible for tracking the allocated memory by keeping a pointer to the beginning of each allocated block (so you can free it later) and then freeing the memory when it is no longer needed. (Use valgrind or a similar memory checker to confirm that you have no memory errors and have freed all the memory you allocated.)

The following is an example of an approach that simply reads any text file and prints its lines to stdout before freeing the memory allocated to store the file. After you have read all the lines (or while you are reading them), you can easily parse your csv input into separate values.
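Since the OP's file holds comma-separated values such as 0,0,0,1,0,1, here is one possible sketch of that parsing step for a single line whose newline has already been stripped. It uses strtol so that malformed fields are detected rather than silently read as 0. The function name parse_csv_ints and its return convention are assumptions for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses comma-separated integers such as
   "0,0,0,1,0,1" into out[], storing at most max values.
   Returns the number of values stored, or -1 if a field is not a
   valid int or is followed by something other than ',' or '\0'. */
long parse_csv_ints(const char *line, int *out, size_t max)
{
    size_t count = 0;
    const char *p = line;

    while (*p != '\0' && count < max) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);
        if (end == p || errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;              /* empty, non-numeric or out of range */
        out[count++] = (int)v;
        p = end;
        if (*p == ',')
            p++;                    /* skip the separator */
        else if (*p != '\0')
            return -1;              /* unexpected character after a number */
    }
    return (long)count;
}
```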

Note: below, when LMAX lines have been read, the array is reallocated to hold twice as many pointers as before, and reading continues. (You can set LMAX to 1 if you want to allocate a new pointer for every line, but that is a very inefficient way to handle memory allocation.) Choosing some reasonable anticipated starting value and then reallocating to 2X the current size is a standard reallocation scheme, but you are free to allocate additional blocks of any size you choose.
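The 2X growth described above can be sketched as a small growable array of strings. The type str_array, its functions, and the starting capacity of 4 are hypothetical; the point is that realloc is assigned to a temporary first, so the existing block is not lost if the allocation fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of strings. Zero-initialize before use. */
typedef struct {
    char **items;       /* array of pointers to char */
    size_t count;       /* number of strings stored  */
    size_t cap;         /* number of pointers allocated */
} str_array;

/* Appends a copy of s, doubling the pointer array when it is full.
   Returns 0 on success, -1 on allocation failure (array left unchanged). */
int str_array_push(str_array *a, const char *s)
{
    if (a->count == a->cap) {
        size_t newcap = a->cap ? a->cap * 2 : 4;    /* 4: arbitrary start */
        char **tmp = realloc(a->items, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;                  /* a->items is still valid */
        a->items = tmp;
        a->cap = newcap;
    }

    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, n);
    a->items[a->count++] = copy;
    return 0;
}

/* Frees every stored string and the pointer array itself. */
void str_array_free(str_array *a)
{
    for (size_t i = 0; i < a->count; i++)
        free(a->items[i]);
    free(a->items);
    a->items = NULL;
    a->count = a->cap = 0;
}
```

realloc(NULL, n) behaves like malloc(n), so the first push needs no special case.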

Take a look at the code and let me know if you have any questions.

    #define _POSIX_C_SOURCE 200809L  /* for getline and strdup */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define LMAX 255

    int main (int argc, char **argv)
    {
        if (argc < 2 ) {
            fprintf (stderr, "error: insufficient input, usage: %s <filename>\n", argv[0]);
            return 1;
        }

        char **array = NULL;        /* array of pointers to char        */
        char *ln = NULL;            /* NULL forces getline to allocate  */
        size_t n = 0;               /* buf size, 0 use getline default  */
        ssize_t nchr = 0;           /* number of chars actually read    */
        size_t idx = 0;             /* array index for number of lines  */
        size_t it = 0;              /* general iterator variable        */
        size_t lmax = LMAX;         /* current array pointer allocation */
        FILE *fp = NULL;            /* file pointer                     */

        if (!(fp = fopen (argv[1], "r"))) { /* open file for reading    */
            fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
            return 1;
        }

        /* allocate LMAX pointers and set to NULL. Each of the 255 pointers will
           point to (hold the address of) the beginning of each string read from
           the file below. This will allow access to each string with array[x]. */
        if (!(array = calloc (LMAX, sizeof *array))) {
            fprintf (stderr, "error: memory allocation failed.\n");
            return 1;
        }

        /* prototype - ssize_t getline (char **ln, size_t *n, FILE *fp)
           above we declared: char *ln and size_t n. Why don't they match? Simple,
           we will be passing the address of each to getline, so we simply precede
           the variable with the unary '&', which adds one level of indirection,
           making char* a char** and size_t a size_t *. Now the arguments match
           the prototype. */
        while ((nchr = getline (&ln, &n, fp)) != -1)    /* read line    */
        {
            while (nchr > 0 && (ln[nchr-1] == '\n' || ln[nchr-1] == '\r'))
                ln[--nchr] = 0;     /* strip newline or carriage rtn    */

            /* allocate & copy ln to array - this will create a block of memory
               to hold each character in ln and copy the characters in ln to that
               memory address. The address will then be stored in array[idx].
               (idx++ just increases idx by 1 so it is ready for the next address)
               There is a lot going on in that simple: array[idx++] = strdup (ln); */
            array[idx++] = strdup (ln);
            if (!array[idx-1]) {    /* strdup returns NULL if malloc fails */
                fprintf (stderr, "error: memory allocation failed.\n");
                return 1;
            }

            if (idx == lmax) {      /* if lmax lines reached, realloc   */
                char **tmp = realloc (array, lmax * 2 * sizeof *array);
                if (!tmp)
                    return -1;
                array = tmp;
                lmax *= 2;
            }
        }

        if (fp) fclose (fp);        /* close file                       */
        if (ln) free (ln);          /* free memory allocated to ln      */

        /* process/use lines in array as needed
           (simple print all lines example below) */

        printf ("\nLines in file:\n\n");    /* print lines in file  */
        for (it = 0; it < idx; it++)
            printf (" array [%3zu] %s\n", it, array[it]);
        printf ("\n");

        for (it = 0; it < idx; it++)        /* free array memory    */
            free (array[it]);
        free (array);

        return 0;
    }

Use/Output

    $ ./bin/getline_rdfile dat/damages.txt

    Lines in file:

     array [  0] Personal injury damage awards are unliquidated
     array [  1] and are not capable of certain measurement; thus, the
     array [  2] jury has broad discretion in assessing the amount of
     array [  3] damages in a personal injury case. Yet, at the same
     array [  4] time, a factual sufficiency review insures that the
     array [  5] evidence supports the jury award; and, although
     array [  6] difficult, the law requires appellate courts to conduct
     array [  7] factual sufficiency reviews on damage awards in
     array [  8] personal injury cases. Thus, while a jury has latitude in
     array [  9] assessing intangible damages in personal injury cases,
     array [ 10] a jury damage award does not escape the scrutiny of
     array [ 11] appellate review.
     array [ 12]
     array [ 13] Because Texas law applies no physical manifestation
     array [ 14] rule to restrict wrongful death recoveries, a
     array [ 15] trial court in a death case is prudent when it chooses
     array [ 16] to submit the issues of mental anguish and loss of
     array [ 17] society and companionship. While there is a
     array [ 18] presumption of mental anguish for the wrongful death
     array [ 19] beneficiary, the Texas Supreme Court has not indicated
     array [ 20] that reviewing courts should presume that the mental
     array [ 21] anguish is sufficient to support a large award. Testimony
     array [ 22] that proves the beneficiary suffered severe mental
     array [ 23] anguish or severe grief should be a significant and
     array [ 24] sometimes determining factor in a factual sufficiency
     array [ 25] analysis of large non-pecuniary damage awards.

Memory check

    $ valgrind ./bin/getline_rdfile dat/damages.txt
    ==14321== Memcheck, a memory error detector
    ==14321== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==14321== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==14321== Command: ./bin/getline_rdfile dat/damages.txt
    ==14321==

    Lines in file:

     array [  0] Personal injury damage awards are unliquidated
    <snip> ...
     array [ 25] analysis of large non-pecuniary damage awards.

    ==14321==
    ==14321== HEAP SUMMARY:
    ==14321==     in use at exit: 0 bytes in 0 blocks
    ==14321==   total heap usage: 29 allocs, 29 frees, 3,997 bytes allocated
    ==14321==
    ==14321== All heap blocks were freed -- no leaks are possible
    ==14321==
    ==14321== For counts of detected and suppressed errors, rerun with: -v
    ==14321== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
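    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv)
    {
        FILE* fpInputFile = NULL;
        unsigned long ulSize = 0;       // Input File size
        unsigned char* ucBuffer;        // Buffer data
        long lPos;                      // ftell result (may be -1 on error)

        if (argc != 2) {
            printf("Enter the file name\n");
            return -1;
        }

        fpInputFile = fopen(argv[1], "rb");    // binary mode, so ftell is a byte count
        if (!fpInputFile) {
            fprintf(stderr, "File opening failed\n");
            return -1;                  // don't carry on with a NULL FILE*
        }

        fseek(fpInputFile, 0, SEEK_END);
        lPos = ftell(fpInputFile);      // current file position == file size
        fseek(fpInputFile, 0, SEEK_SET);
        if (lPos < 0) {
            fprintf(stderr, "ftell failed\n");
            fclose(fpInputFile);
            return -1;
        }
        ulSize = (unsigned long)lPos;

        ucBuffer = malloc(ulSize + 1);  // memory allocation for ucBuffer (+1 for '\0')
        if (!ucBuffer) {
            fclose(fpInputFile);
            return -1;
        }

        ulSize = fread(ucBuffer, 1, ulSize, fpInputFile);  // Read file
        ucBuffer[ulSize] = '\0';        // terminate so it can be used as a string
        fclose(fpInputFile);            // close the file

        /* ... use ucBuffer here ... */

        free(ucBuffer);
        return 0;
    }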

Use fseek and ftell to get the file size (the offset of the end of the file).
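A sketch of the fseek/ftell approach as a function with the error checks the snippets in this thread skip. It assumes fp refers to a seekable regular file opened in binary mode ("rb"): in text mode on some platforms (such as Windows) ftell is not a byte count, ISO C does not require binary streams to support SEEK_END meaningfully, and pipes or stdin cannot be sized this way at all, so the realloc loop from the first answer is the fallback there. The name read_file_fseek is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch: size a buffer with fseek/ftell, then read the file in one fread.
   Assumes fp is a seekable file opened in binary mode. Returns a
   '\0'-terminated heap buffer (caller frees) or NULL on any failure. */
char *read_file_fseek(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    char *buf = malloc((size_t)size + 1);   /* +1 for the '\0' */
    if (buf == NULL)
        return NULL;

    size_t got = fread(buf, 1, (size_t)size, fp);
    if (got != (size_t)size) {              /* short read: treat as an error */
        free(buf);
        return NULL;
    }

    buf[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```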


I wrote the following code to read a file of unknown size and store every character in a buffer (it works fine for me). Read the following links to get a good idea of working with files:

Try something like this:

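    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE* pFile;
        char* buffer;
        size_t result;
        long lSize;

        pFile = fopen("CodeSV.txt", "rb");      // binary mode, so ftell is a byte count
        if (pFile == NULL) { fputs("File error", stderr); exit(1); }

        // obtain file size:
        fseek(pFile, 0, SEEK_END);
        lSize = ftell(pFile);
        rewind(pFile);
        if (lSize < 0) { fputs("Size error", stderr); exit(2); }

        buffer = malloc(lSize + 1);             // +1 for the terminating '\0'
        if (buffer == NULL) { fputs("Memory error", stderr); exit(2); }

        // copy the file into the buffer:
        result = fread(buffer, 1, lSize, pFile);
        if (result != (size_t)lSize) { fputs("Reading error", stderr); exit(3); }
        buffer[lSize] = '\0';

        /* the whole file is now loaded in the memory buffer. */
        fclose(pFile);

        printf("%s", buffer);
        free(buffer);
        return 0;
    }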

If the file you are reading is small, you can try the following:

    #include <stdio.h>

    int main()
    {
        FILE *ptr_file;
        char output[10000];

        ptr_file = fopen("lol_temp.txt", "r");
        if (!ptr_file)
            return 1;

        /* fread does not add a '\0', so leave room for one and add it */
        size_t bytes_read = fread(output, 1, sizeof output - 1, ptr_file);
        output[bytes_read] = '\0';
        fclose(ptr_file);

        printf("%s", output);
        return 0;
    }
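With a fixed buffer the risk is silent truncation when the file turns out not to be small. A minimal sketch, assuming the caller opens the file and supplies the buffer, that reads at most cap - 1 bytes, always terminates the string, and reports whether anything was left over; read_small is a hypothetical name.

```c
#include <stdio.h>

/* Reads fp into buf (capacity cap, which must be at least 1) and always
   '\0'-terminates it. Returns the number of bytes stored, or -1 if a read
   error occurred or the file did not fit in cap - 1 bytes. */
long read_small(FILE *fp, char *buf, size_t cap)
{
    size_t n = fread(buf, 1, cap - 1, fp);
    buf[n] = '\0';

    if (ferror(fp))
        return -1;
    if (n == cap - 1 && getc(fp) != EOF)    /* data remains: truncated */
        return -1;
    return (long)n;
}
```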

This is better done using a dynamic linked list than an array. Here I have a simple list that stores every char you read from the file, since you said: "Ultimately, I want to read the file into a string, manipulate the string, and output the modified string as a new text file." At the end, I build a string from what was read. I tested it, so I think it should work fine :) You can move the interface and the list implementation into separate files, or even ship the implementation as an object file.

    #include <stdio.h>
    #include <stdlib.h>

    typedef char Titem; //just to identify it

    // Interface of list
    typedef struct node *Tpointer;
    typedef struct node {
        Titem item;
        Tpointer next;
    } Tnode;
    typedef Tpointer Tlist;

    void initialize_list(Tlist *list);
    void insert_to_list_end(Tlist *list, Titem data);
    void cleanup_list(Tlist *list);

    // Implementation of list (only the object file is needed in your application)
    void initialize_list(Tlist *list)
    {
        *list = NULL;
    }

    void insert_to_list_end(Tlist *list, Titem data)
    {
        Tpointer newnode, last = *list;
        newnode = malloc(sizeof(Tnode));
        if (newnode == NULL) {
            fprintf(stderr, "out of memory\n");
            exit(1);
        }
        newnode->item = data;
        newnode->next = NULL;
        if (last == NULL) {
            *list = newnode;
        } //first node
        else {
            while (1) {
                if (last->next == NULL) {
                    last->next = newnode;
                    break;
                }
                last = last->next;
            }
        }
    }

    void cleanup_list(Tlist *list)
    {
        Tpointer aux1, aux2;
        aux1 = *list;
        while (aux1 != NULL) {
            aux2 = aux1->next;
            free(aux1);
            printf("\nDeleted"); //for testing purposes
            aux1 = aux2;
        }
        initialize_list(list);
    }

    #define file_dir "CodeSV.txt"

    int main(void)
    {
        FILE *fp;
        fp = fopen(file_dir, "r");
        int counter = 1;    // chars read + 1 for the terminating '\0'
        Tlist list;

        if (fp) {
            initialize_list(&list);
            int c;
            while ((c = getc(fp)) != EOF) {
                insert_to_list_end(&list, (char)c);
                counter++;
            }
            fclose(fp);
        }
        else {
            printf("file not found");
            return 1;
        }

        //creating a string with what you read
        char stringFromFile[counter];
        Tlist currentNode = list;
        int i = 0;
        while (currentNode != NULL) {   // also safe for an empty file
            stringFromFile[i++] = currentNode->item;
            currentNode = currentNode->next;
        }
        stringFromFile[i] = '\0';       // terminate so printf("%s") stops here

        printf("WHAT YOU JUST READ: %s", stringFromFile);
        /*here you can manipulate the string as you wish.
          But remember to free the linked list (call cleanup_list) when u're done*/
        cleanup_list(&list);
        return 0;
    }
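Appending by walking to the end of the list, as insert_to_list_end does, costs time proportional to the list length for every character, so reading n characters is O(n²). A common alternative, sketched here with hypothetical names (clist, clist_push, clist_to_string, clist_free), keeps a tail pointer and a count so each append is O(1) and the final string can be heap-allocated at exactly the right size instead of living in a stack array.

```c
#include <stdlib.h>

typedef struct cnode {
    char item;
    struct cnode *next;
} cnode;

/* Hypothetical list with a tail pointer. Zero-initialize before use. */
typedef struct {
    cnode *head;
    cnode *tail;
    size_t count;
} clist;

/* O(1) append: link the new node after tail. Returns 0, or -1 if out of memory. */
int clist_push(clist *l, char c)
{
    cnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->item = c;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;            /* first node */
    l->tail = n;
    l->count++;
    return 0;
}

/* Copies the characters into a new '\0'-terminated heap string (caller frees). */
char *clist_to_string(const clist *l)
{
    char *s = malloc(l->count + 1);
    if (s == NULL)
        return NULL;
    size_t i = 0;
    for (const cnode *p = l->head; p != NULL; p = p->next)
        s[i++] = p->item;
    s[i] = '\0';
    return s;
}

/* Frees every node and resets the list to empty. */
void clist_free(clist *l)
{
    cnode *p = l->head;
    while (p != NULL) {
        cnode *next = p->next;
        free(p);
        p = next;
    }
    l->head = l->tail = NULL;
    l->count = 0;
}
```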

If the OP wants to do text processing and manipulate lines, then instead of reading the entire file into one string, create a linked list of lines.

    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_MAXSIZE 65536

    typedef struct line_T {
        struct line_T *next;
        char *data;
        size_t length;
    } line_T;

    line_T *ReadFile(FILE *istream) {
        line_T head = { NULL, NULL, 0 };    // so an empty file returns NULL
        line_T *p = &head;
        char *buf = malloc(LINE_MAXSIZE);
        assert(buf);
        while (fgets(buf, LINE_MAXSIZE, istream)) {
            p->next = malloc(sizeof *(p->next));
            assert(p->next);
            p = p->next;
            p->next = NULL;
            p->length = strlen(buf);
            assert(p->length < LINE_MAXSIZE - 1); // TBD: cope with long lines
            p->data = malloc(p->length + 1);
            assert(p->data);
            memcpy(p->data, buf, p->length + 1);
        }
        free(buf);
        return head.next;
    }

    unsigned long long CountConsumeData(line_T *p) {
        unsigned long long sum = 0;
        while (p) {
            sum += p->length;
            free(p->data);
            line_T *next = p->next;
            free(p);
            p = next;
        }
        return sum;
    }

    int main(void) {
        const char *fname = "CodeSV.txt";
        FILE *istream = fopen(fname, "r");
        if (istream == NULL) {              // don't pass NULL to ReadFile
            perror(fname);
            return 1;
        }
        line_T *p = ReadFile(istream);
        fclose(istream);
        printf("Length : %llu\n", CountConsumeData(p));
        return 0;
    }
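The TBD in ReadFile (lines longer than LINE_MAXSIZE) can be avoided by letting POSIX getline size the buffer. A sketch under that assumption (a POSIX system; ReadFileGetline and FreeLines are hypothetical names) that reuses the same line_T node type and, like the fgets version, keeps the newline in each data string:

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct line_T {             /* same node type as the answer above */
    struct line_T *next;
    char *data;
    size_t length;
} line_T;

/* Frees every node and its data. */
void FreeLines(line_T *p)
{
    while (p != NULL) {
        line_T *next = p->next;
        free(p->data);
        free(p);
        p = next;
    }
}

/* Builds a list of lines of any length; returns NULL for an empty file
   or if an allocation fails (any partial list is freed in that case). */
line_T *ReadFileGetline(FILE *istream)
{
    line_T head = { NULL, NULL, 0 };
    line_T *p = &head;
    char *buf = NULL;               /* NULL + cap 0: getline allocates */
    size_t cap = 0;
    ssize_t len;

    while ((len = getline(&buf, &cap, istream)) != -1) {
        line_T *node = malloc(sizeof *node);
        if (node == NULL || (node->data = malloc((size_t)len + 1)) == NULL) {
            free(node);             /* free(NULL) is a no-op */
            FreeLines(head.next);
            free(buf);
            return NULL;
        }
        memcpy(node->data, buf, (size_t)len + 1);   /* includes the '\0' */
        node->length = (size_t)len;
        node->next = NULL;
        p->next = node;
        p = node;
    }

    free(buf);
    return head.next;
}
```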

Source: https://habr.com/ru/post/989760/

