C is written in the middle of the binary without overwriting any existing content

The problem today is that I need to write an array of numbers in a binary file in the original position. I have a position at which it should start, and I don’t want to overwrite the values ​​after that, I just want to insert the array into the original position in the file. For instance:

12345 

Let press 456 at position 2:

 12456345 

I know that I probably will have to implement it myself, but I want to know what your opinion is about how to implement this as efficiently as possible.

+6
source share
4 answers

Here is the extend_file_and_insert() function, which does the job more or less.

 #include <sys/stat.h> #include <unistd.h> enum { BUFFERSIZE = 64 * 1024 }; #define MIN(x, y) (((x) < (y)) ? (x) : (y)) /* off_t is signed ssize_t is signed size_t is unsigned off_t for lseek() offset and return size_t for read()/write() length ssize_t for read()/write() return off_t for st_size */ static int extend_file_and_insert(int fd, off_t offset, char const *insert, size_t inslen) { char buffer[BUFFERSIZE]; struct stat sb; int rc = -1; if (fstat(fd, &sb) == 0) { if (sb.st_size > offset) { /* Move data after offset up by inslen bytes */ size_t bytes_to_move = sb.st_size - offset; off_t read_end_offset = sb.st_size; while (bytes_to_move != 0) { ssize_t bytes_this_time = MIN(BUFFERSIZE, bytes_to_move); ssize_t rd_off = read_end_offset - bytes_this_time; ssize_t wr_off = rd_off + inslen; lseek(fd, rd_off, SEEK_SET); if (read(fd, buffer, bytes_this_time) != bytes_this_time) return -1; lseek(fd, wr_off, SEEK_SET); if (write(fd, buffer, bytes_this_time) != bytes_this_time) return -1; bytes_to_move -= bytes_this_time; read_end_offset -= bytes_this_time; /* Added 2013-07-19 */ } } lseek(fd, offset, SEEK_SET); write(fd, insert, inslen); rc = 0; } return rc; } 

(Please note that the added date 2013-07-19 was added, it was an error that appears only when the buffer size is less than the amount of data that needs to be copied to the file. Thanks to malat to indicate the error. Code has now been verified using BUFFERSIZE = 4 )

This is a small test code:

 #include <fcntl.h> #include <string.h> static const char base_data[] = "12345"; typedef struct Data { off_t posn; const char *data; } Data; static const Data insert[] = { { 2, "456" }, { 4, "XxxxxxX" }, { 12, "ZzzzzzzzzzzzzzzzzzzzzzzzX" }, { 22, "YyyyyyyyyyyyyyyY" }, }; enum { NUM_INSERT = sizeof(insert) / sizeof(insert[0]) }; int main(void) { int fd = open("test.dat", O_RDWR | O_TRUNC | O_CREAT, 0644); if (fd > 0) { ssize_t base_len = sizeof(base_data) - 1; if (write(fd, base_data, base_len) == base_len) { for (int i = 0; i < NUM_INSERT; i++) { off_t length = strlen(insert[i].data); if (extend_file_and_insert(fd, insert[i].posn, insert[i].data, length) != 0) break; lseek(fd, 0, SEEK_SET); char buffer[BUFFERSIZE]; ssize_t nbytes; while ((nbytes = read(fd, buffer, sizeof(buffer))) > 0) write(1, buffer, nbytes); write(1, "\n", 1); } } close(fd); } return(0); } 

It outputs the result:

 12456345 1245XxxxxxX6345 1245XxxxxxX6ZzzzzzzzzzzzzzzzzzzzzzzzZ345 1245XxxxxxX6ZzzzzzzzzzYyyyyyyyyyyyyyyYzzzzzzzzzzzzzzZ345 

It should be tested on some larger files (larger than BUFFERSIZE, but it would be wise to test with BUFFERSIZE much less than 64 KiB, I used 32 bytes, and that seemed OK). I just looked at the results, but the templates are designed to make it easy to see that they are correct. The code does not check for any calls to lseek() ; that is a minor risk.

+9
source

First use ftruncate() to enlarge the file to its final size. Then copy everything from the old end to the new one, and return to the insertion point. Then rewrite the middle content with the data you want to insert. This is as effective as it seems, because file systems usually do not offer true β€œinsertion” in the middle of files.

+5
source

I will broadly interpret your question as "how can I effectively implement the persistent storage of an object that supports random access search by index and insert with extension". As already noted, you can use a simple linear array in the file, but this will only be useful for searching (O (1)) and quite inefficient for inserting (O (n)). You could achieve O (log n) to search and insert using the tree data structure instead. Maintain one file that acts as an index, and another that acts as a data store and is a series of pieces. Each piece can be partially filled. The index file contains a tree (binary tree or B-tree), where each node corresponds to some adjacent fragment of the array and contains the size of this fragment (so the root of the node contains the size of the entire array). For the binary tree, the left and right child nodes contain the size of the left and right half (approximately) of the array. Finally, the leaf nodes contain a pointer to a piece in the data warehouse file that contains the actual data. The insert now includes a change in the 'size' property of the nodes 'k', where 'k' is the height of the tree. When a piece of the data warehouse becomes too full, split it (select a new one, enlarge the file, or if you also support deletion, possibly from the free list of empty blocks) and rebalance the tree (there are many standard ways to do this.)

Is it hard to sound? Definitely! Effectively inserting a medium file is more difficult to achieve than adding.

0
source

I agree with the others, but let me formulate the solution in a slightly different way:

  • Get the name of the temp file (there are special calls for this)

  • Copy the source file to the temp file (now there are two copies of the same file)

  • Open the source file for "append".

  • Truncate it to the insertion point

  • Write your new details

  • Open your temporary file for "reading"

  • "Search" to the entry point (again, the call depends on the OS)

  • Reading to the end of the file in a temporary file; pasting into your source file (still open to "append").

  • Close both files

  • Delete temporary file

0
source

Source: https://habr.com/ru/post/915002/


All Articles