How to Reliably Write to a Slow and Unreliable NFS

I am not an expert in C, and I am looking for some tips to make my program more robust and reliable. Some context: I wrote a program that performs scientific calculations which take quite a long time (about 20 hours). I run it on a large university Linux HPC cluster that uses the SLURM scheduling system and NFS-mounted file systems. At some point during those 20 hours the connection to the file system seems to go stale (on the whole machine, independent of my program), so the first attempt to open and write a file takes a very long time. This results in a segfault that I have not yet been able to track down precisely.

Below is a minimal file that at least conceptually reproduces the error: the program starts, opens the file, and everything works. It then performs a lengthy computation (simulated by sleep()), tries to open and write to the same file again, and fails. What are some conventions to make my code more robust, so that it reliably writes my results to a file without crashing?

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    // Declare variables
    FILE *outfile;
    char outname[150] = "result.csv";

    // Open file for writing
    printf("CHECKING if output file '%s' is writable?", outname);
    outfile=fopen(outname, "w");
    if (outfile == NULL) {
        perror("Failed: ");
        exit(EXIT_FAILURE);
    }
    fclose(outfile);
    printf(" PASSED.\n");

    // Do some computation that takes really long (around 19h)
    sleep(3);

    // Open file again and Write results
    printf("Writing results to %s ...", outname);
    outfile=fopen(outname, "w");
    if (outfile == NULL) {
        perror("Failed writing in tabulate_vector_new: ");
        exit(EXIT_FAILURE);
    }
    fprintf( outfile, "This is the important result.\n");
    fclose(outfile);

    printf(" DONE.\n");
    return 0;
}

One option is to check that the NFS mount responds before trying to write to it. Something like this:

// needs <sys/types.h>, <sys/wait.h> and <unistd.h>
pid_t pid = fork();

if (pid == -1)
{
    // error, failed to fork(). should probably give up now. something is really wrong.
}
else if (pid > 0)
{
    // wait() blocks until the child exits; if df exits, it has
    // successfully interacted with the NFS file system
    wait(NULL);
    // proceed with attempting to write important data
}
else
{
    // we are the child; exec df in order to test the NFS file system
    execlp("df", "df", "/mnt", (char *)NULL);
    // on success the child has been replaced by df, which will try to statfs(2) /mnt for us;
    // we only get here if execlp() itself failed
    _exit(127);
}

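Here the child runs df against a directory on the NFS mount (/mnt in this example). If df returns, the file system answered. If df hangs, the parent hangs in wait() as well, so you may want to set a timeout with alarm(2) and then deal with df yourself; otherwise you can end up with a zombie df process.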
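A minimal sketch of such a timeout-guarded check, under a few assumptions: the helper name path_responds is hypothetical, the child calls stat(2) directly instead of running df, and the parent polls with waitpid(2) and WNOHANG in place of alarm(2), so no signal handler is needed. It is an illustration and not a definitive implementation. In particular, a child stuck in uninterruptible I/O on a hard NFS mount may not die immediately after SIGKILL, which is why the parent never blocks on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from the original answer).
 * Returns  1 if stat(path) succeeded in a child within timeout_sec seconds,
 *          0 if it failed or did not finish in time,
 *         -1 if fork() failed. */
int path_responds(const char *path, int timeout_sec)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: this call may block for a long time on a stale mount. */
        struct stat st;
        _exit(stat(path, &st) == 0 ? 0 : 1);
    }

    /* Parent: poll every 100 ms so we never block on the child. */
    const struct timespec tick = { 0, 100 * 1000 * 1000 };
    for (int waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)   /* child finished: report whether stat() worked */
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (r == -1 && errno != EINTR)
            return 0;   /* unexpected waitpid() error */
        nanosleep(&tick, NULL);
    }

    /* Timed out: ask the child to die and try once to reap it.
     * If it has not exited yet it stays a zombie until reaped later
     * or until the parent exits. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return 0;
}
```

A caller could loop on something like path_responds("/mnt", 30), sleeping between attempts, and only call fopen() on the result file once the check returns 1. The check narrows the window but cannot close it, since the mount can still go stale between the check and the write.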

, NFS, , , .


Source: https://habr.com/ru/post/1684063/

