Improving Fortran formatted I/O with lots of small files

Suppose I have the following requirements for writing monitor files from a simulation:

  • A large number of individual files must be written, typically on the order of 10,000.
  • The files must be human readable, i.e. formatted I/O.
  • Periodically, a new line is added to each file, typically every 50 seconds.
  • New data should be available almost instantly, so large manual write buffers are not an option.
  • We are on a Lustre file system, which seems to be optimized for the very opposite: sequential writes to a small number of large files.

I did not formulate the requirements myself, so unfortunately there is no point in discussing them. I just want to find the best possible solution under the above constraints. I came up with a small working example to test several implementations. Here is the best I could do so far:

!===============================================================!
! program to test some I/O implementations for many small files !
!===============================================================!
PROGRAM iotest

    use types    ! user module (not shown) that provides the kinds I4B, I8B and DP
    use omp_lib

    implicit none

    INTEGER(I4B), PARAMETER :: steps = 1000
    INTEGER(I4B), PARAMETER :: monitors = 1000
    INTEGER(I4B), PARAMETER :: cachesize = 10

    INTEGER(I8B) :: counti, countf, count_rate, counti_global, countf_global
    REAL(DP) :: telapsed, telapsed_global
    REAL(DP), DIMENSION(:,:), ALLOCATABLE :: density, pressure, vel_x, vel_y, vel_z
    INTEGER(I4B) :: n, t, unitnumber, c, i, thread
    CHARACTER(LEN=100) :: dummy_char, number
    REAL(DP), DIMENSION(:,:,:), ALLOCATABLE :: writecache_real

    call system_clock(counti_global,count_rate)

    ! allocate cache
    allocate(writecache_real(5,cachesize,monitors))
    writecache_real = 0.0_dp

    ! fill values
    allocate(density(steps,monitors), pressure(steps,monitors), vel_x(steps,monitors), vel_y(steps,monitors), vel_z(steps,monitors))
    do n=1, monitors
        do t=1, steps
            call random_number(density(t,n))
            call random_number(pressure(t,n))
            call random_number(vel_x(t,n))
            call random_number(vel_y(t,n))
            call random_number(vel_z(t,n))
        end do
    end do

    ! create files
    do n=1, monitors
        write(number,'(I0.8)') n
        dummy_char = 'monitor_' // trim(adjustl(number)) // '.dat'
        open(unit=20, file=trim(adjustl(dummy_char)), status='replace', action='write')
        close(20)
    end do

    call system_clock(counti)

    ! write data
    c = 0
    do t=1, steps
        c = c + 1
        do n=1, monitors
            writecache_real(1,c,n) = density(t,n)
            writecache_real(2,c,n) = pressure(t,n)
            writecache_real(3,c,n) = vel_x(t,n)
            writecache_real(4,c,n) = vel_y(t,n)
            writecache_real(5,c,n) = vel_z(t,n)
        end do
        if(c .EQ. cachesize .OR. t .EQ. steps) then
            !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(n,number,dummy_char,unitnumber, thread)
            thread = OMP_get_thread_num()
            unitnumber = thread + 20
            !$OMP DO
            do n=1, monitors
                write(number,'(I0.8)') n
                dummy_char = 'monitor_' // trim(adjustl(number)) // '.dat'
                ! buffered='yes' is an Intel Fortran extension; remove it for other compilers
                open(unit=unitnumber, file=trim(adjustl(dummy_char)), status='old', &
                     action='write', position='append', buffered='yes')
                write(unitnumber,'(5ES25.15)') writecache_real(:,1:c,n)
                close(unitnumber)
            end do
            !$OMP END DO
            !$OMP END PARALLEL
            c = 0
        end if
    end do

    call system_clock(countf)
    call system_clock(countf_global)
    telapsed=real(countf-counti,kind=dp)/real(count_rate,kind=dp)
    telapsed_global=real(countf_global-counti_global,kind=dp)/real(count_rate,kind=dp)
    write(*,*)
    write(*,'(A,F15.6,A)') ' elapsed wall time for I/O: ', telapsed, ' seconds'
    write(*,'(A,F15.6,A)') ' global elapsed wall time:  ', telapsed_global, ' seconds'
    write(*,*)

END PROGRAM iotest

Key features: OpenMP parallelization and a manual write buffer. Here are some timings on the Lustre file system with 16 threads:

  • cachesize = 5: elapsed wall time for I/O: 991.627404 seconds
  • cachesize = 10: elapsed wall time for I/O: 415.456265 seconds
  • cachesize = 20: elapsed wall time for I/O: 93.842964 seconds
  • cachesize = 50: elapsed wall time for I/O: 79.859099 seconds
  • cachesize = 100: elapsed wall time for I/O: 23.937832 seconds
  • cachesize = 1000: elapsed wall time for I/O: 10.472421 seconds
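
To put these numbers in perspective, the wall time can be divided by the number of open/append/close cycles the program performs. The sketch below does that arithmetic. It assumes the timings were taken with the constants from the listing (steps = 1000, monitors = 1000), which the post does not state explicitly; the module and function names are made up for illustration.

```fortran
module flush_cost
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
contains

    ! Number of cache flushes over a run: ceil(steps / cachesize).
    ! Matches the condition "c == cachesize .or. t == steps" in the listing.
    pure integer function n_flushes(steps, cachesize)
        integer, intent(in) :: steps, cachesize
        n_flushes = (steps + cachesize - 1) / cachesize
    end function n_flushes

    ! Average wall time per open/append/close cycle, in seconds.
    ! Each flush touches every monitor file once.
    pure function seconds_per_cycle(telapsed, steps, cachesize, monitors) result(s)
        real(dp), intent(in) :: telapsed
        integer, intent(in) :: steps, cachesize, monitors
        real(dp) :: s
        s = telapsed / real(n_flushes(steps, cachesize) * monitors, dp)
    end function seconds_per_cycle

end module flush_cost
```

Under that assumption, cachesize = 5 means 200 flushes and 200,000 cycles, or roughly 5 ms per cycle, while cachesize = 1000 means 1,000 cycles at roughly 10 ms each, although each of those cycles writes 200 times as many lines. This suggests that the cost is dominated by the number of cycles rather than by the amount of data written.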

A second set of timings, again with 16 threads:

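  • cachesize = 1: elapsed wall time for I/O: 5.543722 seconds
  • cachesize = 2: elapsed wall time for I/O: 2.791811 seconds
  • cachesize = 3: elapsed wall time for I/O: 1.752962 seconds
  • cachesize = 4: elapsed wall time for I/O: 1.630385 seconds
  • cachesize = 5: elapsed wall time for I/O: 1.174099 seconds
  • cachesize = 10: elapsed wall time for I/O: 0.700624 seconds
  • cachesize = 20: elapsed wall time for I/O: 0.433936 seconds
  • cachesize = 50: elapsed wall time for I/O: 0.425782 seconds
  • cachesize = 100: elapsed wall time for I/O: 0.227552 seconds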
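
For anyone who wants to reproduce the measurements with another compiler, the open/append/close cycle from the listing can be factored into a few helpers. This is only a portable sketch, not the code that produced the timings above: it swaps the `types` module for `iso_fortran_env`, uses `newunit=` (Fortran 2008) instead of thread-numbered units, and drops the Intel-specific `buffered='yes'`. The names `monitor_io`, `monitor_filename`, `flush_due` and `append_rows` are hypothetical. As in the original, calling it from an OpenMP loop relies on the compiler's I/O runtime being thread-safe.

```fortran
module monitor_io
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    private
    public :: dp, monitor_filename, flush_due, append_rows

contains

    ! Build the file name for monitor n, e.g. 'monitor_00000042.dat',
    ! using the same zero-padded pattern as the listing.
    function monitor_filename(n) result(fname)
        integer, intent(in) :: n
        character(len=:), allocatable :: fname
        character(len=16) :: number
        write(number,'(I0.8)') n
        fname = 'monitor_' // trim(number) // '.dat'
    end function monitor_filename

    ! True when the cache is full or the last time step has been reached.
    pure logical function flush_due(c, cachesize, t, steps)
        integer, intent(in) :: c, cachesize, t, steps
        flush_due = (c == cachesize) .or. (t == steps)
    end function flush_due

    ! Append the first c cached rows (5 values per row) to an existing file.
    ! Format reversion makes '(5ES25.15)' emit one record per row.
    subroutine append_rows(fname, cache, c)
        character(len=*), intent(in) :: fname
        real(dp), intent(in) :: cache(:,:)   ! shape (5, cachesize)
        integer, intent(in) :: c
        integer :: u
        open(newunit=u, file=fname, status='old', action='write', position='append')
        write(u,'(5ES25.15)') cache(:,1:c)
        close(u)
    end subroutine append_rows

end module monitor_io
```

Inside the time loop this would be used as `if (flush_due(c, cachesize, t, steps))` followed by a loop over `n` that calls `append_rows(monitor_filename(n), writecache_real(:,:,n), c)`.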

, - Lustre , , - . , , . , . , , , 1024-4096 root. , , .

- ?

1 , . 1, ( ), 64k ( 1M). - . , , .


Source: https://habr.com/ru/post/1663865/

