The best approach for reading and writing large files with collective MPI-IO

I would like to read and write large datasets in Fortran using MPI-IO. My preferred approach is to use an MPI type created with MPI_Type_create_subarray, with a single dimension, to describe the view of each process on the file. My Fortran code looks like this:

    ! A contiguous type to describe the vector per element.
    ! MPI_TYPE_CONTIGUOUS(COUNT, OLDTYPE, NEWTYPE, IERROR)
    call MPI_Type_contiguous(nComponents, rk_mpi, &
      &                      me%vectype, iError)
    call MPI_Type_commit( me%vectype, iError )

    ! A subarray to describe the view of this process on the file.
    ! MPI_TYPE_CREATE_SUBARRAY(ndims, array_of_sizes, array_of_subsizes,
    !                          array_of_starts, order, oldtype, newtype, ierror)
    call MPI_Type_create_subarray( 1, [ globElems ], [ locElems ], &
      &                            [ elemOff ], MPI_ORDER_FORTRAN, &
      &                            me%vectype, me%ftype, iError )
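For completeness, here is roughly how I then use these types; this is only a sketch, and the file name, the fh handle, and the chunk buffer stand in for my actual variables:

    ! Illustrative only: open the file, install me%ftype as this
    ! process's view, and write collectively.
    integer :: fh
    integer(kind=MPI_OFFSET_KIND) :: disp

    call MPI_File_open( MPI_COMM_WORLD, 'data.bin',           &
      &                 MPI_MODE_WRONLY + MPI_MODE_CREATE,    &
      &                 MPI_INFO_NULL, fh, iError )
    disp = 0_MPI_OFFSET_KIND
    call MPI_File_set_view( fh, disp, me%vectype, me%ftype,   &
      &                     'native', MPI_INFO_NULL, iError )
    ! Each process writes its locElems element vectors.
    call MPI_File_write_all( fh, chunk, locElems, me%vectype, &
      &                      MPI_STATUS_IGNORE, iError )
    call MPI_File_close( fh, iError )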

However, array_of_sizes and array_of_starts, which hold global values, are plain default integers in the MPI interface. With this approach there is therefore a limit of roughly 2 billion elements. Is there another interface that uses MPI_OFFSET_KIND for these global values? The only way around this that I see so far is to use the displacement argument of MPI_File_set_view instead of defining the view with an MPI subarray type, but that "feels" wrong. Would you expect a performance impact for collective I/O with either approach? Does anyone know whether this interface will change in MPI-3? Should I perhaps use some other MPI type?
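To make that alternative concrete, this is roughly what I imagine the displacement-based workaround would look like (a sketch; extent and chunk are placeholder names). Since disp has MPI_OFFSET_KIND, it is not subject to the 2 billion limit:

    ! Sketch of the displacement workaround: shift each process's
    ! view with the 64-bit disp instead of a subarray filetype.
    integer(kind=MPI_ADDRESS_KIND) :: lb, extent
    integer(kind=MPI_OFFSET_KIND) :: disp

    ! Extent in bytes of one element vector in the file.
    call MPI_Type_get_extent( me%vectype, lb, extent, iError )
    disp = int(elemOff, MPI_OFFSET_KIND) * extent

    ! The view is now a plain stream of vectypes starting at disp,
    ! and disp is not limited to 2**31-1.
    call MPI_File_set_view( fh, disp, me%vectype, me%vectype,  &
      &                     'native', MPI_INFO_NULL, iError )
    call MPI_File_write_all( fh, chunk, locElems, me%vectype,  &
      &                      MPI_STATUS_IGNORE, iError )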

What is the recommended way to write large data files to disk efficiently in parallel with collective I/O?

1 answer

Help is on the way.

MPI-3 will have datatype manipulation routines that use MPI_Count instead of int. For backwards compatibility (groan), the existing routines won't change, but you should be able to construct your type.
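As a purely hypothetical sketch of what such a large-count call could look like, assuming an mpi_f08 module whose interfaces accept INTEGER(KIND=MPI_COUNT_KIND) arguments (this is in fact how the standard eventually did it, though not until MPI-4):

    ! Hypothetical large-count sketch: the global sizes and starts
    ! are no longer default integers.
    use mpi_f08
    integer(kind=MPI_COUNT_KIND) :: globElems, locElems, elemOff
    type(MPI_Datatype) :: vectype, ftype

    call MPI_Type_create_subarray( 1, [ globElems ], [ locElems ], &
      &                            [ elemOff ], MPI_ORDER_FORTRAN, &
      &                            vectype, ftype )
    call MPI_Type_commit( ftype )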

In the meantime, though... For subarray in particular, this is normally not considered a huge issue at the moment: even for a 2d array, indices of 2 billion give you an array of roughly 4x10^18 elements (2^31 squared), which is admittedly quite large (but exactly the sort of number that exascale computing is aiming at). In higher dimensions it is even bigger.

In 1d, however, a list 2 billion numbers long is only ~8 GB, which is not big data by any stretch, and I think that is the situation you find yourself in. My suggestion is to leave it in the form you have now for as long as you can. Is there a common factor in the local elements? You could work around this by bundling the types in units of (say) 10 vectypes, if that works; it shouldn't matter to your code, but it would reduce the numbers needed in locElems and globElems by that same factor (see the sketch below). Otherwise, yes, you can always use the displacement field in the file set view.
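A sketch of the bundling idea, assuming a common factor of 10 (locElems, globElems, and elemOff all need to be divisible by it):

    ! Bundle 10 element vectors into one unit type, then describe
    ! the file in those units; all counts shrink by the same factor.
    integer :: bundletype, factor

    factor = 10
    call MPI_Type_contiguous( factor, me%vectype, bundletype, iError )
    call MPI_Type_commit( bundletype, iError )
    call MPI_Type_create_subarray( 1, [ globElems / factor ],  &
      &                            [ locElems / factor ],      &
      &                            [ elemOff / factor ],       &
      &                            MPI_ORDER_FORTRAN,          &
      &                            bundletype, me%ftype, iError )
    call MPI_Type_commit( me%ftype, iError )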

