HDF5 rowmajor or colmajor

Question

Can I find out whether a matrix in an HDF5 file is stored in row-major or column-major order? For example, when I save matrices from Octave, which stores them internally in column-major order, I need to transpose them when I read them in my C code, where matrices are stored in row-major order, and vice versa.

hdf5

remi Jun 09 '14 at 8:50

2 answers

Yossarian · Answer 1 · 2014-06-11T09:36:51+0000

HDF5 stores data in row-major order:

HDF5 uses C storage conventions, assuming that the last listed dimension is the fastest-changing dimension and the first-listed dimension is the slowest changing.

from the HDF5 User Guide.

However, if you use Octave's built-in HDF5 interface, it will automatically transpose the arrays for you. In general, how the data is actually written in the HDF5 file should be completely opaque to the end user, and the interface should deal with differences in array ordering, etc.
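To make the quoted convention concrete, here is a minimal sketch of how a multi-dimensional index maps to a flat offset under each ordering. The names row_major_offset and col_major_offset are hypothetical helpers written for this illustration; they are not part of the HDF5 API.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of element idx[0..rank-1] in a row-major (C order) array
 * with extents dims[0..rank-1]: the LAST index varies fastest. */
size_t row_major_offset(size_t rank, const size_t *dims, const size_t *idx)
{
    size_t offset = 0;
    for (size_t k = 0; k < rank; ++k) {
        assert(idx[k] < dims[k]);            /* index must be in range */
        offset = offset * dims[k] + idx[k];
    }
    return offset;
}

/* Same element in column-major (Fortran/Octave order): the FIRST index
 * varies fastest, so the dimensions are walked in reverse. */
size_t col_major_offset(size_t rank, const size_t *dims, const size_t *idx)
{
    size_t offset = 0;
    for (size_t k = rank; k-- > 0; ) {
        assert(idx[k] < dims[k]);            /* index must be in range */
        offset = offset * dims[k] + idx[k];
    }
    return offset;
}
```

For a 2 x 3 array, element (0, 1) sits at offset 1 in row-major order but at offset 2 in column-major order, which is why the same flat buffer looks transposed when read with the other convention.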

Timothy brown · Answer 2 · 2014-06-13T15:37:47+0000

As @Yossarian noted, HDF5 always stores data in row-major order (the C convention). Octave, like Fortran, stores data internally in column-major order.

When writing a matrix from Octave, the HDF5 layer transposes it for you, so it is always written in row-major order, no matter what language you use. This provides file portability.

There is a very good example in section 7.3.2.5 of the HDF5 User Guide mentioned by @Yossarian. Here's an example that (almost) reproduces it with Octave:

    octave:1> A = [ 1:3; 4:6 ]
    A =
       1   2   3
       4   5   6
    octave:2> save("-hdf5", "test.h5", "A")
    octave:3> quit
    ~$ h5dump test.h5
    HDF5 "test.h5" {
    GROUP "/" {
       COMMENT "# Created by Octave 3.6.4, Fri Jun 13 08:36:16 2014 MDT < user@localhost >"
       GROUP "A" {
          ATTRIBUTE "OCTAVE_NEW_FORMAT" {
             DATATYPE  H5T_STD_U8LE
             DATASPACE  SCALAR
             DATA {
             (0): 1
             }
          }
          DATASET "type" {
             DATATYPE  H5T_STRING {
                STRSIZE 7;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
             DATASPACE  SCALAR
             DATA {
             (0): "matrix"
             }
          }
          DATASET "value" {
             DATATYPE  H5T_IEEE_F64LE
             DATASPACE  SIMPLE { ( 3, 2 ) / ( 3, 2 ) }
             DATA {
             (0,0): 1, 4,
             (1,0): 2, 5,
             (2,0): 3, 6
             }
          }
       }
    }
    }

Note that the HDF5 layer has transposed the matrix so that it is stored in row-major format: the 2 x 3 matrix A appears in the file as a 3 x 2 dataset.
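If the C side wants the matrix back in its original 2 x 3 shape, one option is to transpose the buffer after reading it. The following is only a sketch under that assumption; transpose_row_major is a hypothetical helper, not an HDF5 function, and it works on a plain int buffer such as the values 1, 4, 2, 5, 3, 6 shown in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose a rows x cols row-major matrix src into a cols x rows
 * row-major matrix dst. Out-of-place only: src and dst must differ. */
void transpose_row_major(const int *src, size_t rows, size_t cols, int *dst)
{
    assert(src != dst);
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            /* element (i, j) of src becomes element (j, i) of dst */
            dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Calling it with the 3 x 2 buffer from the file (rows = 3, cols = 2) fills dst with 1, 2, 3, 4, 5, 6, which is the original A in row-major order. Alternatively, the copy can be skipped by indexing the buffer as data[j * 2 + i] for element (i, j) of A.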

Then an example read in C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <hdf5.h>

    #define FILE "test.h5"
    #define DS "A/value"

    int main(int argc, char **argv)
    {
        int i = 0;
        int j = 0;
        int n = 0;
        int x = 0;
        int rank = 0;
        hid_t file_id;
        hid_t space_id;
        hid_t dset_id;
        herr_t stat;
        hsize_t *dims = NULL;
        int *data = NULL;

        file_id = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
        /* The third argument is a dataset access property list, not the dataset id. */
        dset_id = H5Dopen(file_id, DS, H5P_DEFAULT);
        space_id = H5Dget_space(dset_id);
        n = (int)H5Sget_simple_extent_npoints(space_id);
        rank = H5Sget_simple_extent_ndims(space_id);
        /* dims holds hsize_t values, so size the buffer accordingly. */
        dims = malloc(rank * sizeof(hsize_t));
        stat = H5Sget_simple_extent_dims(space_id, dims, NULL);

        printf("rank: %d\t dimensions: ", rank);
        for (i = 0; i < rank; ++i) {
            if (i == 0) {
                printf("(");
            }
            printf("%llu", (unsigned long long)dims[i]);
            if (i == (rank - 1)) {
                printf(")\n");
            } else {
                printf(" x ");
            }
        }

        data = malloc(n * sizeof(int));
        memset(data, 0, n * sizeof(int));
        /* HDF5 converts the stored doubles to native ints on read. */
        stat = H5Dread(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                       H5P_DEFAULT, data);

        /* Print the buffer as a row-major dims[0] x dims[1] matrix (rank 2 assumed). */
        printf("%s:\n", DS);
        for (i = 0; i < (int)dims[0]; ++i) {
            printf(" [ ");
            for (j = 0; j < (int)dims[1]; ++j) {
                x = i * (int)dims[1] + j;
                printf("%d ", data[x]);
            }
            printf("]\n");
        }

        free(data);
        free(dims);
        stat = H5Sclose(space_id);
        stat = H5Dclose(dset_id);
        stat = H5Fclose(file_id);

        return(EXIT_SUCCESS);
    }

When compiled and run, this gives:

    ~$ h5cc -o rmat rmat.c
    ~$ ./rmat
    rank: 2  dimensions: (3 x 2)
    A/value:
     [ 1 4 ]
     [ 2 5 ]
     [ 3 6 ]

This is great, as it means the matrices are stored optimally in memory. It does mean that you have to change how you perform your calculations: for row-major you need to do pre-multiplication, whereas for column-major you should do post-multiplication. Here is an example that hopefully explains it a bit more clearly.

Does that help?
