starting with hdf5 — writing a data cube in C

pre(r)amble

Let us start using hdf5 to write out a “data cube” — in our example a series of scalar values f(x, y, z).

hdf5 is a library written in C but it can be called from many programming languages.  We will start by creating and writing a data cube from a C program (don’t worry, I will also show you a python version in another post), and then we will discuss various ways of examining this data cube. Edit a file, for example make-sample-cube.c, and start with:

the code

First include these .h files:

#include <hdf5.h>
#include <hdf5_hl.h>

This gives you access to the API. We will also need some other include files

We will need some more include files:

#include <stdlib.h>
#include <math.h>
#include <assert.h>

Let’s go top-down and write a main() routine that shows what we want to do in this program. We will have a fill_cube() routine which creates our data cube from mathematical functions, and a write_cube() routine which saves them to disk as hdf5 files. Since we’re taking a top-down approach, we declare their prototypes before the main() function:

double *fill_cube(int nx, int ny, int nz);
void write_cube(double *data, int nx, int ny, int nz, char *fname);

then the actual main():

int main(int argc, char *argv[])
{
  double *data_cube;
  int nx, ny, nz;
  nx = 64;
  ny = 57;
  nz = 50;
  char *fname = "sample_cube.h5";
  data_cube = fill_cube(nx, ny, nz);
  write_cube(data_cube, nx, ny, nz, fname);
  free(data_cube);
  return 0;
}

Now let’s write the functions we use:

/** 
 * Fills up a cube with artificial data so that the various slices
 * will be interesting to examine and visualize.  Note that this
 * routine uses malloc() to allocate memory for the cube, so the
 * caller has to use free() when it has finished using this cube.
 * 
 * @param nx 
 * @param ny 
 * @param nz 
 * 
 * @return the data cube, with space freshly allocated by malloc()
 */
double *fill_cube(int nx, int ny, int nz)
{
  int i, j, k;
  double x, y, z;
  double *data;
  data = (double *) malloc(nx*ny*nz*sizeof(*data));
  for (i = 0; i < nx; ++i) {
    for (j = 0; j < ny; ++j) {
      for (k = 0; k < nz; ++k) {
        x = (i - nx/2)/100.0;
        y = (j - ny/2)/100.0;
        z = (k - nz/2)/100.0;
        double val = (exp((-(z)*(z)/1.0
                           -(y)*(y)/1.0)
                          * (i+1)*(i+1)/10.0)
                      * (x+1)*(x+1));
        data[i*ny*nz + j*nz + k] = val;
      }
    }
  }
  return data;
}

Finally the write_cube() routine:

/** 
 * Writes a simple data cube out to disk as an hdf5 file
 * 
 * @param data 
 * @param nx 
 * @param ny 
 * @param nz 
 * @param fname 
 */
void write_cube(double *data, int nx, int ny, int nz, char *fname)
{
  /* now the various steps involved in preparing an hdf5 file */
  hid_t file_id;
  /* at this time our basic file just has the values strung out,
     so the hdf5 rank is 1 */
  hsize_t dims[3] = {nx, ny, nz};
  herr_t status;

  /* create a HDF5 file */
  file_id = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  /* create and write a double type dataset named "/datacube" */
  status = H5LTmake_dataset(file_id, "/datacube", 3, dims,
                            H5T_NATIVE_DOUBLE, data);
  assert(status != -1);
  /* add some hdf5 attributes with the metadata we need */
  H5LTset_attribute_int(file_id, "/datacube", "nx", &nx, 1);
  H5LTset_attribute_int(file_id, "/datacube", "nx", &nx, 1);
  H5LTset_attribute_int(file_id, "/datacube", "ny", &ny, 1);
  H5LTset_attribute_string(file_id, "/datacube", "x_units", "m");
  H5LTset_attribute_string(file_id, "/datacube", "y_units", "kg");
  H5LTset_attribute_string(file_id, "/datacube", "z_units", "degK");

  status = H5Fclose (file_id); assert(status != -1);
  assert(status != -1);
}

compiling and running

You will need the hdf5 library, which you can get on an Ubuntu or Debian system with:

sudo apt-get -u install libhdf5-serial-dev

You can compile this program with:

gcc -o make-sample-cube make-sample-cube.c -lm -lhdf5 -lhdf5_hl

You can run this program with:

./make-sample-cube

This creates a file called sample_cube.h5, and in a future post I will discuss various ways of looking at this data cube: slicing it and visualizing it.

discussion 1

Now I should find those honeyed words which allow you to wrap your head around the use of hdf5, understanding the broad strokes and the nuances at the same time. This is partly possible for this program, but you will soon learn that hdf5 goes into some brutal (and powerful) nitty gritty, at which point simple explanations miss the point entirely.

As usual with any library we #include the hdf5 .h files. Then we build a cube of data, and it’s worth mentioning in passing how we store the cube data in a C array:

C is clumsy for storing multidimensional arrays: you either hope you’re lucky and you have fixed size arrays (this happens when you’re first learning C, but not after). Otherwise you have to represent your multidimensional array using a C one-dimensional array, or do many nested memory allocations so that you can still use the syntax a[i][j][k].

After experimenting with various approaches people usually end up not using multiple memory allocations, but rather using a single one-dimensional array and index arithmetic to access the data in there. The routine fill_cube(), for example, uses this approach: what you might ideally represent as:

data[i][j][k] = val;

is instead represented as:

data[i*ny*nz + j*nz + k] = val;

this is called “row-major ordering” (where the last index is the one that varies most rapidly) and it is the convention used in C and many other languages (with the exceptions of FORTRAN and Matlab, which use column-major ordering). You probably don’t need to learn too much more, but you can read the wikipedia article on row-major ordering.

how the hdf5 writing is done

With that out of the way, let us see how we write the hdf5 file. hdf5 allows very complex and diverse ways of saving data. For example, you can have multiple datasets (these are almost always arrays of numbers) with all sorts of attributes (this is almost always metadata).

In this example we have a single dataset (our cube) and we start with very simple metadata/attributes (just the sizes and the units, which we have playfully set to be meters, kilograms and degrees Kelvin). This allows us to use the “hdf5 lite (H5LT)

The approach you can see in the code is that you take the following steps:

  1. Create an hdf5 file (it starts out empty).
  2. Make a dataset — this turns your array of data into an hdf5 “dataset”.
  3. Set some attributes (i.e. metadata).
  4. Close the file.

Note that there was no write() type of call: whatever you do with the data sets associated with an open file are guaranteed to be written out when you close the file.

discussion 2 with mild rant

If this were all you ever do with hdf5 I would say that the C interface to writing hdf5 files is a very good example of a C API. Unfortunately as soon as we go beyond this simple type of example (for example to read slices of data in efficient ways) the programming gets very detailed and ugly, and in my opinion this is one of the current design problems with hdf5: there is a nice API (the H5LT layer) for very simple cases (which you will very soon outgrow), and a powerful (but not nice) API for the full-blown library, but there is no intermediate level API for what most programmers will need most of the time.

Advertisements

About markgalassi

Mark Galassi is a research scientist in Los Alamos National Laboratory, working on astrophysics and nuclear non-proliferation.
This entry was posted in data, rant, try this and tagged , , , . Bookmark the permalink.

One Response to starting with hdf5 — writing a data cube in C

  1. Pingback: file formats for scientific data | programmingforresearch

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s