cdf - Candis or common data format file
#include <cdfhdr.h>
Candis files are files used to store arrays of floating point numbers from numerical models and observational projects. The idea is to provide a common format for many types of data so that diverse projects can share common utilities, thus minimizing the programming effort needed to undertake new projects.
Candis files start with a header in ASCII form that describes the contents of the file. The rest of the file is made up of a succession of slices that contain data. The first slice is special in that it contains data that don’t change, e. g., calibrations, ambient profiles, etc. The remaining slices all contain data in the same format. The idea is to present the same fields at successive times.
In the UNIX tradition, Candis files are simple byte streams as far as the host computer is concerned. They do not depend on the physical block structure of a device such as magnetic tape, or on the logical record structure of a particular operating system. Thus, files can be moved around at will without losing crucial information.
Three forms of data representation are allowed in the slices: floating, packed integer, and ASCII. The header specifies which one is used. The floating point form is most useful for storing intermediate results. However, different computers often represent floating point numbers differently, so this form generally cannot be used to transfer files between computer systems. The packed integer form scales floating point data and stores it as integers. The precision of the integers is "c" for "char" (typically 1 byte), "s" for "short" (typically 2 bytes), or "l" for "long" (typically 4 bytes). This form typically compresses data by a factor of two compared to the floating point form, and is thus useful for long term storage. It may also be used to transfer data between computers, subject to some caution about byte ordering. The ASCII form can be inspected immediately with text editors and is the best format for transfer between computers. However, this convenience is offset by a potentially large increase in file size over the floating point and packed integer forms, and conversion to and from ASCII form can take significant computer time for large files.
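Because the packed integer form is exchanged as raw binary, a file written on a host with the opposite byte order must have its multi-byte elements reversed before use. The sketch below shows this for "s" precision data. This page does not say whether a Candis file records its byte order, so the sketch assumes the reader already knows the writer's order; host_is_little_endian and swap_shorts are hypothetical helpers, not part of the CDF(3) library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if this host stores integers
 * least significant byte first, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: reverses the two bytes of each of the count
 * 16 bit elements in buf, in place. Apply it to "s" precision data
 * written on a host whose byte order differs from this one. */
static void swap_shorts(unsigned char *buf, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        unsigned char tmp = buf[2 * i];
        buf[2 * i] = buf[2 * i + 1];
        buf[2 * i + 1] = tmp;
    }
}
```

The same idea extends to "l" precision data by reversing groups of sizeof(long) bytes instead of pairs.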
An extension to Candis allows image data to be stored efficiently. A field is declared a pixel field by setting the precision described above to "p". This signals software to treat the field differently from ordinary float fields: in particular, no packing is done when float format is converted to int format, and conversion to ASCII format is done as decimal integers, one integer per byte of data. The storage format packs pixels of 1, 2, 4, 8, 16, or 32 bits into 32 bit floating point words. An image consists of a two dimensional array of pixels. The pixels in the most rapidly iterating dimension must fill a whole number of 32 bit words. The scheme breaks down on machines whose float word length is not 32 bits. In addition, transport across architectures with different integer byte ordering is tricky: for pixels of 8 bits or less the ASCII format is portable, but for pixels of 16 or 32 bits it is not.
We now describe the structure of the header and of individual slices.
The header is made up of ASCII lines, each terminated by a newline. The header is split into five sections, namely, comments, parameters, static fields, variable fields, and format. Header lines are a maximum of 81 characters long, including the newline, and a header may have a maximum of 1000 lines.
The first line of the comment section should be a unique identifier of the data set. The contents of the rest of this section are optional, but should be used to provide any needed descriptive information.
The parameters section can be used to provide scalar parameters needed to interpret the data. Each line defines a single parameter in the form parameter_name parameter_value. The name and value are separated by white space (spaces or tabs). An optional comment may be appended, separated from the parameter value by white space, and beginning with a pound sign.
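A parameter line can be split with the standard C string functions. In this sketch, parse_param is a hypothetical helper, not a CDF(3) routine. It keeps the value as a string, since values need not be numeric, and treats anything after the value as the optional comment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits one parameters section line into a name
 * and a value. Returns 1 on success, 0 if the line lacks a name or a
 * value. Anything after the value is assumed to be the "#" comment. */
static int parse_param(const char *line, char *name, size_t name_len,
                       char *value, size_t value_len)
{
    char buf[82];                          /* header lines are at most 81 chars */
    snprintf(buf, sizeof buf, "%s", line); /* strtok needs a writable copy */

    const char *ws = " \t\n";              /* white space separators */
    char *n = strtok(buf, ws);
    char *v = n ? strtok(NULL, ws) : NULL;
    if (n == NULL || v == NULL || v[0] == '#')
        return 0;                          /* missing name or value */

    snprintf(name, name_len, "%s", n);
    snprintf(value, value_len, "%s", v);
    return 1;
}
```

Applied to the line "dx 10 #units: meters" from the example header, it yields the name dx and the string value 10.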
The static fields section describes fields occurring in the first slice. Each line contains information describing a single field. The format is a sequence of words or numbers separated by white space. An optional comment may be appended, separated from the field definition by white space, and beginning with a pound sign. The meanings of these entries are as follows:
1. fname: The name of the field
2. smul: The multiplicative scaling constant
3. sadd: The additive scaling constant
4. bytes: The precision of elements in packed integer format (c, s, l, or p)
5. dim: The number of dimensions of the field (0, 1, 2, 3, or 4)
6. dname1: Name of first dimension
7. dsize1: Array size of first dimension
8. dname2: ...
9. dsize2: ...
etc.
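The field definition can be read into a structure. The structure cdf_field and the function parse_field below are hypothetical illustrations, not the declarations in <cdfhdr.h>. The validation rules (precision must be one of c, s, l, or p; dimension sizes must be positive) are assumptions drawn from the list above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CDF_MAX_DIMS 4

/* Hypothetical in-memory form of one static or variable field line. */
struct cdf_field {
    char   fname[82];
    double smul, sadd;
    char   bytes;                    /* 'c', 's', 'l', or 'p' */
    int    dim;                      /* 0 to 4 */
    char   dname[CDF_MAX_DIMS][82];
    long   dsize[CDF_MAX_DIMS];
};

/* Parses "fname smul sadd bytes dim [dname dsize]..." and ignores any
 * trailing "#" comment. Returns 1 on success, 0 on malformed input. */
static int parse_field(const char *line, struct cdf_field *f)
{
    char buf[82], *end, *t[5];
    const char *ws = " \t\n";
    snprintf(buf, sizeof buf, "%s", line);

    for (int i = 0; i < 5; i++) {    /* the five mandatory entries */
        t[i] = strtok(i == 0 ? buf : NULL, ws);
        if (t[i] == NULL || t[i][0] == '#')
            return 0;
    }
    snprintf(f->fname, sizeof f->fname, "%s", t[0]);
    f->smul = strtod(t[1], &end);
    if (*end != '\0') return 0;
    f->sadd = strtod(t[2], &end);
    if (*end != '\0') return 0;
    if (strlen(t[3]) != 1 || strchr("cslp", t[3][0]) == NULL) return 0;
    f->bytes = t[3][0];
    f->dim = (int)strtol(t[4], &end, 10);
    if (*end != '\0' || f->dim < 0 || f->dim > CDF_MAX_DIMS) return 0;

    for (int d = 0; d < f->dim; d++) {   /* one name/size pair per dimension */
        char *name = strtok(NULL, ws);
        char *size = strtok(NULL, ws);
        if (name == NULL || size == NULL || name[0] == '#' || size[0] == '#')
            return 0;
        snprintf(f->dname[d], sizeof f->dname[d], "%s", name);
        f->dsize[d] = strtol(size, &end, 10);
        if (*end != '\0' || f->dsize[d] <= 0) return 0;
    }
    return 1;
}
```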
The packed integer format is obtained from the floating format by the equation I = F*smul + sadd, where F is the floating point value and I is the integer value. A field of zero dimensions is a scalar and is said to contain one element. A one dimensional field contains dsize1 elements, a two dimensional field dsize1*dsize2 elements, and so on. Only as many dimension name and size pairs are needed as specified by dim.
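The scaling equation and the element count rule translate directly into C. In this sketch, cdf_pack, cdf_unpack, and cdf_field_elements are hypothetical names. The rounding in cdf_pack (half away from zero) is an assumption, since only the linear equation is given, and no check is made that the result fits in a char, short, or long.

```c
/* Packs F with I = F*smul + sadd, rounding half away from zero
 * (an assumed rounding rule; the target range is not checked). */
static long cdf_pack(double F, double smul, double sadd)
{
    double x = F * smul + sadd;
    return (long)(x < 0 ? x - 0.5 : x + 0.5);
}

/* Inverse of the packing equation: F = (I - sadd) / smul. */
static double cdf_unpack(long I, double smul, double sadd)
{
    return ((double)I - sadd) / smul;
}

/* Elements in a field: 1 for a scalar (dim == 0), otherwise the
 * product of the first dim dimension sizes. */
static long cdf_field_elements(int dim, const long dsize[])
{
    long n = 1;
    for (int d = 0; d < dim; d++)
        n *= dsize[d];
    return n;
}
```

For the field u in the example header (smul = 1000, sadd = 1000), a value of 0.25 packs to 1250, and its 21 by 11 array holds 231 elements.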
For pixel fields (i. e., bytes = p), smul is the size in bits of each pixel and sadd is the number of pixels per field element. The product of smul and sadd must be exactly 32. The dimension size for the most rapidly varying dimension in a pixel field gives the number of field elements in that dimension (i. e., the number of 32 bit words) rather than the number of pixels. The number of pixels along this dimension is the dimension size times sadd.
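The pixel field constraints can be checked mechanically. pixel_layout_ok and pixels_along_fast_dim are hypothetical helpers. They restrict smul to the pixel sizes listed earlier (1, 2, 4, 8, 16, or 32 bits) and deliberately do not extract individual pixels, since the bit order of pixels within a word is not specified here.

```c
/* Hypothetical check for a pixel field (bytes == 'p'): smul is the
 * pixel size in bits, sadd the number of pixels per 32 bit element. */
static int pixel_layout_ok(int smul, int sadd)
{
    switch (smul) {
    case 1: case 2: case 4: case 8: case 16: case 32:
        return smul * sadd == 32;    /* pixels must exactly fill a word */
    default:
        return 0;                    /* unsupported pixel size */
    }
}

/* Pixels along the most rapidly varying dimension, whose header size
 * counts 32 bit elements rather than pixels. */
static long pixels_along_fast_dim(long dsize_fast, int sadd)
{
    return dsize_fast * sadd;
}
```

For example, 8 bit pixels have smul = 8 and sadd = 4, so a fast dimension of size 160 holds 640 pixels.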
The variable fields section describes fields occurring repeatedly in slices subsequent to the first slice. The format is the same as for static fields.
The format section consists of a single line containing one of the three words float, int, or ascii, selecting the floating, packed integer, or ASCII representation respectively.
Following the format section is a line containing a single asterisk in the first position. This is an end-of-header indicator.
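Reading a header amounts to consuming lines up to the end-of-header line while respecting the limits stated earlier (81 characters including the newline, 1000 lines). The sketch below is a hypothetical illustration, not the CDF(3) implementation: section_of recognizes the exact section headings used in the example header, and skip_header leaves the stream positioned at the first slice.

```c
#include <stdio.h>
#include <string.h>

#define CDF_MAX_LINE  81    /* maximum header line length, newline included */
#define CDF_MAX_LINES 1000  /* maximum number of header lines */

enum cdf_section { SEC_COMMENTS, SEC_PARAMETERS, SEC_STATIC,
                   SEC_VARIABLE, SEC_FORMAT, SEC_END };

/* Maps a header line (newline removed) to the section it opens, or
 * returns -1 for an ordinary content line. Headings must match exactly. */
static int section_of(const char *line)
{
    static const struct { const char *text; int sec; } marks[] = {
        { "***comments***",        SEC_COMMENTS   },
        { "***parameters***",      SEC_PARAMETERS },
        { "***static_fields***",   SEC_STATIC     },
        { "***variable_fields***", SEC_VARIABLE   },
        { "***format***",          SEC_FORMAT     },
        { "*",                     SEC_END        },
    };
    for (size_t i = 0; i < sizeof marks / sizeof marks[0]; i++)
        if (strcmp(line, marks[i].text) == 0)
            return marks[i].sec;
    return -1;
}

/* Reads header lines from fp through the end-of-header line "*".
 * Returns the number of header lines consumed, or -1 if a line is too
 * long, the line limit is exceeded, or input ends first. */
static int skip_header(FILE *fp)
{
    char line[CDF_MAX_LINE + 2];
    for (int n = 1; n <= CDF_MAX_LINES; n++) {
        if (fgets(line, sizeof line, fp) == NULL)
            return -1;
        size_t len = strlen(line);
        if (len == 0 || len > CDF_MAX_LINE || line[len - 1] != '\n')
            return -1;              /* too long, or no terminating newline */
        line[len - 1] = '\0';
        if (section_of(line) == SEC_END)
            return n;
    }
    return -1;
}
```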
Each slice begins with an element count, stored as an 8 byte (obsolete) or 16 byte decimal ASCII number, followed by the fields described in the appropriate header section. The element count is the number of field elements in the slice, i. e., the sum of the elements in each field. If the slice contains no fields (as may happen with the static field slice), the element count must still be present and is set to zero. In the 16 byte form the first byte is the ASCII character '@'; current software uses this to distinguish old 8 byte element counts from new 16 byte ones. Current software reads either form but creates only the 16 byte form. The element count was enlarged because current computers can handle much larger data sets.
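The two element count forms can be told apart by their first byte. The sketch below assumes that the bytes after '@' (or all 8 bytes in the old form) hold a decimal number, possibly padded with blanks; the exact padding is not specified here. read_element_count is a hypothetical name, and long long is used because 15 decimal digits can exceed a 32 bit long.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader for the element count at the start of a slice.
 * Returns the count, or -1 on end of file or a malformed count. */
static long long read_element_count(FILE *fp)
{
    char buf[17];
    int c = fgetc(fp);
    if (c == EOF)
        return -1;

    size_t start = 0;
    if (c != '@')
        buf[start++] = (char)c;          /* 8 byte form: first byte is part of the number */
    size_t rest = (c == '@') ? 15 : 7;   /* bytes remaining in this form */
    if (fread(buf + start, 1, rest, fp) != rest)
        return -1;
    buf[start + rest] = '\0';

    char *end;
    long long n = strtoll(buf, &end, 10); /* skips leading blanks */
    if (end == buf)
        return -1;                       /* no digits at all */
    while (isspace((unsigned char)*end))
        end++;                           /* allow trailing blanks */
    if (*end != '\0' || n < 0)
        return -1;
    return n;
}
```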
For the float format, the data representation depends on the computer being used. The integer format represents elements as binary integers; each element therefore occupies the number of bytes the C compiler uses for the char, short, or long type (depending on the precision specified), usually one, two, or four bytes. For the ASCII format, each element is represented by a string of ASCII digits with an optional sign, an optional decimal point, and an optional exponent field. The following examples are all legal representations: 13, 1.3, -0.345e12. In the ASCII representation, elements are separated by white space (spaces, tabs, newlines, carriage returns). No space is left between elements in the float and integer representations.
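Two details of this paragraph map directly onto standard C: the width of a packed integer element is whatever sizeof gives for the declared type, and ASCII elements can be converted with strtod, which skips leading white space. Both helpers below are hypothetical. Note that strtod is more permissive than the format requires (it also accepts forms such as inf and hexadecimal floats), and that long is 8 bytes on many 64 bit Unix systems, which affects "l" precision data.

```c
#include <stdlib.h>

/* Hypothetical helper: bytes per packed integer element for the
 * precision letter, or 0 for an unknown letter. */
static size_t packed_width(char bytes)
{
    switch (bytes) {
    case 'c': return sizeof(char);
    case 's': return sizeof(short);
    case 'l': return sizeof(long);   /* 4 or 8 depending on the host */
    default:  return 0;
    }
}

/* Hypothetical helper: converts up to max white space separated ASCII
 * elements from text into out and returns how many were read. */
static size_t parse_ascii_elements(const char *text, double *out, size_t max)
{
    size_t n = 0;
    const char *p = text;
    while (n < max) {
        char *end;
        double v = strtod(p, &end);  /* skips spaces, tabs, newlines, CRs */
        if (end == p)
            break;                   /* no further number */
        out[n++] = v;
        p = end;
    }
    return n;
}
```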
For fields with dimension greater than zero, representation is in the form of an array of appropriate dimensions. Storage order is the same as that used by the C language, i. e., the last dimension is iterated most quickly. Thus, v[3][2] would be presented in the order v[0][0], v[0][1], v[1][0], v[1][1],... in C language terminology. This field would have (in the header) fname = v, dim = 2, dsize1 = 3, and dsize2 = 2.
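The position of an element within a field follows from this row-major rule: each index is scaled by the sizes of all faster-varying dimensions. cdf_offset is a hypothetical helper that computes this position, counted in elements from the start of the field.

```c
#include <stddef.h>

/* Hypothetical helper: offset of element idx[0..dim-1] in a field
 * stored in C (row-major) order, last dimension varying fastest. */
static size_t cdf_offset(int dim, const long dsize[], const long idx[])
{
    size_t off = 0;
    for (int d = 0; d < dim; d++)
        off = off * (size_t)dsize[d] + (size_t)idx[d];
    return off;
}
```

For v[3][2], the element v[1][0] is at offset 2 and v[2][1] at offset 5, matching the order listed above.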
An example of a header is shown below. The exact format of the section headings must be used, i. e., the asterisks are important!
***comments***
header example 1
This is a test
of the header system.
***parameters***
dx 10 #units: meters
dz .1
dt 2 #units: seconds
date 3Aug86
***static_fields***
qs 1000 0 s 1 z 11 #saturation mixing ratio
***variable_fields***
time 1 0 l 0
u 1000 1000 s 2 x 21 z 11 #horizontal wind
w 1000 1000 s 2 x 21 z 11 #vertical wind
rs 1000 1000 s 1 x 21
***format***
float
*
Note that parameter values need not be numeric. The user’s program must decide how to handle parameter values.
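As a worked check of the element count rule, the example header gives the following counts per slice (dimension sizes multiply, and the scalar time counts as one element):

```latex
\begin{aligned}
N_{\text{static}}   &= N_{qs} = 11,\\
N_{\text{variable}} &= N_{time} + N_{u} + N_{w} + N_{rs}
                     = 1 + 21 \cdot 11 + 21 \cdot 11 + 21 = 484.
\end{aligned}
```

Assuming a host with 4 byte floats, each variable slice in float format therefore occupies 16 + 4 * 484 = 1952 bytes including its 16 byte element count, and the static slice 16 + 4 * 11 = 60 bytes.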
CDF(3): Low level subroutines for handling Candis files;
CDF*(1): Various filters for analyzing Candis files.