Pycandis - read and write Candis files from Python
from pycandis import *
from numpy import *
The Pycandis package allows Candis files to be read and written from Python programs. In Pycandis, a Candis object built from Candis and Field classes defines the representation of a Candis file inside Python. Candis and ordinary Python methods are used to create and access Candis files.
Conceptually, a Candis file consists of 4 parts: (1) a "comment" section, which contains arbitrary text typically describing what has been done to the file; (2) a "parameters" section containing scalar variable-value pairs; (3) a "static fields" section containing data fields of which only a single instance is needed; and (4) a "variable fields" section containing data fields for which a sequence of one or more instances is defined in the Candis format. These instances might represent, for example, the values of the variable fields at successive times or at successive levels in numerical model output; the meaning of multiple instances is left up to the user. The Python representation of a Candis file holds a single instance of the variable fields at any given time, and methods (described below) are available to scan sequentially through successive instances. This division into successive instances allows very large Candis files to be defined without having to read the entire file into (virtual) memory at one time.
A "field" contains a zero- to four-dimensional array of 32-bit floating point numbers as well as information describing the size and shape of the array. Also included are an ASCII name for the field and the names of the dimensions associated with the array axes. Each of the dimensions is associated with a one-dimensional static field called an "index field", which gives the values of the variable associated with that axis. There can be no more than 4 dimensions in a Candis file.
Parameters are generally used to document values used in the creation or modification of the file. There are two additional uses: (1) The "bad" and "badlim" parameters are positive floating point numbers. "badlim" specifies the maximum absolute value a field element can take on and still be considered "good" data. "bad" is the preferred value used to indicate bad or missing data elements, though any value whose magnitude exceeds "badlim" will do. If these parameters are missing, the default values are 1e30 for bad and 0.999e30 for badlim. The values of the bad and badlim parameters apply globally to all fields. (2) Redundant information about the starting and step values for index fields is present for historical reasons and is represented as parameters. The normal end user doesn't have to worry about these parameters, as Pycandis takes care of them automatically.
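The bad/badlim rule can be sketched with NumPy alone. The helper names bad_limits and good_mask are hypothetical, not part of pycandis; only the default values come from the text above, and the check that bad exceeds badlim is an assumption based on the requirement that the bad value itself count as bad data.

```python
import numpy as np

# Default values used when the parameters are absent from a Candis file.
DEFAULT_BAD = 1e30
DEFAULT_BADLIM = 0.999e30


def bad_limits(params):
    """Return (bad, badlim) from a parameter dictionary, falling back to the defaults."""
    bad = float(params.get("bad", DEFAULT_BAD))
    badlim = float(params.get("badlim", DEFAULT_BADLIM))
    # Assumed sanity check: both positive, and the "bad" flag must itself count as bad data.
    if not (0 < badlim < bad):
        raise ValueError("expected 0 < badlim < bad")
    return bad, badlim


def good_mask(values, badlim):
    """True where an element is good data, i.e. |value| <= badlim (32-bit floats, as in Candis)."""
    return np.abs(np.asarray(values, dtype=np.float32)) <= badlim


bad, badlim = bad_limits({})   # no explicit parameters, so the defaults apply
print(good_mask([1.0, -5.0, bad, -2e30], badlim).tolist())   # [True, True, False, False]
```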
Within Python, a Candis object represents the comments as a list of strings, the parameters as a dictionary of name-value pairs, and the static and variable fields as dictionaries of name-field pairs.
Each Field object contains field data in the form of a Numpy array in C format, i.e., the last dimension iterates most rapidly as one steps through memory. It also contains information about the field, most notably the size and shape of the array and the dimension names associated with each array axis. (There are a few other minor pieces of information present for historical reasons which are of no concern to the end user.)
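The C layout can be observed directly in NumPy. This sketch uses only standard NumPy calls; whether pycandis accepts or converts Fortran-ordered input is not stated here, so numpy.ascontiguousarray is shown only as a possible precaution.

```python
import numpy as np

# A 2 x 3 array: axis 0 has length 2, axis 1 (the last axis) has length 3.
a = np.arange(6, dtype=np.float32).reshape(2, 3)

# NumPy creates C-ordered (row-major) arrays by default.
print(a.flags["C_CONTIGUOUS"])   # True

# Reading elements in memory order ("K") shows the last index varying fastest:
# a[0,0], a[0,1], a[0,2], a[1,0], a[1,1], a[1,2]
c_memory = a.ravel(order="K")

# A Fortran-ordered copy holds the same values with the FIRST index varying fastest.
f = np.asfortranarray(a)
f_memory = f.ravel(order="K")

# ascontiguousarray returns a C-ordered array, copying only if necessary.
c_again = np.ascontiguousarray(f)

print(c_memory.tolist())   # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(f_memory.tolist())   # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```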
The starting point for all Candis file manipulations in Python is the creation of a Candis object. To create a new empty Candis object c within Python, execute the command c = Candis(). Once the object is created, various operations can be done on it, such as reading in a Candis file, writing the file, or constructing and modifying the Candis object by adding comments, parameters, or fields.
What follows assumes that c is a Candis object as created above.
Comments can be added to the empty Candis object using the append method:
c.comments.append('comment_string')
Each append adds a comment line to the comment list. The comment string should not have a trailing newline.
Parameters may be added using standard Python dictionary methods:
c.params['start_year'] = 2005
The index parameters used in standard Candis files are added automatically. If default values of the bad and badlim parameters are sufficient, then these do not need to be added either. Otherwise, they can be added explicitly. For example:
c.params['bad'] = 10000
and
c.params['badlim'] = 9999
Static and variable fields can be added using the Field class, starting with the index fields:
c.sfields['x'] = Field(['x'], xfield)
c.sfields['y'] = Field(['y'], yfield)
where xfield and yfield are one-dimensional Numpy arrays containing the axis grid values for the two dimensions 'x' and 'y'. Two-dimensional fields, for example, the temperature defined over the x-y grid, can be defined as well:
c.vfields['temperature'] = Field(['x', 'y'], tempfield)
where tempfield is a Numpy array containing the temperature data. The first dimension of the Numpy array corresponds to the 'x' dimension, the second to the 'y' dimension. The size and shape of the temperature data array must be consistent with the corresponding index fields.
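One way to build consistent index and data arrays is sketched below with made-up grid values. Passing indexing="ij" to numpy.meshgrid makes axis 0 follow x; the default "xy" indexing would produce arrays of shape (ny, nx), the reverse of what Field(['x', 'y'], ...) expects. The helper shape_matches is hypothetical, not part of pycandis.

```python
import numpy as np

# Made-up grid: 4 points in x, 3 points in y.
xfield = np.linspace(0.0, 3000.0, 4).astype(np.float32)
yfield = np.linspace(0.0, 2000.0, 3).astype(np.float32)

# indexing="ij" puts x on axis 0 and y on axis 1, matching the dimension list ['x', 'y'].
X, Y = np.meshgrid(xfield, yfield, indexing="ij")
tempfield = (280.0 + 0.001 * X - 0.002 * Y).astype(np.float32)   # made-up temperatures


def shape_matches(index_fields, data):
    """Hypothetical check: data.shape must equal the index-field lengths, in axis order."""
    return data.shape == tuple(len(ix) for ix in index_fields)


print(tempfield.shape)                              # (4, 3)
print(shape_matches([xfield, yfield], tempfield))   # True
print(shape_matches([yfield, xfield], tempfield))   # False: axes in the wrong order
```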
A scalar field, for example, the current year, can be defined as follows:
c.vfields['current_year'] = Field([], year)
where year is a scalar numerical variable or constant.
Once all elements of the Candis object are defined, the new Candis file may be created:
c.write('newfile.cdf')
where the string argument of the write command is the desired file name. If the file is to be written to the standard output, then the file name argument should be omitted:
c.write()
If multiple variable slices are desired, the data in the variable fields can be updated for each slice and the write statement executed again:
c.vfields['temperature'].data = newtempfield
c.vfields['current_year'].data.flat = newyear
c.write('newfile.cdf')
In general, updating a field by assignment as illustrated above requires a Numpy array on the right side of the assignment statement. However, for a field with zero dimensions, the scalar value of that field may be assigned as in the 'current_year' case above, provided the assignment is made through the array's 'flat' attribute as shown.
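The reason for the 'flat' form can be seen with plain NumPy: assigning through .flat writes into the existing zero-dimensional array instead of replacing it with a Python number. This sketch does not use pycandis; the variable alias plays the role of the reference held by the Field. Indexing with an empty tuple, year[()] = ..., is an equivalent NumPy idiom, though only the .flat form is documented above for pycandis.

```python
import numpy as np

# A zero-dimensional array, like the data of a scalar field such as 'current_year'.
year = np.zeros((), dtype=np.float32)
alias = year   # a second reference to the same array object

# Assigning through .flat writes into the existing array, so every
# reference to it sees the new value and it stays a 0-d float32 array.
year.flat = 2006
print(alias.shape, float(alias))   # () 2006.0

# Indexing with an empty tuple is an equivalent NumPy idiom for 0-d arrays.
year[()] = 2007
print(float(alias))                # 2007.0
```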
When all desired variable slices have been written, close the new file:
c.close()
If one intends to create a Candis file with only a single variable slice, creating the fields and assigning values to them in one step as was done above for the first variable slice is the most efficient way to do it. However, if multiple variable slices are intended, it may be more convenient from a programming point of view to create initially empty or zeroed variable fields and then fill them and write all variable slices, including the first, in a single loop. Creating zeroed fields can be done as follows:
c.vfields['a'] = Field(['x', 'y'], zeros((nx, ny)))
c.vfields['slice'] = Field([], 0)
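The zero-first, single-loop pattern might look like the following NumPy-only sketch. The function write_slices and its write_slice callback are hypothetical stand-ins: in a real program the callback would store the arrays in the variable fields and call c.write('newfile.cdf'), and c.close() would follow the loop. The sketch updates the arrays in place; whether a Field keeps a reference to the array it was given or a copy is not stated here, so assigning a new array to .data, as shown earlier, remains the documented route.

```python
import numpy as np


def write_slices(nx, ny, nslices, write_slice):
    """Fill preallocated arrays once per slice and hand each slice to write_slice.

    write_slice is a hypothetical callback standing in for the pycandis step
    "store the arrays in c.vfields[...].data, then c.write('newfile.cdf')".
    """
    a = np.zeros((nx, ny), dtype=np.float32)    # like Field(['x', 'y'], zeros((nx, ny)))
    slice_no = np.zeros((), dtype=np.float32)   # like Field([], 0)
    for k in range(nslices):
        a[...] = 10.0 * k                       # made-up data for slice k
        slice_no.flat = k
        # Every slice, including the first, is written from inside the loop.
        write_slice(a, slice_no)


written = []
# The arrays are reused between slices, so the callback copies what it keeps.
write_slices(2, 3, 4, lambda a, s: written.append((a.copy(), float(s))))
print([s for _, s in written])   # [0.0, 1.0, 2.0, 3.0]
```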
To read an existing Candis file, first create an empty Candis object:
c = Candis()
Then read in the Candis file:
gotit = c.read('filename')
where 'filename' is the name of the Candis file. Omit the file name for input from the standard input:
gotit = c.read()
The read command returns True if at least one variable slice exists in the Candis file, and False otherwise. (If no variable slice exists, something is wrong with the file!)
The combination of creating the object and reading the file may be replaced, if desired, by the convenience function
c = ReadCandis('filename')
All elements of the file may now be accessed by standard Python methods. Recall that c.comments is a list of comments, c.params is a dictionary of parameters, and c.sfields and c.vfields are dictionaries of Field objects. Thus, for instance,
c.params.keys()
returns the parameter names,
baseyear = c.params['base_year']
retrieves the value of the 'base_year' parameter,
xvals = c.sfields['x'].data
retrieves a Numpy array containing the data for the 'x' dimension, and
tempdims = c.vfields['temperature'].dims
retrieves a list of the dimension names associated with the 'temperature' field.
Successive variable slices may be accessed by repeating the above read command until it returns a value of False, at which point the end of the Candis file has been reached.
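The read loop can be wrapped as follows. read_all_slices is a hypothetical helper; it assumes, as the text states, that each repeated call such as c.read('filename') advances to the next variable slice and returns False at the end of the file, and it treats an immediate False as a damaged file. With pycandis one might call it as read_all_slices(lambda: c.read('filename'), handler) on a freshly created Candis object; here a fake reader stands in so the sketch runs on its own.

```python
def read_all_slices(read_slice, handle_slice):
    """Call read_slice() until it returns False, passing each slice number to handle_slice.

    read_slice stands in for the pycandis read call; handle_slice is a
    hypothetical per-slice callback. Returns the number of slices read.
    """
    count = 0
    while read_slice():
        handle_slice(count)
        count += 1
    if count == 0:
        # The very first read returned False: no variable slice exists.
        raise ValueError("no variable slices found; the Candis file is probably damaged")
    return count


# Demonstration with a fake reader that reports three slices and then end of file.
fake_results = iter([True, True, True, False])
seen = []
n = read_all_slices(lambda: next(fake_results), seen.append)
print(n, seen)   # 3 [0, 1, 2]
```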
A check as to whether certain parameters, static fields, and variable fields exist in a Candis object may be performed by the check method:
c.check(['base_year'], ['x', 'y'], ['temperature'])
fails if any of these components does not exist in the file. The three lists represent, respectively, the desired parameters, static fields, and variable fields. Likewise,
c.validate()
checks whether the Candis object is self-consistent, failing if it is not.
The convenience method getData returns the data associated with a given field, in the form of a Numpy array, whether it is a static or a variable field. For example:
tempdata = c.getData('temperature')
If fields of the same name occur in both the static and the variable fields (not recommended!), the static field data are returned.
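The lookup order of getData can be mimicked with plain dictionaries. get_data below is a hypothetical illustration of the documented precedence, not the pycandis implementation; how pycandis reports a name found in neither place is not stated here.

```python
import numpy as np


def get_data(sfields, vfields, name):
    """Hypothetical stand-in for getData: the static fields are searched first.

    Here both dictionaries map names straight to NumPy arrays; in pycandis
    they map names to Field objects, whose arrays are in .data.
    """
    if name in sfields:
        return sfields[name]
    return vfields[name]   # raises KeyError here if the name is in neither dictionary


sfields = {"x": np.array([0.0, 1.0, 2.0], dtype=np.float32)}
vfields = {"x": np.zeros(3, dtype=np.float32),             # duplicate name: not recommended
           "temperature": np.ones((3, 2), dtype=np.float32)}

print(get_data(sfields, vfields, "x") is sfields["x"])     # True: static data win
print(get_data(sfields, vfields, "temperature").shape)     # (3, 2)
```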
Candis represents bad or missing data by data values greater in magnitude than the value of the parameter badlim. Typically these are set to the value of the parameter bad.
The Numpy package in Python handles this situation differently, by creating what is called a "masked array". A Numpy array with Candis bad data values may be converted into a masked array with the following commands:
badlim = c.params.get('badlim', 0.999e30)  # fall back to the default badlim
array_mask = abs(candis_array) > badlim    # compare magnitudes, as Candis does
masked_array = ma.array(candis_array, mask = array_mask)
A convenience function with the same behavior as getData exists to convert a Candis field in the static or variable slice to a masked array:
masked_tempdata = c.getMData('temperature')
Array elements with absolute values exceeding the value of the badlim parameter are masked.
An unmasked array with masked elements assigned a specified fill value may be generated from the masked array as follows:
filled_array = ma.filled(masked_array, 1.e30)
The fill value specified here is 1.e30, the default value of bad. This is more useful than simply recovering the underlying data of the masked array with masked_array.data, because the values stored at masked positions of arrays produced by arithmetic on masked arrays are not easily predictable. Using the ma.filled function guarantees that masked array elements will have a known, specified value in the output array. This is important when writing masked array data to a Candis file, where that value must exceed badlim in magnitude.
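The difference between .data and ma.filled shows up as soon as masked arrays are combined. In this NumPy-only sketch with made-up values, the sum is masked wherever either input is masked, and ma.filled writes the default bad value into those positions so that the result can safely go back into a Candis field.

```python
import numpy as np
from numpy import ma

BAD = 1e30         # default Candis "bad" value
BADLIM = 0.999e30  # default Candis "badlim" value

u = np.array([1.0, BAD, 3.0], dtype=np.float32)
v = np.array([10.0, 20.0, BAD], dtype=np.float32)

mu = ma.array(u, mask=np.abs(u) > BADLIM)
mv = ma.array(v, mask=np.abs(v) > BADLIM)

total = mu + mv                   # masked wherever either input is masked
print(total.mask.tolist())        # [False, True, True]

# total.data holds whatever NumPy left at masked positions; it need not be BAD.
# ma.filled puts a known value there instead.
out = ma.filled(total, BAD)
print(float(out[0]), bool((np.abs(out[1:]) > BADLIM).all()))   # 11.0 True
```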
The mask (a Boolean array of the same size and shape as the original array, with True values for masked elements) may be similarly recovered:
array_mask = masked_array.mask
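Depending on how a masked array was created, its .mask attribute may be the scalar ma.nomask rather than a full Boolean array. When an element-by-element mask is needed regardless, numpy.ma.getmaskarray always returns one, as this NumPy-only sketch shows.

```python
import numpy as np
from numpy import ma

# Built without a mask: .mask is a scalar, not a per-element array.
m = ma.array(np.array([1.0, 2.0, 3.0], dtype=np.float32))
print(np.ndim(m.mask))            # 0

# getmaskarray always returns a Boolean array with the same shape as the data.
full = ma.getmaskarray(m)
print(full.shape, full.any())     # (3,) False

# For an array built with an explicit mask, the two agree element by element.
bad_mask = np.array([False, True, False])
mb = ma.array(np.array([1.0, 1e30, 3.0], dtype=np.float32), mask=bad_mask)
print((ma.getmaskarray(mb) == bad_mask).all())   # True
```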
cdf.5 presents the Candis file format.
cdf.3 documents the standard C language programming interface.
quickcandis.3 documents a simplified C interface for files with one variable slice.
Max Brister wrote the pycandis package.
David Raymond wrote this man page.