PYCANDIS

NAME
SYNOPSIS AND DESCRIPTION
PYTHON REPRESENTATION OF A CANDIS FILE
CANDIS OBJECT
CREATING A NEW CANDIS FILE
READING A CANDIS FILE
MASKED DATA
SEE ALSO
AUTHORS

NAME

Pycandis - read and write Candis files from Python

SYNOPSIS AND DESCRIPTION

from pycandis import *
from numpy import *

The Pycandis package allows Candis files to be read and written from Python programs. In Pycandis, a Candis object built from Candis and Field classes defines the representation of a Candis file inside Python. Candis and ordinary Python methods are used to create and access Candis files.

Conceptually, a Candis file consists of four parts: (1) a "comment" section, which contains arbitrary text, typically describing what has been done to the file; (2) a "parameters" section containing scalar variable-value pairs; (3) a "static fields" section containing data fields of which only a single instance is needed; and (4) a "variable fields" section containing data fields for which a sequence of one or more instances is defined in the Candis format. These instances may represent, for example, the values of the variable fields at successive times or at successive levels in numerical model output; the meaning of multiple instances is left up to the user. The Python representation of a Candis file contains a single instance of the variable fields at any given time, and methods (described below) are available to scan sequentially through successive instances. This division into successive instances allows very large Candis files to be handled without having to read the entire file into (virtual) memory at one time.

A "field" contains a zero- to four-dimensional array of 32-bit floating point numbers as well as information describing the size and shape of the array. Also included are an ASCII name for the field and the names of the dimensions associated with the array axes. Each dimension is associated with a one-dimensional static field called an "index field", which gives the values of the variable associated with that axis. There can be no more than four dimensions in a Candis file.

Parameters are generally used to document values used in the creation or modification of the file. There are two additional uses: (1) The "bad" and "badlim" parameters are positive floating point numbers. "badlim" specifies the maximum absolute value a field element can take on and still be considered "good" data. "bad" is the preferred value used to indicate bad or missing data elements, though any value greater than "badlim" will do. If missing, default values are 1e30 for bad and 0.999e30 for badlim. The values of the bad and badlim parameters apply globally to all fields. (2) Redundant information about the starting and step values for index fields is present for historical reasons and is represented as parameters. The normal end user doesn’t have to worry about these parameters, as Pycandis takes care of them automatically.
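The bad/badlim convention can be illustrated in plain Python. The sketch below is not part of Pycandis: the helper name is_good and the constant names are made up for illustration, and only the two default values are taken from the text above.

```python
# Illustrative sketch only; is_good is a hypothetical helper, not a Pycandis function.
DEFAULT_BAD = 1e30          # default "bad" value quoted in the text
DEFAULT_BADLIM = 0.999e30   # default "badlim" value quoted in the text


def is_good(value, badlim=DEFAULT_BADLIM):
    """Return True if value counts as good data under the badlim rule."""
    # Good data may not exceed badlim in absolute value.
    return abs(value) <= badlim


values = [273.15, -12.0, DEFAULT_BAD, -2e30]
good_flags = [is_good(v) for v in values]
```

Because the test uses the absolute value, large negative numbers are treated as bad data in the same way as large positive ones.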

PYTHON REPRESENTATION OF A CANDIS FILE

Within Python, a Candis object represents the comments as a list of strings, the parameters as a dictionary of name-value pairs, and the static and variable fields as dictionaries of name-field pairs.
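As a rough mental model of this layout, the following sketch builds a stand-in structure from standard-library pieces only. The class names CandisSketch and FieldSketch are hypothetical and say nothing about how Pycandis is actually implemented; they only mirror the list and dictionary layout described above.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSketch:
    """Hypothetical stand-in for a field: dimension names plus data."""
    dims: list
    data: object


@dataclass
class CandisSketch:
    """Hypothetical stand-in mirroring the four sections of a Candis file."""
    comments: list = field(default_factory=list)   # list of strings
    params: dict = field(default_factory=dict)     # name -> scalar value
    sfields: dict = field(default_factory=dict)    # name -> static field
    vfields: dict = field(default_factory=dict)    # name -> variable field (current slice)


sketch = CandisSketch()
sketch.comments.append("created by example script")
sketch.params["start_year"] = 2005
sketch.sfields["x"] = FieldSketch(["x"], [0.0, 1.0, 2.0])
sketch.vfields["temperature"] = FieldSketch(["x"], [280.0, 281.5, 283.0])
```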

Each Field object contains field data in the form of a Numpy array in C format, i.e., the last dimension iterates most rapidly as one steps through memory. It also contains information about the field, most notably the size and shape of the array and the dimension names associated with each array axis. (There are a few other minor pieces of information present for historical reasons which are of no concern to the end user.)
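The meaning of "C format" can be checked with Numpy alone. The sketch below uses only standard Numpy calls and does not touch Pycandis; whether Pycandis converts Fortran-ordered input on its own is not stated here, so the explicit conversion at the end is shown as a precaution rather than a requirement.

```python
import numpy as np

# A 2 x 3 array of 32-bit floats, the element type used by Candis fields.
a = np.arange(6, dtype=np.float32).reshape(2, 3)

is_c_order = a.flags["C_CONTIGUOUS"]           # arrays built this way are in C order
c_memory = a.ravel(order="K").tolist()         # elements in the order they sit in memory
strides = a.strides                            # bytes to step along each axis

# The same values stored in Fortran order: the first dimension varies fastest.
f = np.asfortranarray(a)
f_memory = f.ravel(order="K").tolist()

# Explicit conversion back to C order.
c_again = np.ascontiguousarray(f)
```

In C order the last axis has the smallest stride, which is what "the last dimension iterates most rapidly" means in practice.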

CANDIS OBJECT

The starting point for all Candis file manipulations in Python is the creation of a Candis object. To create a new empty Candis object c within Python, execute the command c = Candis(). Once the object is created, various operations can be done on it, such as reading in a Candis file, writing the file, or constructing and modifying the Candis object by adding comments, parameters, or fields.

What follows assumes that c is a Candis object as created above.

CREATING A NEW CANDIS FILE

Comments can be added to the empty Candis object using the append method:
c.comments.append('comment_string')

Each append adds a comment line to the comment list. The comment string should not have a trailing newline.

Parameters may be added using standard Python dictionary methods:
c.params['start_year'] = 2005

The index parameters used in standard Candis files are added automatically. If default values of the bad and badlim parameters are sufficient then these do not need to be added either. Otherwise, they can be added explicitly. For example:
c.params['bad'] = 10000

and
c.params['badlim'] = 9999

Static and variable fields can be added using the Field class. Starting with the index fields:

c.sfields['x'] = Field(['x'], xfield)
c.sfields['y'] = Field(['y'], yfield)

where xfield and yfield are one-dimensional Numpy arrays containing the axis grid values for the two dimensions 'x' and 'y'. Two-dimensional fields, for example, the temperature defined over the x-y grid, can be defined as well:
c.vfields['temperature'] = Field(['x', 'y'], tempfield)

where tempfield is a Numpy array containing the temperature data. The first dimension of the Numpy array corresponds to the 'x' dimension, the second to the 'y' dimension. The size and shape of the temperature data array must be consistent with the corresponding index fields.
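The consistency requirement can be tested before a field is created. The helper below, shape_matches, is a hypothetical function written for this illustration and is not part of Pycandis; it assumes that "consistent" means one array axis per dimension name, each as long as the matching index array.

```python
import numpy as np


def shape_matches(dims, index_arrays, data):
    """Hypothetical check: one axis per name in dims, each as long as its index array."""
    expected = tuple(len(index_arrays[name]) for name in dims)
    return np.shape(data) == expected


xfield = np.linspace(0.0, 10.0, 5, dtype=np.float32)   # 5 grid values along 'x'
yfield = np.linspace(0.0, 4.0, 3, dtype=np.float32)    # 3 grid values along 'y'
index_arrays = {"x": xfield, "y": yfield}

tempfield = np.zeros((5, 3), dtype=np.float32)          # axis 0 is 'x', axis 1 is 'y'
ok = shape_matches(["x", "y"], index_arrays, tempfield)
transposed_ok = shape_matches(["x", "y"], index_arrays, tempfield.T)
scalar_ok = shape_matches([], index_arrays, 2005)       # zero dimensions, empty shape
```

A transposed array fails the check, which catches the common mistake of supplying data in (y, x) order for a field declared with dimensions ['x', 'y'].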

A scalar field, for example, the current year, can be defined as follows:
c.vfields['current_year'] = Field([], year)

where year is a scalar numerical variable or constant.

Once all elements of the Candis object are defined, the new Candis file may be created:
c.write('newfile.cdf')

where the string argument of the write command is the desired file name. If the file is to be written to the standard output, then the file name argument should be omitted:
c.write()

If multiple variable slices are desired, the data in the variable slice fields can be updated for each slice and the write statement executed again:

c.vfields['temperature'].data = newtempfield
c.vfields['current_year'].data.flat = newyear
c.write('newfile.cdf')

In general, updating a field by assignment as illustrated above requires a Numpy array on the right side of the assignment statement. However, for a field with zero dimensions, the scalar value of that field may be assigned directly, as in the 'current_year' case above, provided the assignment goes through the array's flat attribute as shown.
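The behavior of flat can be seen with Numpy alone. The sketch below assumes that the data of a zero-dimensional field is held as a zero-dimensional Numpy array; that is an assumption about Pycandis, while the Numpy behavior itself is standard.

```python
import numpy as np

# A zero-dimensional array, as assumed for a scalar field's data.
year = np.array(2005, dtype=np.float32)
year.flat = 2006                  # writes into the existing array in place
shape_after = year.shape          # still zero-dimensional
value_after = float(year)

# For arrays with dimensions, a scalar assigned through flat fills every element.
grid = np.zeros((2, 3), dtype=np.float32)
grid.flat = 1.5
```

Assigning through flat changes the contents of the existing array rather than replacing the array with a bare Python number, so the field keeps its array type and shape.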

When all desired variable slices have been written, close the new file:
c.close()

If one intends to create a Candis file with only a single variable slice, creating the fields and assigning values to them in one step as was done above for the first variable slice is the most efficient way to do it. However, if multiple variable slices are intended, it may be more convenient from a programming point of view to create initially empty or zeroed variable fields and then fill them and write all variable slices, including the first, in a single loop. Creating zeroed fields can be done as follows:

c.vfields['a'] = Field(['x', 'y'], zeros((nx, ny)))
c.vfields['slice'] = Field([], 0)
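The single-loop pattern might look like the sketch below. The function write_all_slices is a hypothetical helper that uses only the calls shown in this section (assignment to .data, write, and close). The FakeCandis class is a stand-in that merely records calls so that the sketch runs without Pycandis installed; with the real package, a Candis object with zeroed fields would be passed instead.

```python
import numpy as np
from types import SimpleNamespace


def write_all_slices(c, filename, slices):
    """Fill the variable fields and write one variable slice per item in slices."""
    for number, tempdata in enumerate(slices):
        c.vfields['a'].data = tempdata           # whole-array update
        c.vfields['slice'].data.flat = number    # scalar update through flat
        c.write(filename)
    c.close()


class FakeCandis:
    """Stand-in that records calls; not the real Candis class."""

    def __init__(self):
        self.vfields = {
            'a': SimpleNamespace(data=np.zeros((2, 2), dtype=np.float32)),
            'slice': SimpleNamespace(data=np.zeros((), dtype=np.float32)),
        }
        self.log = []

    def write(self, filename):
        self.log.append(('write', filename, float(self.vfields['slice'].data)))

    def close(self):
        self.log.append(('close',))


fake = FakeCandis()
slices = [np.full((2, 2), t, dtype=np.float32) for t in (280.0, 281.0)]
write_all_slices(fake, 'newfile.cdf', slices)
```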

READING A CANDIS FILE

First create an empty Candis object:
c = Candis()

Then read in the Candis file:
gotit = c.read('filename')

where 'filename' is the name of the Candis file. Omit the file name for input from the standard input:
gotit = c.read()

The read command returns True if at least one variable slice exists in the Candis file. (If it doesn't, something is wrong with the file!) Otherwise, it returns False. The two steps above may be replaced by the convenience function
c = ReadCandis('filename')

if desired.

All elements of the file may now be accessed by standard Python methods. Recall that c.comments is a list of comments, c.params is a dictionary of parameters, and c.sfields and c.vfields are dictionaries of Field objects. Thus, for instance,
c.params.keys()

returns the parameter names,
baseyear = c.params['base_year']

returns the value of the 'base_year' parameter,
xvals = c.sfields['x'].data

returns a Numpy array containing the data for the 'x' dimension, and
tempdims = c.vfields['temperature'].dims

returns a list of the dimension names associated with the 'temperature' field.

Successive variable slices may be accessed by repeating the above read command until it returns a value of False, at which point the end of the Candis file has been reached.
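The repeated-read pattern can be written as a simple loop. The function count_slices below is a hypothetical helper that relies only on the documented behavior of read (True while a slice was obtained, False at the end of the file). FakeReader is a stand-in that lets the sketch run without Pycandis; with the real package, a Candis object would be passed instead.

```python
def count_slices(c, filename):
    """Read variable slices one at a time until read() returns False."""
    nslices = 0
    while c.read(filename):
        # At this point c.vfields holds the current variable slice;
        # any per-slice processing would go here.
        nslices += 1
    return nslices


class FakeReader:
    """Stand-in whose read() returns True a fixed number of times, then False."""

    def __init__(self, nslices):
        self.remaining = nslices

    def read(self, filename=None):
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


n = count_slices(FakeReader(3), 'filename')
```

Because only one variable slice is held at a time, a loop of this form keeps memory use independent of the number of slices in the file.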

A check as to whether certain parameters, static fields, and variable fields exist in a Candis object may be performed by the check method:
c.check(['base_year'], ['x', 'y'], ['temperature'])

fails if any of these components does not exist in the file. The three lists represent respectively the desired parameters, static fields, and variable fields. Likewise,
c.validate()

checks to see if the Candis object is self-consistent, failing if it is not.

The convenience method getData returns the data in the form of a Numpy array associated with a given field whether it is a static or a variable field. For example:
tempdata = c.getData('temperature')

If fields of the same name occur in both static and variable slices (not recommended!), the static slice data are returned.

MASKED DATA

Candis represents bad or missing data by data values greater in magnitude than the value of the parameter badlim. Typically these are set to the value of the parameter bad.

The Numpy package in Python handles this situation differently by creating what is called a "masked array". A Numpy array with Candis bad data values may be converted into a masked array with the following commands:

badlim = c.params['badlim']
array_mask = abs(candis_array) > badlim
masked_array = ma.array(candis_array, mask = array_mask)

A convenience method, getMData, is called in the same way as getData and converts a Candis field in the static or variable slice to a masked array:
masked_tempdata = c.getMData('temperature')

Array elements with absolute values exceeding the value of the badlim parameter are masked.
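The masking rule can be reproduced with Numpy alone. The sketch below does not call getMData and makes no claim about its internals; it applies the same absolute-value test with the standard numpy.ma.masked_where function, using the default badlim value as an assumed setting.

```python
import numpy as np

badlim = 0.999e30   # default badlim value, assumed here
candis_array = np.array([280.0, 1e30, -1e30, 285.0], dtype=np.float32)

# Mask every element whose absolute value exceeds badlim.
masked = np.ma.masked_where(np.abs(candis_array) > badlim, candis_array)

mask_list = masked.mask.tolist()
good_mean = float(masked.mean())   # statistics skip the masked elements
```

Both the positive and the negative out-of-range values are masked, so they do not contaminate the mean of the remaining good data.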

An unmasked array with masked elements assigned a specified fill value may be generated from the masked array as follows:
filled_array = ma.filled(masked_array, 1.e30)

The fill value specified here is 1.e30. This is more useful than simply recovering the data in the masked array through its data attribute, masked_array.data, since the values stored at masked positions in arrays created by arithmetically combining masked arrays are not easily predictable. Using the ma.filled function guarantees that masked array elements will have a known, specified value in the output array. This is important when writing masked array data into a Candis file.
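The point about arithmetic can be demonstrated with two small masked arrays. The sketch below uses only standard numpy.ma calls; the array contents are invented for illustration, and 1e30 is the default bad value mentioned earlier.

```python
import numpy as np

bad = 1e30
a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
b = np.ma.array([10.0, 20.0, 30.0], mask=[False, False, True])

# An element of the sum is masked if it is masked in either operand.
total = a + b
total_mask = total.mask.tolist()

# Masked positions now hold exactly the chosen fill value.
filled = np.ma.filled(total, bad)
```

The result of ma.filled is an ordinary Numpy array whose bad elements all exceed badlim, which makes it suitable for assignment to a field's data before writing.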

The mask (a Boolean array of the same size and shape as the original array with True values for masked elements) may be similarly recovered:
array_mask = masked_array.mask

AUTHORS

Max Brister wrote the pycandis package.

David Raymond wrote this man page.