*NumPy* (Numerical Python) is a module which was created to allow
efficient numerical calculations on multi-dimensional arrays of
numbers from within Python. It is derived from the merger of two
earlier modules named *Numeric* and *Numarray*. The actual
work is done by calls to routines written in the Fortran and C
languages.

NumPy defines a new data type called the *ndarray* or
n-dimensional array. (We refer to them here simply as
*arrays*.) (Another Python module called *array*
defines one-dimensional arrays, so don’t confuse the arrays of NumPy
with this other type.) It also defines functions of ndarrays
(*ufuncs* or universal functions) which operate on each element
of the array. Replacements for the standard functions of the math
module exist. (Math module functions cannot be used directly on
ndarrays because they only accept scalar, not array arguments.) The
elements of ndarrays can be (among other things) integers, floats, and
complex numbers of various sizes. When the NumPy package is loaded,
ndarrays become as much a part of the Python language as standard
Python data types such as lists and dictionaries.
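To see the difference between a NumPy ufunc and its math-module counterpart, consider this short sketch. (It is a minimal illustration and uses the `import numpy as np` style rather than the wildcard import used in these notes.)

```python
import math
import numpy as np

a = np.array([0.0, math.pi / 2, math.pi])

# The ufunc version operates element by element on the whole array:
s = np.sin(a)

# math.sin accepts only scalars, so passing an array fails:
try:
    math.sin(a)
except TypeError:
    print("math.sin rejects array arguments")
```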

To load NumPy, import the NumPy module:

>>> from numpy import *

This allows NumPy functions to be used without qualifying them with
the prefix `numpy`. Alternatively, if NumPy names might conflict with
names from other modules such as the math module, then the alternate
import method

>>> import numpy

may be used. In the first case the NumPy version of the sine function
would just be `sin`, whereas in the second case it would be
`numpy.sin`. For brevity we will use the first form in these
notes.

There are many ways to create arrays in NumPy. The simplest is to use
the *array* function to make a direct definition:

>>> from numpy import *
>>> a = array([[1., 2., 3.], [4., 5., 6.]])
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.shape
(2, 3)

The syntax of the argument of the array function looks like nested
lists of numbers with the level of nesting being equal to the
dimensionality of the array – 2 in the above case. The attribute
*shape* returns a tuple which gives the size of the array
along each axis. Consistent with Python indexing, the
numbering of successive axes starts at 0, so the size along
axis 0 is 2 and the size along axis 1 is 3.

The *arange* function creates a one-dimensional array consisting
of a sequence of numbers:

>>> b = arange(0,11)
>>> b
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> c = arange(0.,2.,0.4)
>>> c
array([ 0. ,  0.4,  0.8,  1.2,  1.6])

The first two arguments of arange are the start and stop values. Note
that the sequence stops *before* the stop value, as with the
standard Python range function. The third argument is the step
value in the sequence with a default value of 1.

Floating point arithmetic causes a potential pitfall with non-integer values of the step parameter. Look at the following example:

>>> arange(0.,2.1,0.3)
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8,  2.1])

One would expect the last value in this array to be 1.8, since 2.1 is the stop value and the sequence is supposed to stop before reaching it. However, because of the minuscule inaccuracies of floating point arithmetic, the computed stop value is actually a tiny bit greater than the final element, an inaccuracy that is not visible in the printed output. Thus, the rule that the sequence stops before the stop value is reached is technically obeyed, though the result is probably not what was expected. Be forewarned: don't set your stop value to an element of the sequence if you are using floating point numbers. Instead, set it somewhere between the last desired element and the one beyond it, say, half a step size past the last desired element.
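Following the advice above, the stop value can be set half a step beyond the intended last element. A minimal sketch (using `import numpy as np` rather than the wildcard import):

```python
import numpy as np

start, last, step = 0.0, 1.8, 0.3   # want 0.0, 0.3, ..., 1.8 inclusive

# Set the stop value half a step beyond the last desired element, so
# floating point rounding cannot push it onto or past an element.
a = np.arange(start, last + step / 2, step)
```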

An unambiguous alternative to arange in such situations is the
function *linspace(start, stop, number)*. This function returns
an array of *number* evenly spaced elements spanning the specified
starting and stopping values, both of which are included:

>>> a = linspace(0.,3.5,8)
>>> a
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5])

A couple of other methods for generating arrays are provided by the
*zeros* and *ones* functions. Observe:

>>> zeros((2,4))
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> ones((3,3,3))
array([[[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]],

       [[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]],

       [[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]]])

The tuple in the argument list of these functions defines the shape of the array. The arrays are filled with the values indicated by the function names.

One often deals with fields in an N-dimensional vector space as
approximated by arrays defined over an N-dimensional rectangular
grid. With each grid dimension one defines a one-dimensional array
containing “position” values along the corresponding axis. It is
useful to define N-dimensional arrays over the vector space, one for
each dimension, containing these position values. Such
*meshgrid* arrays can be used as inputs to construct fields which
are functions of position.

For the special case of two dimensions, NumPy provides a convenient
function for generating meshgrid arrays, called, appropriately,
*meshgrid*. Meshgrid returns the two desired meshgrid arrays in
a tuple:

>>> x = arange(0.,5.1)
>>> y = arange(0.,3.1)
>>> (X,Y) = meshgrid(x,y)
>>> X
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  1.,  2.,  3.,  4.,  5.]])
>>> Y
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.]])
>>> a = X*X + Y*Y
>>> a
array([[  0.,   1.,   4.,   9.,  16.,  25.],
       [  1.,   2.,   5.,  10.,  17.,  26.],
       [  4.,   5.,   8.,  13.,  20.,  29.],
       [  9.,  10.,  13.,  18.,  25.,  34.]])
>>> b = x*x + y*y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape

The one-dimensional arrays `x` and `y` contain the position
values along the x and y axes, while the two-dimensional arrays
`X` and `Y` (which are distinct from `x` and `y`
because Python variable names are case-sensitive) define these
position values at each point in the vector space. Note how the array
`a` can be defined in terms of `X` and `Y`, but the
corresponding attempt to define `b` in terms of `x` and
`y` fails.

Notice that variations in the first argument of meshgrid correspond to
variations in the second index of `X`, `Y`, and `a`.
This curious inversion probably reflects the influence of Matlab on
the world of numerical computing. However, this convention is used
consistently in the Matplotlib plotting module discussed in the next
section, so there is no getting away from it. It does have the virtue
that the first argument of meshgrid (the x axis by convention) is
the dimension that corresponds to the horizontal axis in Matplotlib.
However, it may not always represent the best way to organize
dimensions in numerical computing.
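If the Matlab-style ordering is undesirable, recent versions of NumPy let meshgrid produce matrix-style ordering through an optional keyword. A sketch, assuming a NumPy recent enough (1.7 or later) to support the `indexing` argument:

```python
import numpy as np

x = np.arange(0., 5.1)   # 6 position values
y = np.arange(0., 3.1)   # 4 position values

# Default 'xy' indexing: the first argument varies along the SECOND index
X1, Y1 = np.meshgrid(x, y)                  # shapes (4, 6)

# 'ij' indexing: the first argument varies along the FIRST index
X2, Y2 = np.meshgrid(x, y, indexing='ij')   # shapes (6, 4)
```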

Some of the arrays created above contain integers and some contain
floating point values. NumPy arrays can be made up of a variety of
different numerical types, though all elements of a given array must
be of the same type. The default float type in Python contains 64
bits (like a C-language *double*) and the default integer type
generally contains 32 or 64 bits, depending on the architecture of the
underlying computer. The type of the elements in an array can be
specified when the array is created with the above functions by
including the keyword parameter *dtype*:

>>> a = array([1, 2], dtype = float64)
>>> a
array([ 1.,  2.])
>>> a = ones((2,3), dtype = complex128)
>>> a
array([[ 1.+0.j,  1.+0.j,  1.+0.j],
       [ 1.+0.j,  1.+0.j,  1.+0.j]])
>>> a = array([1.3, 2.7], dtype = int32)
>>> a
array([1, 2])

The data types most useful for numerical work are int8, int16, int32, int64, float32, float64, complex64, complex128. The number indicates the number of bits, with, for instance, complex64 consisting of two float32s.

It is easy to determine the data type of the elements in a NumPy array, as this type is available as an attribute of these objects:

>>> a = array([[2.1, 3.5, 7.0], [10.4, 12.9, 5.1]])
>>> a
array([[  2.1,   3.5,   7. ],
       [ 10.4,  12.9,   5.1]])
>>> a.dtype
dtype('float64')
>>> b = arange(0,5)
>>> b
array([0, 1, 2, 3, 4])
>>> b.dtype
dtype('int32')
>>> c = array([0 + 1j, 2 + 4j])
>>> c.dtype
dtype('complex128')

Generally the data type of array elements is obvious when the arrays
are created by the simple methods illustrated in this section.
However, if you are confronted with a “mystery array”, the dtype
attribute allows you to obtain its type. (Notice, by the way, that even
though integers are used for the real and imaginary parts of the
complex data elements in `c` above, the type still comes out
“complex”, which is composed of two floats. In other words, an
automatic conversion from int to float takes place.)

It is possible to convert arrays from one type to another using
the *astype* method:

>>> from numpy import *
>>> a = arange(0,5)
>>> a.dtype
dtype('int32')
>>> b = a.astype('float32')
>>> b
array([ 0.,  1.,  2.,  3.,  4.], dtype=float32)
>>> c = a.astype('float64')
>>> c
array([ 0.,  1.,  2.,  3.,  4.])

In many cases data are imported in the form of an array from some outside source. In this case one need not create the array “by hand”.

Individual elements and sets of elements are extracted from an array by indexing. NumPy adopts and extends the indexing methods used in standard Python for strings and lists.

>>> a = array([[2.1, 3.5, 7.0], [10.4, 12.9, 5.1]])
>>> a
array([[  2.1,   3.5,   7. ],
       [ 10.4,  12.9,   5.1]])
>>> a[0,0]
2.1000000000000001
>>> a[0,1]
3.5
>>> a[1,0]
10.4
>>> a[0,-1]
7.0

The above examples show how to extract single elements as in standard Python. The extension is that since NumPy arrays can be multi-dimensional, a list of N indices (really, a tuple) is needed for an N-dimensional array. As in standard Python, indexing starts at 0 and negative indices index backwards from the end of the array, starting with -1.

Array elements may be assigned to scalar variables and vice versa.
Starting with the array `a` defined above:

>>> b = a[1,2]
>>> b
5.0999999999999996
>>> a[1,2] = 137.25
>>> a
array([[   2.1 ,    3.5 ,    7.  ],
       [  10.4 ,   12.9 ,  137.25]])

(Notice above another example of the difficulty of representing decimal numbers in binary floating point form!)

The slicing methods used in Python strings and lists also work for NumPy arrays:

>>> a = array([[2.1, 3.5, 7.0], [10.4, 12.9, 5.1]])
>>> a
array([[  2.1,   3.5,   7. ],
       [ 10.4,  12.9,   5.1]])
>>> b = a[:,1:2]
>>> b
array([[  3.5],
       [ 12.9]])

The index “`:`” returns all elements along the corresponding
axis. Thus, `a[:,1:2]` creates an array consisting of the second
column (column 1 with zero-based indexing) and all rows of array
`a`.

Note that if the number of colons is less than the number of dimensions, then all elements of the remaining dimensions are included:

>>> a = zeros((2,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> a[:]
array([[ 0.,  0.],
       [ 0.,  0.]])

Thus, as illustrated above, `a[:]` indicates the entire array
regardless of the number of dimensions.

Also note that slicing may be used on an array on the left side of the equals sign to facilitate assignment to part of that array:

>>> a
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> b = array([1.,2.])
>>> b
array([ 1.,  2.])
>>> a[0,:] = b
>>> a
array([[ 1.,  2.],
       [ 0.,  0.]])

Here are some other useful tricks in slicing:

>>> a = arange(0,10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0:10:2]
array([0, 2, 4, 6, 8])
>>> a[-1:0:-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1])
>>> a[-1::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

The first example illustrates the use of a non-default step of 2, while the last two show how to reverse the order of the elements in an array. The second of these demonstrates that by omitting the stop value, all elements down to the beginning of the array (or up to the end, if the step value is positive) are included in the new view.

Here is a surprise:

>>> a = array([[2.1, 3.5, 7.0], [10.4, 12.9, 5.1]])
>>> b = a[:,1:2]
>>> b
array([[  3.5],
       [ 12.9]])
>>> a[0,1] = 14.5
>>> b
array([[ 14.5],
       [ 12.9]])

As with lists, the assignment of one array to another, e. g.,
`c = a`, doesn’t create a new array, just an alias, `c`, to
the original array `a`. However, NumPy carries this farther.
When an array is sliced, i. e., when a subset of it is defined as
illustrated above, a new array is not created. Instead, a new
*view* of the original array is created. Thus, when the original
array `a` is changed, the new view `b` is also changed.
This is analogous to the aliases which occur for lists and dictionaries.

Note that aliasing does not happen with the simple assignment of an array element to a scalar. Contemplate the meaning of the following:

>>> a = array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> b = a[0:1,1:2]
>>> b
array([[2]])
>>> c = a[0,1]
>>> c
2
>>> a[0,1] = 42
>>> b
array([[42]])
>>> c
2

If you really need to make a new array out of a slice of the original
array, use of the “`+`” prefix causes this to happen:

>>> a = arange(0,10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = +a[1:3]
>>> b
array([1, 2])
>>> a[1] = 42
>>> a
array([ 0, 42,  2,  3,  4,  5,  6,  7,  8,  9])
>>> b
array([1, 2])

This is because the plus sign in the expression on right side of the equals sign technically signals computation, which might change element values, though of course it doesn’t actually do so in this case. Sometimes the “literal-mindedness” of computers is an advantage!
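An alternative that states the intent more directly is the array’s copy method, which NumPy also provides. A brief sketch (using the `import numpy as np` style):

```python
import numpy as np

a = np.arange(0, 10)

# copy() explicitly allocates new storage for the slice,
# so later changes to a do not affect b
b = a[1:3].copy()

a[1] = 42
```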

NumPy arrays have a *reshape* method which allows the number of
dimensions and the size of each dimension to be changed as long as
the total number of array elements remains the same:

>>> a = array([[1,2],[3,4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.reshape(4)
array([1, 2, 3, 4])
>>> a.reshape(4,1)
array([[1],
       [2],
       [3],
       [4]])
>>> a.reshape(1,4)
array([[1, 2, 3, 4]])
>>> a.reshape(2,2,1)
array([[[1],
        [2]],

       [[3],
        [4]]])

Reshape has been used four different ways in the above examples. Starting with a two-dimensional array of 2 × 2 elements, we have converted it into (1) a one-dimensional array of 4 elements, (2) a two-dimensional array of 4 × 1 elements, (3) a two-dimensional array of 1 × 4 elements, and (4) a three-dimensional array of 2 × 2 × 1 elements.

As with slicing, reshaping doesn’t normally create a new array, just a new view of the original array:

>>> a = array([[1,2],[3,4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> b = a.reshape(4)
>>> a[0,1] = 137
>>> a[1,0] = 42
>>> b
array([  1, 137,  42,   4])

Reshaping brings up a sticky question. When reshaping
`[[1, 2], [3, 4]]` into `[1, 2, 3, 4]`, why doesn’t it come
out `[1, 3, 2, 4]`? The answer is that by default NumPy iterates
the last index in a multi-dimensional array most rapidly and the first
index least rapidly while assigning elements to successive memory
locations. Thus, the order in which the elements appear is
`a[0,0]`, `a[0,1]`, `a[1,0]`, `a[1,1]` in the
above example. As we discovered above, this ordering is revealed in
the conversion of a multi-dimensional array to a one-dimensional
array, and is the same as that used by the C language. The Fortran
language uses the opposite convention: the first index is iterated
most rapidly. It is possible to make NumPy behave like Fortran in
this regard, but special options and methods must be invoked.
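One such option is the `order` keyword accepted by reshape (and ravel), which selects between the two conventions. A brief sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# C-like (default) order: last index varies fastest
c_order = a.reshape(4)             # same as a.reshape(4, order='C')

# Fortran-like order: first index varies fastest
f_order = a.reshape(4, order='F')
```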

NumPy arrays are Python sequences, which means that *for* loops
can be used to iterate over them. For example:

>>> a = arange(12).reshape(4,3)
>>> a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
>>> for x in a:
...     x
...
array([0, 1, 2])
array([3, 4, 5])
array([6, 7, 8])
array([ 9, 10, 11])

This example reveals that a two-dimensional NumPy array is actually an
array of arrays, so iterating over `a` doesn’t yield the scalar
array elements in sequence. Instead this loop accesses in sequence
the subarrays from which the array `a` is constructed. As a
result, `len(a)` is `4` rather than `12`, and
`a[0]` is `array([0, 1, 2])`!

To get the scalar elements, two loops are required, one over the array of subarrays and the other over the subarrays themselves:

>>> for x in a:
...     for y in x:
...         y
...
0
1
2
3
4
5
6
7
8
9
10
11

The generalization to arrays of more than two dimensions is obvious.

To access individual array elements of a multi-dimensional array in a
single loop, use the *ravel* method, which is equivalent to
reshaping the array into one-dimensional form:

>>> for y in a.ravel():
...     y
...
0
1
2
3
4
5
6
7
8
9
10
11

Generally speaking, iterating over the elements of a NumPy array in Python should be avoided where possible, as it is computationally inefficient due to the interpreted nature of the Python language. As we shall see, there are many NumPy array methods and functions which reduce the necessity for such explicit iteration.
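As an illustration of this point, the double loop above can be replaced by a single vectorized method call; a brief sketch:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Explicit Python loop over scalar elements (slow for large arrays):
total = 0
for row in a:
    for val in row:
        total += val

# The equivalent vectorized reduction runs in compiled code:
vector_total = a.sum()
```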

In this section we begin to learn how to use NumPy arrays. Array expressions and assignment statements work much like standard scalar math:

>>> a = array([[1,2],[3,4]])
>>> b = array([[5,6],[7,8]])
>>> c = a + b
>>> c
array([[ 6,  8],
       [10, 12]])
>>> d = a*b
>>> d
array([[ 5, 12],
       [21, 32]])
>>> e = a*b + c
>>> e
array([[11, 20],
       [31, 44]])
>>> f = c + a*b
>>> f
array([[11, 20],
       [31, 44]])

As the above example shows, array math operates on an element-by-element basis, i. e., unlike matrix math. The usual algebraic precedence rules apply, e. g., multiplication is done before addition in otherwise ambiguous situations.

Because array math occurs element by element, arrays normally have to be the same shape for math operations between them to succeed:

>>> a = array([[1,2],[3,4]])
>>> e = array([10, 20, 30])
>>> a + e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape

The shape of array `a` is (2,2), while the shape of array
`e` is (3,), and they therefore cannot be added together.

The exception occurs when one of the arrays is the same shape as the other except that the size of one or more of the dimensions is reduced to 1, or else absent completely:

>>> a = array([[1,2],[3,4]])
>>> a.shape
(2, 2)
>>> b = array([[5,10]])
>>> b.shape
(1, 2)
>>> a + b
array([[ 6, 12],
       [ 8, 14]])

In this case array `b`, which has one row and two columns, is
(conceptually) extended along axis 0, repeating values as necessary,
to create an array with two rows, `array([[5,10],[5,10]])`, and
then added to `a`. This extension process is called
*broadcasting*.

Consider the alternative case where an array `c` has two rows
and one column:

>>> c = array([[5],[10]])
>>> c.shape
(2, 1)
>>> a + c
array([[ 6,  7],
       [13, 14]])

In this case array `c` is extended along axis 1, to create an
array with two columns, `array([[5,5],[10,10]])`, which is then
added to `a`. Notice that the result here is different from that
in the previous case, since the broadcast axis is different.

A case of double broadcasting is illustrated below, where arrays
`b` and `c` are as defined above:

>>> b + c
array([[10, 15],
       [15, 20]])

The number of dimensions of the two arrays need not be the same. For example:

>>> d = array([5,10])
>>> d.shape
(2,)
>>> a + d
array([[ 6, 12],
       [ 8, 14]])

In this case a new axis of size 1 is first added onto the beginning of
the initial shape (2,), resulting in the shape (1,2), which is the
same as that of array `b`. Notice that this can be done
unambiguously, since the number of array elements does not change.
Then broadcasting occurs along the new axis, as is done when computing
`a + b`. However, cases like this leave one wondering whether
the result is what one intended. Just to be safe, it is probably
better to reshape array `d` explicitly the way one wants it
before adding it to `a`, i. e.,

>>> dd = d.reshape(1,2)
>>> a + dd
array([[ 6, 12],
       [ 8, 14]])

or

>>> dd = d.reshape(2,1)
>>> a + dd
array([[ 6,  7],
       [13, 14]])
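Another common way to add the size-1 axis explicitly is indexing with `newaxis`, which is equivalent to the reshape calls above; a brief sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
d = np.array([5, 10])

# newaxis inserts an axis of size 1 at the indicated position:
row = a + d[np.newaxis, :]   # same as d.reshape(1, 2)
col = a + d[:, np.newaxis]   # same as d.reshape(2, 1)
```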

Notice that combining a scalar with an array, for instance by addition or multiplication, is a special case of broadcasting in which one of the components (the scalar) has fewer dimensions than the other. In this case there is no ambiguity in the broadcasting as all dimensions are filled with the scalar value and there is no reason to avoid this type of construction.

Since NumPy arrays are Python objects, they have methods associated with them. Here we describe some of the most useful methods. Some methods operate on the array in place and don’t return anything. Others return some object, generally a modified form of the original array. Which does which is indicated.

- tolist(): This method converts a NumPy array to an
ordinary Python list:
>>> a = arange(0.,9.5)
>>> a
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> a.tolist()
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> b = a.reshape(2,5)
>>> b
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.]])
>>> b.tolist()
[[0.0, 1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0, 9.0]]

This example also reminds us about the *reshape* method, which was discussed in the last section.
- tofile(file, sep='', format=''): This writes an array to a file. *file* is either a string representing a file name, in which case this method opens that file and writes the array to it, or the file object for a previously opened file. If the optional keyword parameters *sep* and *format* are absent, the array is written in native binary format:

>>> a = array([[1., 2., 3.], [9., 8., 7.]])
>>> a.tofile("junk.bin")

No meta-information about the array is written – just the array element values, as can be inferred from the size of junk.bin: 48 bytes. The parameter *sep* defines an ASCII string (for instance, a space or a tab) to separate the array elements, which are converted to ASCII string form for writing. If the *format* parameter is given, the elements are written in the specified format, e. g., “%f”, “%g”, or “%d”. For example, using the same array:

>>> import sys
>>> a.tofile(sys.stderr, sep = " ")
1.0 2.0 3.0 9.0 8.0 7.0>>>

The Python prompt appears immediately after the output because no newline is appended.
- astype(): This method converts the type of the elements to the type given in the argument, as discussed previously.
- byteswap(inplace=False): This method swaps the bytes of each element, converting between little-endian and big-endian representations. If the argument is False (the default), the method returns a new array with the swapped bytes; if True, the bytes are swapped in place.
- copy(): This returns a copy of the array.
- fill(scalar): This fills all of the elements of the array
with the scalar value and is faster than element-by-element
assignment:
>>> a = array([1,4,7]) >>> a array([1, 4, 7]) >>> a.fill(42) >>> a array([42, 42, 42]) >>>

- reshape(newshape): As noted previously, reshape changes the number and size of dimensions of an array, returning a new array (actually a new view – additional data memory isn’t allocated). The new shape is given by the newshape tuple. The total size of the array cannot change.
- transpose(tuple of axes), T: This method transposes arrays
of two or more dimensions:

>>> a = array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.transpose()
array([[1, 3],
       [2, 4]])

For arrays of more than two dimensions, a tuple of axes can be supplied to specify how the existing axes should be ordered in the returned array. `a.T` is shorthand for `a.transpose()` with no arguments.
- sort(axis=-1): This sorts, in place, the elements along the
specified axis, with the default being the last (-1). Axis
numbering starts with 0:

>>> a = array([3, 7, 4, 8, 2, 15])
>>> a
array([ 3,  7,  4,  8,  2, 15])
>>> a.sort()
>>> a
array([ 2,  3,  4,  7,  8, 15])

- max(axis=None): For no arguments, this returns the scalar
value which is the largest element in the entire array. If an axis
is defined for an N-dimensional array, the maximum values along that
axis are returned as an array with N - 1 dimensions:
>>> a = array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.max()
4
>>> a.max(0)
array([3, 4])
>>> a.max(1)
array([2, 4])

- min(): Works analogously to max().
- clip(min=, max=): Returns an array in which elements larger than the value assigned to max are set to that value and those smaller than min are set to min.
- conj(): Returns an array with all (complex) elements conjugated.
- trace(): Returns the trace (the sum of the diagonal elements) of a
two-dimensional array. With optional arguments, it works for
higher-dimensional arrays as well, but this is probably not too useful.
>>> a
array([[1, 2],
       [3, 4]])
>>> a.trace()
5

- sum(axis = None): With no argument, returns the sum of
all the elements in the array. With an axis argument, sums along
the specified axis, returning an array of N - 1 dimensions for an
N-dimensional initial array:
>>> a
array([[1, 2],
       [3, 4]])
>>> a.sum()
10
>>> a.sum(axis = 0)
array([4, 6])
>>> a.sum(axis = 1)
array([3, 7])

- mean(axis = None): Like sum() except computes the mean.
- var(axis = None): Like sum() except computes the variance.
- std(axis = None): Like sum() except computes the standard deviation.
- prod(axis = None): Like sum() except computes the product.
- cumsum(axis=None): With no argument, this flattens the array to a
single dimension and performs a cumulative sum along it. With
an optional axis argument, cumulative sums are performed along that axis:
>>> a
array([[1, 2],
       [3, 4]])
>>> a.cumsum()
array([ 1,  3,  6, 10])
>>> a.cumsum(axis = 0)
array([[1, 2],
       [4, 6]])
>>> a.cumsum(axis = 1)
array([[1, 3],
       [3, 7]])

- cumprod(): Like cumsum() but computes the cumulative product.
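Since tofile (described above) stores no metadata, reading the data back requires supplying the dtype and shape by hand; NumPy’s fromfile function does this. A minimal round-trip sketch, written to a temporary directory to keep it self-contained:

```python
import os
import tempfile
import numpy as np

a = np.array([[1., 2., 3.], [9., 8., 7.]])

fname = os.path.join(tempfile.mkdtemp(), "junk.bin")
a.tofile(fname)   # raw binary: 6 float64 values, 48 bytes, no metadata

# fromfile returns a flat array; the caller must restore dtype and shape
b = np.fromfile(fname, dtype=np.float64).reshape(2, 3)
```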

Some of the most useful functions we haven’t discussed previously are now described. In certain cases obscure options are omitted in this documentation for simplicity. Also, many NumPy functions are omitted because they reproduce the functionality of NumPy methods discussed above.

- (num, bins) = histogram(x, bins=None, range=None): This
function returns in num an array (a histogram) which contains the
number of values of the array x sorted into bins defined by the
optional keyword arguments bins and range. The argument range is a
tuple containing the minimum and maximum values of x that are
accepted – values outside this range are discarded. Normally the
argument bins is a scalar giving the number of bins, though it
can also be a sequence giving the desired boundaries of the bins. The
returned bins is an array containing the bounding values of
the bins:

>>> a = log2(arange(1.,50.,0.1))
>>> (num, bins) = histogram(a, bins = 10, range = (0,6))
>>> bins
array([ 0. ,  0.6,  1.2,  1.8,  2.4,  3. ,  3.6,  4.2,  4.8,  5.4,  6. ])
>>> num
array([  6,   7,  12,  18,  27,  42,  62,  95, 144,  77])

- (num, xedges, yedges) = histogram2d(x, y, bins=10, range=None):
This function returns a two-dimensional histogram (num)
and the edges of the bins in the x and y directions resulting from
the data in the arrays x and y. By default 10 bins are created in
each dimension and the range spans the data values. This may be
changed by assigning values to bins and range, e. g.,
`bins=(12, 8)` and `range=((-1, 11), (-2, 12))`. (Either
tuples or lists suffice here.)

>>> x = linspace(0,10,100)
>>> y = x*x/10
>>> (num, xedges, yedges) = histogram2d(x,y)
>>> num
array([[ 10.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [ 10.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [ 10.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [  2.,   8.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [  0.,   5.,   5.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   5.,   5.,   0.,   0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   3.,   7.,   0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   1.,   6.,   3.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.,   0.,   3.,   6.,   1.,   0.],
       [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   4.,   6.]])
>>> xedges
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])
>>> yedges
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

- c = correlate(x, y, mode='valid'): This routine returns
the one-dimensional array c containing the cross correlation between
the one-dimensional arrays x and y. This is defined

  c[n + n0] = ∑_i x[i] y[n + i]                              (3.1)

where the value of n0 and the range of the summation depend on the mode selected, which may be *valid*, *same*, or *full*. The input arrays need not be the same size. Suppose that the size of array x is X and the size of array y is Y. The array x and the shifted array y will not completely overlap in general; terms at points beyond the limits of either array are dropped from the sum. The option *valid* (the default) limits n to the case in which the smaller array completely overlaps the larger array after being shifted by n points. The size of c in this case is |X - Y| + 1. Notice that for input arrays of equal size, this means that c has a size of 1. For the *same* option, shifts are allowed up to the point at which the size N of array c is the larger of X and Y. For the *full* option, shifts are allowed which result in minimal (i. e., one point) overlap, so N = X + Y - 1. The value of n0 if x and y have the same length is n0 = N/2 for all options, including the effects of truncation in integer division. This means that for an odd length, n0 refers to the element at the center of the output array, e. g., n0 = 5/2 = 2. For an even length, it refers to the element just right of center, e. g., n0 = 4/2 = 2. If the input arrays are of differing length, the proper value of n0 is in general hard to predict. However, if the arrays are both of odd length, the output is always of odd length and n0 refers to the center element in all cases. No complex conjugation is done on either input array if the arrays are complex.
- c = convolve(x, y, mode='full'): The convolution function
is identical to the correlation function except that the calculation
returns

  c[n + n0] = ∑_i x[i] y[n − i]                              (3.2)

and the default mode is *full* rather than *valid*. Some simple examples:

>>> convolve([1, 0, 0], [0, 0, 1], mode='full')
array([0, 0, 1, 0, 0])
>>> correlate([1, 0, 0], [0, 0, 1], mode='full')
array([1, 0, 0, 0, 0])
>>> correlate([0, 0, 1], [1, 0, 0], mode='full')
array([0, 0, 0, 0, 1])
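The size rules for the different modes can be checked directly; a small sketch with input arrays of equal size (X = Y = 3):

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([0., 1., 0.5])

# 'full' mode: output size is X + Y - 1 = 5
full = np.convolve(x, y, mode='full')

# 'valid' mode with equal-size inputs: output size is |X - Y| + 1 = 1,
# a single complete-overlap sum (1*0 + 2*1 + 3*0.5 = 3.5)
valid = np.correlate(x, y, mode='valid')
```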

A special subtype of a two-dimensional NumPy array is a *matrix*.
A matrix is like an array except that matrix multiplication (and
exponentiation) replaces element-by-element multiplication. Matrices
are generated by the *matrix* function, which may also be
abbreviated *mat*:

>>> a = matrix([[1,3,-5],[3,4,2],[-5,2,0]]) >>> a matrix([[ 1, 3, -5], [ 3, 4, 2], [-5, 2, 0]]) >>> b = matrix([[1],[5],[3]]) >>> b matrix([[1], [5], [3]]) >>> a*b matrix([[ 1], [29], [ 5]]) >>>

Notice that row and column vectors are represented as two-dimensional matrices with one row or column.

Matrices have methods beyond those of ordinary arrays:

>>> b.T
matrix([[1, 5, 3]])
>>> b.H
matrix([[1, 5, 3]])
>>> c = a.I
>>> c
matrix([[ 0.02439024,  0.06097561, -0.15853659],
        [ 0.06097561,  0.15243902,  0.10365854],
        [-0.15853659,  0.10365854,  0.0304878 ]])
>>> a*c
matrix([[  1.00000000e+00,  -5.55111512e-17,  -6.93889390e-18],
        [  0.00000000e+00,   1.00000000e+00,   4.16333634e-17],
        [  0.00000000e+00,   0.00000000e+00,   1.00000000e+00]])

The operation `b.T` produces the transpose, while `b.H`
produces the conjugate transpose. The matrix inverse is produced
by the operation `a.I`.

The *linalg* package has a variety of common matrix operations,
such as computing determinants,

>>> linalg.det(a)
-164.0

solving linear equations,

>>> d = linalg.solve(a,b)
>>> d
matrix([[-0.14634146],
        [ 1.13414634],
        [ 0.45121951]])
>>> a*d
matrix([[ 1.],
        [ 5.],
        [ 3.]])

and finding eigenvalues and eigenvectors:

>>> e = linalg.eig(a)
>>> e[0]
array([-5.78304165,  6.23396835,  4.5490733 ])
>>> e[1]
matrix([[ 0.65072855, -0.7001856 , -0.29375583],
        [-0.33849942, -0.61380708,  0.71320335],
        [ 0.67968412,  0.3646656 ,  0.6364342 ]])
>>> u = e[1]
>>> u.T*a*u
matrix([[ -5.78304165e+00,  -1.22688318e-15,  -7.92985469e-16],
        [ -1.62554432e-15,   6.23396835e+00,   1.43223107e-15],
        [ -7.68916181e-16,   1.83533744e-15,   4.54907330e+00]])

The *eig* function returns a tuple, the first element of which is
an array of the eigenvalues and the second of which is a matrix
containing the eigenvectors as its columns. Notice that the eigenvalue
array may also be converted into a diagonal matrix using the *diag* function:

>>> matrix(diag(e[0]))
matrix([[-5.78304165,  0.        ,  0.        ],
        [ 0.        ,  6.23396835,  0.        ],
        [ 0.        ,  0.        ,  4.5490733 ]])
>>>

The result of the *diag* function is a normal NumPy array, but the *matrix* function can be used to convert it into a matrix.
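The defining relation a·v = λ·v can be verified directly. In the sketch below, each column of the eigenvector matrix is multiplied by `a` and compared with the same column scaled by its eigenvalue:

```python
from numpy import matrix, linalg, diag, allclose

a = matrix([[1, 3, -5], [3, 4, 2], [-5, 2, 0]])
vals, vecs = linalg.eig(a)
vecs = matrix(vecs)   # ensure matrix semantics for *

# Check a*v = lambda*v for each eigenpair (columns of vecs)
for i in range(3):
    v = vecs[:, i]
    assert allclose(a * v, vals[i] * v)

# Equivalently, in one step: a*V = V*diag(lambda)
assert allclose(a * vecs, vecs * matrix(diag(vals)))
```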

Sometimes regions of data on a grid are missing. For instance, meteorological radar measurements lack data where there are no clouds. NumPy and Matplotlib provide ways of dealing with such situations; the tool used here is NumPy's masked array.

To create a masked array from a normal array, use the *ma.array*
function:

>>> a
array([[  0.,   1.,   4.,   9.],
       [  1.,   2.,   5.,  10.],
       [  4.,   5.,   8.,  13.],
       [  9.,  10.,  13.,  18.]])
>>> m = a > 10.
>>> amasked = ma.array(a,mask=m)
>>> amasked
masked_array(data =
 [[0.0 1.0 4.0 9.0]
 [1.0 2.0 5.0 10.0]
 [4.0 5.0 8.0 --]
 [9.0 10.0 -- --]],
             mask =
 [[False False False False]
 [False False False False]
 [False False False  True]
 [False False  True  True]],
       fill_value = 1e+20)
>>>

We wish to mask the array `a` (created by means not shown) wherever its element values exceed 10. We first create a boolean *mask* with the command `m = a > 10.` and then pass this mask as the value of the keyword argument `mask`. Printing the resulting masked array `amasked` reveals that this object consists of the data array itself with the masked values blanked out, a boolean array with `True` values where the original elements exceeded 10, and the *fill value*, which is the actual value assigned to masked elements of the data array. If a different fill value is desired, it may be assigned via a keyword parameter in the `ma.array` call, e.g., `fill_value=1e30`.

Computations are not done on masked values in NumPy calculations. Furthermore, various other packages such as the Matplotlib graphics package (discussed in the next chapter) recognize masked regions and act appropriately.
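For example, reductions such as *sum* and *mean* on a masked array simply skip the masked elements. The following sketch (with a small made-up array) shows the masked value being excluded:

```python
from numpy import array, ma

data = array([1., 2., 100., 3.])

# Mask values exceeding 10, as in the example above
md = ma.array(data, mask=(data > 10.))

# The masked element (100.) does not participate
print(md.sum())    # 6.0
print(md.mean())   # 2.0
```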

The unmasked data in a masked array may be recovered as follows:

>>> aunmasked = amasked.data

However, this can lead to unsatisfactory results, as there is no
reliable indication as to which data elements were masked, especially
after mathematical operations between masked arrays. The solution is
to use the *ma.filled* function, which sets masked elements to a
specified *fill value* in the unmasked version. For instance:

>>> fill_value = 1.e30
>>> aunmasked = ma.filled(amasked, fill_value)
>>> aunmasked
array([[  0.00000000e+00,   1.00000000e+00,   4.00000000e+00,
          9.00000000e+00],
       [  1.00000000e+00,   2.00000000e+00,   5.00000000e+00,
          1.00000000e+01],
       [  4.00000000e+00,   5.00000000e+00,   8.00000000e+00,
          1.00000000e+30],
       [  9.00000000e+00,   1.00000000e+01,   1.00000000e+30,
          1.00000000e+30]])
>>>

One ugly but potentially important issue can arise in the transfer of data between computers. Different computer architectures store the bytes of multi-byte numbers in different orders. Intel-compatible machines store the least significant byte first and are called little-endian; various other architectures, most notably the Sparc architecture from Sun Microsystems, store the most significant byte first and are called big-endian. This architectural detail matters only when data are moved from one machine type to another: to make sense of data coming from a machine of the opposite type, we must reverse the byte order, which is done with the *byteswap* method. To show what happens when bytes are in the wrong order, let's do a byteswap on an array:

>>> a = arange(0,5)
>>> a
array([0, 1, 2, 3, 4])
>>> a.byteswap()
array([       0, 16777216, 33554432, 50331648, 67108864])
>>>
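Swapping twice restores the original values, which makes for a simple sanity check (a sketch):

```python
from numpy import arange

a = arange(0, 5)

# byteswap returns a new array with the byte order of each
# element reversed; applying it twice recovers the original
restored = a.byteswap().byteswap()
assert (restored == a).all()
```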

The endianness of the machine on which you are working can be determined from the *sys* module:

>>> import sys >>> sys.byteorder 'little' >>>
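A typical pattern is to swap only when the byte order of the data differs from that of the machine. The helper below is a sketch; `file_order` (either `'little'` or `'big'`) is hypothetical metadata describing the data source:

```python
import sys
from numpy import arange

def to_native(data, file_order):
    """Return data in native byte order.

    file_order is 'little' or 'big' (hypothetical metadata
    describing the machine the data came from).
    """
    if file_order != sys.byteorder:
        return data.byteswap()
    return data

a = arange(0, 5)

# Data already in native order is returned unchanged
assert (to_native(a, sys.byteorder) == a).all()
```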

The NumPy home page, http://numpy.scipy.org/, provides some useful information. The User Guide is rather incomplete at this point, but the NumPy Reference is quite useful. There is also a separate Tutorial, and the source code may be downloaded. Travis Oliphant's Guide to NumPy, http://www.tramy.us/, is a complete reference work of over 300 pages of PDF; it is the definitive work, but daunting for a beginner. The Numerical Python page http://numpy.sourceforge.net/numdoc/HTML/numdoc.htm is out of date, but may still be useful in certain respects.

Version 1.13 of NumPy is documented here. Changes (for the better) were made to the histogram function between 1.11 and 1.13.