Chapter 2  Standard Python

We touch first on those aspects of “standard” Python (including “standard” modules) which we need to do scientific work. Standard Python is a clean but powerful language and is relatively easy to learn.

2.1  Using Python as a Desk Calculator

As noted above, Python is an interpreted language, which means that you can type code directly into it and get the result immediately. This makes Python an excellent desk calculator. Start the Python interpreter at your prompt by typing python:

swallow$ python
Python 2.5.2 (r252:60911, Jan  4 2009, 17:40:26)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

The prompt >>> tells you that Python is ready for input. Type in any numeric expression and you should get an answer. For instance:

>>> 3*(2 + 5.5) 
22.5 
>>>

The number 22.5 is the result of the expression. To finish with Python, type ^D or quit(). The ^D means hold down the control key and type d.

>>> quit()
swallow$

Here is one quirk of Python. Let’s try division:

>>> 5/3 
1 
>>> 5%3 
2 
>>> 4./3. 
1.3333333333333333 
>>> 

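Notice that in this Python 2 session the ratio of integers yields an integer, not the (possibly expected) floating point result. To do floating point division here you need at least one of the numbers to be floating point to begin with. (In Python 3 the / operator always returns a floating point result, and integer division is written with the // operator.) The % operator provides the remainder in integer division.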
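
When the integer quotient is what you actually want, it can be requested explicitly. The short sketch below (the variable names are ours, chosen only for illustration) uses the floor-division operator //, the remainder operator %, and the built-in function divmod(), all of which give the same results for these values in Python 2 and Python 3:

```python
# Integer quotient and remainder, requested explicitly.
quotient = 5 // 3        # floor division
remainder = 5 % 3        # remainder of the integer division
both = divmod(5, 3)      # (quotient, remainder) in a single call
ratio = 5 / 3.           # one float operand gives floating point division

print(quotient, remainder, both, ratio)
```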

2.1.1  The math module

Let’s try some trig:

>>> sin(0.01)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sin' is not defined
>>>

Oops! Something went wrong. To use trigonometric or other common math functions, you need to load the math module:

>>> from math import *
>>> sin(0.01)
0.0099998333341666645
>>> 

The * loads all the functions defined in the math module into the interpreter. Now your common math functions work:

>>> exp(3.)
20.085536923187668
>>> log(3.)
1.0986122886681098
>>> log10(3.)
0.47712125471966244
>>> abs(-2.)
2.0
>>> atan(1.)
0.78539816339744828
>>> atan2(-3.,-4.)
-2.4980915447965089
>>> 

If the math module were loaded with the import math statement rather than from math import *, the math functions would all need the prefix math. to work:

>>> import math
>>> log(3.)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined
>>> math.log(3.)
1.0986122886681098
>>> 

Note that angles are in radians, as in compiled languages such as C and Fortran. Also, by default, floating point numbers (i. e., numbers with decimal points) are “double precision” or 64 bits. Integers depend on the architecture of the machine being used, but are at least 32 bits. Floats and ints can be converted explicitly from one to the other:

>>> float(3)
3.0
>>> int(3.1)
3
>>> int(-3.1)
-3
>>> 

Conversion to ints truncates toward zero for both positive and negative floats. This is the documented behavior of int() and does not depend on the computer hardware.
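
If a different rounding direction is needed, the math module offers floor() and ceil(). The following sketch, with values chosen by us for illustration, compares the three conversions on a positive and a negative float:

```python
import math

# int() drops the fractional part, i.e. it moves toward zero.
truncated = [int(3.7), int(-3.7)]

# floor() always moves toward minus infinity, ceil() toward plus infinity.
floored = [math.floor(3.7), math.floor(-3.7)]
ceiled = [math.ceil(3.7), math.ceil(-3.7)]

print(truncated, floored, ceiled)
```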

2.1.2  Variables

Sometimes one wants to use a certain number repeatedly. In this case it is possible to assign it to a variable. So, instead of typing

>>> 2*23.5
47.0
>>> 3*23.5
70.5
>>> 4*23.5
94.0
>>>

do instead

>>> const = 23.5
>>> 2*const
47.0
>>> 

etc.

By the way, the last value returned by the interpreter may be retrieved, as it is stored in the variable _:

>>> "frog"
'frog'
>>> _
'frog'
>>> 

2.2  Scripts

Sometimes one would like to package a set of commands to Python so that repeated typing of these commands is not needed. This can be done by creating a Python script. Using your favorite text editor, create a file called trivial1.py with the following content:

# trivial1.py -- This is a trivial script.
a = 3.
result = a**3 + 2.*a**2 - 5.
print (a, result)

Then type python trivial1.py at the command line to execute the statements in this script, just as if you typed them directly into Python:

swallow$ python trivial1.py
3.0 40.0

The print statement prints out the values of the variables a and result. The operator ** raises the variable before it to the power following it. Note everything following a # is treated as a comment. Comments are there for humans to read – Python ignores them.

You could make the output more informative by adding some descriptive information to the print statement:

print ('for a =', a, '   a**3 +2a**2 - 5 =', result)

yields

for a = 3.0    a**3 +2a**2 - 5 = 40.0

The modified print statement introduces a new data type in Python; any text enclosed between (single or double) quotes is called a string. More on this later.

2.2.1  Getting input

Actually, this script is pretty useless, since to get a different result, one would have to edit the script to change the value of a. However, a slight modification causes the script to input a value of a from the command line each time it is run:

# trivial2.py -- This is a trivial script.
import sys
print ('type a: ', end='', flush=True)   # Python 3: keep the cursor on the prompt line
text = sys.stdin.readline()
a = float(text)
result = a**3 + 2.*a**2 - 5.
print ('for a =', a, '   a**3 +2a**2 - 5 =', result)

Running this results in

swallow$ python trivial2.py
type a: 4
for a = 4.0    a**3 +2a**2 - 5 = 91.0
swallow$ 

where the 4 is typed in response to the prompt type a: and stored as a string in the variable text. The call to the function float() converts this string to a float variable. The readline command returns an empty string (i. e., '') when the end of the file is reached.

Information is read by the system on the standard input, which necessitates importing the module sys to enable this capability. Notice that we have used the import form import sys rather than the form we used earlier for the math module, which would have been from sys import *. Which form is used depends on whether one wants to include the name of the module as a prefix to calls to module functions or not, e. g., sys.stdin versus stdin. The advantage of the latter is less typing. The advantage of the former is that a possible pre-existing function stdin doesn’t get clobbered.
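
The clobbering hazard can be made concrete with the math module. In the sketch below, the function log() defined in the script is a hypothetical stand-in for any pre-existing name; it survives import math but is replaced as soon as from math import log is executed:

```python
import math                      # names stay behind the prefix: math.log

def log(message):
    """A hypothetical pre-existing function that happens to be called log."""
    return "LOG: " + message

natural_log = math.log(math.e)   # the module's function, reached via the prefix
note = log("still our own function")

from math import log             # this rebinds the bare name log ...
clobbered = log(math.e)          # ... so our own log() is no longer reachable

print(natural_log, note, clobbered)
```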

The stdin object of sys is a file object with many methods, of which readline() is just one. Each call reads an entire line of typing. The method readlines() grabs all lines in the form of a list of lines:

linelist = sys.stdin.readlines()
nlines = len(linelist)
print (linelist[0])
print (linelist[1])
...
print (linelist[nlines - 1])
print (linelist[-1])

The list is a compound Python data type. The list can hold a bunch of individual elements consisting of constants or variables, including other lists. The number of elements is returned by the function len(), as shown above. The elements of a list can be accessed by the square bracket notation illustrated above. Indexing starts at zero and the last element of the list has an index equal to the number of elements minus one. The last element can also be indexed by -1, the second to the last by -2, etc. This indexing convention holds for compound objects throughout Python.
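
The indexing rules can be tried without typing any input. In this sketch a hand-written list (our own stand-in for the result of sys.stdin.readlines()) is indexed from both ends, and a second list shows that elements may be of mixed types, including other lists:

```python
# A hypothetical list of lines, standing in for sys.stdin.readlines().
linelist = ["first line\n", "second line\n", "third line\n"]
nlines = len(linelist)

first = linelist[0]
last = linelist[nlines - 1]
also_last = linelist[-1]         # same element, indexed from the end
next_to_last = linelist[-2]

# Lists may mix types and may contain other lists.
mixed = [1, 2.5, "three", [4, 5]]
inner = mixed[3][0]              # first element of the nested list

print(nlines, inner)
```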

2.2.2  Making scripts executable

If you get tired of typing

swallow$ python trivial2.py

you can add the line

#!/usr/bin/python

to the beginning of the script and then make the script executable by typing

swallow$ chmod +x trivial2.py

at the system prompt. Then the script can be run by simply typing

swallow$ trivial2.py

Change /usr/bin/python to whatever the full path is for python on your system.

An advantage of an executable Python script is that it presents an alternate way of entering input data into the script. For example, we can rewrite our trivial script as follows:

#!/usr/bin/python
# trivial3.py -- This is a trivial script.
import sys
print (sys.argv)   # Included so we can see what is going on.
a = float(sys.argv[1])
result = a**3 + 2.*a**2 - 5.
print ('for a =', a, '   a**3 +2a**2 - 5 =', result)

The variable sys.argv from the sys package is a list of the words typed on the command line, starting with the command itself. Several examples follow:

swallow$ trivial3.py
['./trivial3.py']
Traceback (most recent call last):
  File "./trivial3.py", line 5, in <module>
    a = float(sys.argv[1])
IndexError: list index out of range

In this case just the command is typed, and the returned list just contains the command as its first and only element. Indexing in Python starts from zero, so there is no sys.argv[1], a fact noted and commented upon by the Python interpreter. Let us try instead

swallow$ trivial3.py 3
['./trivial3.py', '3']
for a = 3.0    a**3 +2a**2 - 5 = 40.0

The script converts the string '3' into the corresponding floating point number and proceeds as before. If we type

swallow$ trivial3.py "frogmorton"
['./trivial3.py', 'frogmorton']
Traceback (most recent call last):
  File "./trivial3.py", line 5, in <module>
    a = float(sys.argv[1])
ValueError: invalid literal for float(): frogmorton

the float conversion becomes problematic and Python again reports its distress. However, if we type

swallow$ trivial3.py 3 "frogmorton"
['./trivial3.py', '3', 'frogmorton']
for a = 3.0    a**3 +2a**2 - 5 = 40.0

Python is happy even though there is an extra command line argument which remains unused. Note that the print statement in this script is not really needed; it is there only to make it clear how the command line arguments are transmitted to the script.
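
The two tracebacks shown above can be avoided by checking the argument list before converting it. What follows is only a sketch: parse_a is a hypothetical helper of our own, not part of the trivial scripts, and it relies on Python's try/except construct, which has not been introduced at this point in the text. The lists passed to it imitate the sys.argv values from the examples.

```python
def parse_a(argv):
    """Return the first command line argument as a float, or None if unusable."""
    if len(argv) < 2:            # only the command itself was typed
        return None
    try:
        return float(argv[1])
    except ValueError:           # the argument is not a number
        return None

# Hypothetical argument lists, imitating sys.argv in the examples above.
print(parse_a(['./trivial3.py']))
print(parse_a(['./trivial3.py', '3']))
print(parse_a(['./trivial3.py', 'frogmorton']))
```

In a real script one would call parse_a(sys.argv) and print a usage message when it returns None.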

2.2.3  Operating system functions

Various connections with the underlying operating system are available through the os module. The most interesting of these functions is the ability to run arbitrary commands in a sub-shell. For example:

>>> import os
>>> os.system("date")
Sat Oct  3 17:22:19 MDT 2009
0
>>> 

The output of the command is printed and the exit status of the system call is returned.
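
Because os.system() returns only the exit status, the command's output is not available to the program. If the output itself is needed, the standard subprocess module is one option. The sketch below is our own illustration: to stay independent of the operating system it runs a second Python interpreter (located through sys.executable) instead of the date command.

```python
import subprocess
import sys

# Run a child Python process and capture what it prints.
output = subprocess.check_output([sys.executable, "-c", "print(2 + 3)"])
text = output.decode().strip()   # bytes -> text, minus the trailing newline

print(text)
```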

2.3  Python Functions

In this section we begin to consider Python statements which control branching, looping and modularization of code. An important point to remember here is that Python uses indentation of code to define groups of related statements, or clauses. Less indentation ends the clause. Furthermore, the statements which introduce clauses always end with a colon. Examples will clarify this as we go along. Whatever indentation you choose, apply it consistently: the Python style guide (PEP 8) recommends four spaces per level, and Python 3 rejects code that mixes tabs and spaces inconsistently.

2.3.1  Functions in modules

A function is a clause which can be called by the user or by other functions to execute a particular task. Functions can be defined interactively, but they are most useful when placed in special files called modules, which can be imported by the interpreter. Let’s take as an example a function which solves the quadratic equation

$ax^2 + bx + c = 0$.     (2.1)

We all know that this equation has two solutions

$x = \frac{-b \pm (b^2 - 4ac)^{1/2}}{2a}$.     (2.2)

Consider the following script, named quadratic1.py:

# quadratic1.py -- Quadratic equation solver.
def qsolns(a,b,c):
    temp = b**2 - 4.*a*c
    x1 = (-b + temp**0.5)/(2.*a)
    x2 = (-b - temp**0.5)/(2.*a)
    return [x1, x2]

This script contains the definition of a single function qsolns. The three variables a, b, and c are transmitted to the body of the function, which computes the two solutions x1 and x2. The two solutions are then combined into a list [x1, x2] and returned to the caller. A script used in this way is called a module; we import it from within Python as follows:

>>> import quadratic1
>>> quadratic1.qsolns(1.,4.,0.)
[0.0, -4.0]
>>> 

Direct substitution verifies 0.0 and -4.0 as solutions to the original equation.
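
The substitution can also be left to Python. This sketch repeats the function from quadratic1.py so that it runs on its own, then evaluates the left side of the quadratic equation at each returned root; the names roots and residuals are ours.

```python
def qsolns(a, b, c):
    temp = b**2 - 4.*a*c
    x1 = (-b + temp**0.5)/(2.*a)
    x2 = (-b - temp**0.5)/(2.*a)
    return [x1, x2]

a = 1.
b = 4.
c = 0.
roots = qsolns(a, b, c)

# Substitute each root back into a*x**2 + b*x + c; both results should vanish.
residuals = [a*roots[0]**2 + b*roots[0] + c,
             a*roots[1]**2 + b*roots[1] + c]

print(roots, residuals)
```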

2.3.2  Keyword parameters

There is another way to call the function qsolns:

>>> quadratic1.qsolns(a = 1., b = 4., c = 0.)
[0.0, -4.0]
>>> 

This makes the relationship between the function arguments and the values assigned to them more obvious. However, it is also more verbose. The first instance uses positional parameters; the second uses keyword parameters. The two methods can be mixed as long as keyword parameters follow positional parameters:

>>> quadratic1.qsolns(1., b = 4., c = 0.)
[0.0, -4.0]
>>> quadratic1.qsolns(a = 1., b = 4., 0.)
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
>>> quadratic1.qsolns(c = 0., a = 1., b = 4.)
[0.0, -4.0]
>>> 

As the last case shows, keyword parameters can appear in any order as long as all positional parameters appear first.

2.3.3  Complex data type

So far, so good. However, suppose we try the following:

>>> quadratic1.qsolns(1,4,6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "quadratic1.py", line 4, in qsolns
    x1 = (-b + temp**0.5)/(2.*a)
ValueError: negative number cannot be raised to a fractional power
>>> 

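Clearly the problem is that for this choice of (a, b, c) the solutions to the quadratic equation are complex rather than real. (The traceback above comes from Python 2; Python 3 instead returns a complex number when a negative float is raised to a fractional power.)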

One solution to this problem is to make the variable temp complex:

# quadratic3.py -- Quadratic equation solver.
def qsolns(a,b,c):
    temp = complex(b**2 - 4.*a*c, 0.)
    x1 = (-b + temp**0.5)/(2.*a)
    x2 = (-b - temp**0.5)/(2.*a)
    return [x1, x2]

Running this yields

>>> import quadratic3
>>> solns = quadratic3.qsolns(1.,4.,6.)
>>> print (solns)
[(-2+1.4142135623730951j), (-2-1.4142135623730951j)]
>>> 

A complex number with a negative real part can be raised to a fractional power, the result of which is also complex. The occurrence of one complex variable in an expression makes all calculations complex, including the answers. Note the use of the complex(real,imag) function to construct a complex number or variable. If c is complex, the real and imaginary parts are given by c.real and c.imag. The complex conjugate of c is given by c.conjugate(). Finally, the absolute value is computed by abs(c). For complex trig, log, and exponential functions, load the cmath module, which provides complex versions of most of the functions of the math module.
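
The attributes and functions just listed can be tried on a concrete number. The values in this sketch are our own choice (a 3-4-5 triangle keeps the arithmetic easy to check by hand):

```python
import cmath

c = complex(3., -4.)
parts = (c.real, c.imag)          # real and imaginary parts
conj = c.conjugate()              # complex conjugate
size = abs(c)                     # absolute value (modulus)

root = cmath.sqrt(-4.)            # complex square root of a negative number
euler = cmath.exp(1j*cmath.pi)    # very nearly -1, up to rounding error

print(parts, conj, size, root, euler)
```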

2.3.4  Scope of variables

Several comments about functions are worth making. Variables defined inside a function are generally invisible outside the function. However, variables outside the function (but in the same script or module) can be seen inside the function, but not assigned to, with one exception; if a variable is defined as global, it can be read and assigned to in both locations. Globals are generally discouraged, so we do not discuss them further here.

The values of the variables passed from the calling environment are available inside the function via the list in parentheses in the function definition line. Thus, the function qsolns has access to the values of a, b, and c.

The results of a function calculation can be accessed by the calling environment via the return statement in the function. In the case of qsolns, the list containing the roots of the quadratic equation is returned to the calling environment, as illustrated above. This highly channeled flow of information to and from a function may sound restrictive, but it is desirable from the point of view of keeping track of what is going on in a program.

Scripts can call functions defined within the script, and functions themselves can call other functions. However, for this to work, the definition of the function must occur before the call to the function – otherwise, Python doesn’t know that the function exists.
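
The visibility rules can be seen in a few lines. In this sketch (all names are hypothetical) the function scaled reads the module-level variable scale, while the assignment inside try_to_rescale creates a separate local variable and leaves the outer one untouched:

```python
scale = 2.0                 # defined at module level

def scaled(x):
    result = scale * x      # scale is read from outside the function
    return result           # result itself is invisible outside

def try_to_rescale():
    scale = 10.0            # creates a new local variable named scale
    return scale

print(scaled(3.), try_to_rescale(), scale)
```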

2.4  Branching

An alternate approach to handling the quadratic equation problem is to test beforehand for various possibilities that might cause difficulties with the calculation using the if statement. Let us rewrite our function as follows:

# quadratic2.py -- Quadratic equation solver.
def qsolns(a,b,c):
    temp = b**2 - 4.*a*c
    if a == 0:
        # a = 0 is a special case
        if b != 0:
            return [-float(c) / b]
        return []
    elif temp < 0:
        # solutions are complex
        return []
    elif temp == 0:
        # only one solution
        return [(-b + temp**0.5)/(2.*a)]
    else:
        x1 = (-b + temp**0.5)/(2.*a)
        x2 = (-b - temp**0.5)/(2.*a)
        return [x1, x2]

If a == 0 is true, i. e., if the variable a is equal to zero, then the clause of the if statement is executed. Alternatively, if this is false, but the elif condition is true, the clause of this statement is executed. If neither is true, the clause of the else statement is executed. This function illustrates the full range of possibilities of the if statement. For each if there are zero or more elif clauses. The else clause considers all other possibilities. Try this function to see what happens!
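
One way to try it is sketched here. The function from quadratic2.py is repeated so that the example runs on its own, and it is called once for each branch; the argument values are our own choices.

```python
def qsolns(a, b, c):
    temp = b**2 - 4.*a*c
    if a == 0:
        # a = 0 is a special case
        if b != 0:
            return [-float(c) / b]
        return []
    elif temp < 0:
        # solutions are complex
        return []
    elif temp == 0:
        # only one solution
        return [(-b + temp**0.5)/(2.*a)]
    else:
        x1 = (-b + temp**0.5)/(2.*a)
        x2 = (-b - temp**0.5)/(2.*a)
        return [x1, x2]

linear = qsolns(0, 2, 4)       # a == 0, b != 0: a single root of 2x + 4 = 0
degenerate = qsolns(0, 0, 4)   # a == 0 and b == 0: nothing to solve
no_real = qsolns(1, 4, 6)      # negative discriminant: empty list
double = qsolns(1, 2, 1)       # zero discriminant: one repeated root
two_roots = qsolns(1, 4, 0)    # positive discriminant: two roots

print(linear, degenerate, no_real, double, two_roots)
```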

Comparison operators between numbers are >, >=, ==, !=, <=, and <, the meanings of which are obvious except possibly == and !=; the first yields true if the two numbers are equal and false otherwise, whereas the second yields the reverse. Don’t confuse == with =, which assigns the right value to the left variable and is an error in this context! Note that if a single number or numerical expression is used in place of the boolean expression, it is treated as false if it evaluates to zero and true if it is nonzero. Logical combinations of boolean expressions are produced with the and and or operators. Thus, (4 > 3) or (3 < 2) is true, while (4 > 3) and (3 < 2) is false. The parentheses are not needed in this context but are helpful to avoid ambiguity.
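
These rules are easy to check directly. In the sketch below, describe is a hypothetical function of our own that uses a bare number as the condition of an if statement:

```python
either = (4 > 3) or (3 < 2)      # one true operand is enough for or
both = (4 > 3) and (3 < 2)       # and needs both operands to be true

def describe(n):
    if n:                        # a number used directly as the condition
        return "nonzero"
    else:
        return "zero"

print(either, both, describe(0), describe(0.5), describe(-3))
```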

2.5  Looping

2.5.1  The while loop

Suppose we want to find the roots of some function f(x). Newton’s method is perhaps the simplest (and least reliable!) means of doing this. If x0 is a first guess at the root, then a Taylor expansion of f(x) about this point can be written to first order as

$f(x) \approx f(x_0) + \frac{df}{dx}(x - x_0)$.     (2.3)

Setting this to zero to find the root of the linear approximation to f(x) results in the equation

$x = x_0 - \frac{f(x_0)}{(df/dx)}$.     (2.4)

However, this is only an approximation, so this equation has to be recalculated a number of times, replacing x0 by the new x each time until convergence occurs. Here is a simple Python function which does this:

# newton1.py -- A newton's method rootfinder.

def newton(xstart, fn):
    x = xstart
    eps = 0.0001
    xold = x + 10*eps
    delta = 0.1
    loops = 0
    maxloops = 100

    while (abs(x - xold) > eps) and (loops < maxloops):
        loops = loops + 1
        fval = fn(x)
        dfdx = (fn(x + delta) - fn(x))/delta
        xold = x
        x = xold - fval/dfdx
        print ("loop =", loops, "-- x =", x, "-- f(x) =", fn(x))

This function uses the while statement – the clause of the statement is executed while the boolean expression following while is true. In this expression x is the current value of the variable x whereas xold is the previous value, x0. As long as the absolute value of the difference between these two is greater than some small number eps, this part of the expression is true. Convergence is reached when x stops changing, which means that abs(x - xold) becomes really small. The other part simply limits the number of iterations of the loop to 100 in order to guard against a runaway loop – which is all too common with Newton’s method! (Recall that and indicates a logical and.)

The function definition has two arguments, the initial value or first guess for x and the name of the function for which we wish to find the roots. The latter must be defined separately. Notice that the initial value of xold is set far enough away from x so that the while statement executes at least once.

Let’s try out our new root finder! We define a cubic polynomial function f(x) directly in the interpreter (notice the indentation needed after the def statement and the blank line at the end of the function) and then call newton with a variety of starting values. (Also notice that the name of the function in its definition need not be the same as the name of the function within newton. This is a general property of function arguments.)

>>> from newton1 import *
>>> def f(x):
...   return x**3 - 2.*x
... 
>>> newton(0.5,f)
loop = 1 -- x = -0.302752293578 -- f(x) = 0.577754629433
loop = 2 -- x = 0.0171829183112 -- f(x) = -0.0343607633197
loop = 3 -- x = -0.000136369523294 -- f(x) = 0.000272739044053
loop = 4 -- x = 6.82459009549e-07 -- f(x) = -1.3649180191e-06
loop = 5 -- x = -3.42951282581e-09 -- f(x) = 6.85902565163e-09
>>> newton(4,f)
loop = 1 -- x = 2.81381063334 -- f(x) = 16.6508096258
loop = 2 -- x = 2.07726860993 -- f(x) = 4.80897005523
loop = 3 -- x = 1.66192579318 -- f(x) = 1.26638303704
loop = 4 -- x = 1.47554415592 -- f(x) = 0.261511511278
loop = 5 -- x = 1.42307768519 -- f(x) = 0.0357905433661
loop = 6 -- x = 1.41514604066 -- f(x) = 0.0037336030139
loop = 7 -- x = 1.4143056045 -- f(x) = 0.000368204435166
loop = 8 -- x = 1.41422258344 -- f(x) = 3.60845992158e-05
>>> newton(-2,f)
loop = 1 -- x = -1.57492029756 -- f(x) = -0.756550674276
loop = 2 -- x = -1.4229611678 -- f(x) = -0.0353157404884
loop = 3 -- x = -1.4133056401 -- f(x) = 0.00362819253248
loop = 4 -- x = -1.41431958126 -- f(x) = -0.000424123216164
loop = 5 -- x = -1.41420132921 -- f(x) = 4.89320029273e-05
loop = 6 -- x = -1.41421497589 -- f(x) = -5.65406673259e-06
>>> 

By a suitable choice of starting values, we have found all three roots of the polynomial, as indicated by the near-zero values of the function in each case at the end of the iterations.
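
As written, newton only prints its progress, so the calling code cannot use the root it finds. A possible variant is sketched here under a hypothetical name, newton_root; it performs the same iteration as newton1.py, with the print call dropped and a return statement added.

```python
def newton_root(xstart, fn):
    """Same iteration as newton() in newton1.py, but returns the final x."""
    x = xstart
    eps = 0.0001
    xold = x + 10*eps
    delta = 0.1
    loops = 0
    maxloops = 100

    while (abs(x - xold) > eps) and (loops < maxloops):
        loops = loops + 1
        fval = fn(x)
        dfdx = (fn(x + delta) - fn(x))/delta   # crude finite-difference slope
        xold = x
        x = xold - fval/dfdx
    return x

def f(x):
    return x**3 - 2.*x

root = newton_root(4, f)
print(root, f(root))
```

Note that the returned value is only as good as the convergence test; if the loop stops because maxloops is reached, the result is not a root.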

2.5.2  The for loop

Here is an alternate Newton’s method solver which uses Python’s for loop rather than a while loop:

# newton2.py -- A newton's method rootfinder.

def newton(xstart, fn):
    x = xstart
    eps = 0.0001
    xold = x + 10*eps
    delta = 0.1
    maxloops = 10

    for loops in range(maxloops):
        fval = fn(x)
        dfdx = (fn(x + delta) - fn(x))/delta
        xold = x
        x = xold - fval/dfdx
        print ("loop =", loops, "-- x =", x, "-- f(x) =", fn(x))
        if (abs(x - xold) < eps) and (abs(fval) < eps):
            break
    else:
        print ("no solution!")

This solver illustrates several new Python features. First, the for statement executes the associated clause for each value that the loop variable (loops in this case) takes on. The variable after in is formally known as a sequence. A sequence is nothing more than an ordered collection of things; loops takes on the value of each element of this sequence in turn as the looping proceeds.

The easiest way to write a sequence is via a list – an example would be [0, 1, 2, 3]. The elements of the list don’t need to be integers; they can be any legal Python variables or constants, including other lists. They don’t even have to all be of the same type. Alternate sequences are strings: 'abcdefg'; or a tuple: (0, 1, 2, 3). A tuple is just like a list, but it cannot be changed in the way that a list can – i. e., it is immutable – think of it as a list constant. A string is just an immutable sequence of characters.

The sequence in the above example is the function range, which generates a list of integers. For example,

range(3) -> [0, 1, 2],

while

range(1,3) -> [1, 2],

and

range(1,6,2) -> [1, 3, 5].

Thus, our for statement sets loops successively to elements of the list

range(maxloops) -> [0, 1, 2, ..., maxloops - 1]

In Python 3, unlike in Python 2, the range function does not return a list as indicated above. Instead, it behaves much more like the xrange function in Python 2, which generates the values on the fly as the loop needs them. The xrange function itself doesn’t exist in Python 3.
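
To see the values that a range produces, it can be wrapped in list(), which works in both Python 2 and Python 3. A brief sketch, with variable names of our own:

```python
odd_values = list(range(1, 6, 2))   # start 1, stop before 6, stride 2

# A for loop consumes the range directly; no list is needed.
total = 0
for i in range(4):
    total = total + i

print(odd_values, total)
```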

The test for convergence of the iteration in the above example is more sophisticated than in the previous case; not only does x - xold have to be sufficiently small, the function evaluated at x must be close to zero as well. When this occurs, the loop terminates early courtesy of the break statement. The else clause is only executed if the for loop terminates without executing a break statement. Here we use it to alert the user that convergence was not attained. The break and else statements can be used analogously with the while loop also. The else is optional in both cases.
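
The interplay of break and else is perhaps easiest to see in a search. The following sketch is our own example, unrelated to Newton's method: smallest_divisor is a hypothetical function that looks for a divisor of n and uses the else clause to handle the case where none is found.

```python
def smallest_divisor(n):
    """Return the smallest divisor of n between 2 and n - 1, or None."""
    for d in range(2, n):
        if n % d == 0:
            found = d
            break              # leave the loop early; the else clause is skipped
    else:
        found = None           # reached only if the loop ran to completion
    return found

print(smallest_divisor(15), smallest_divisor(13))
```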

So, how does our new Newton’s method solver work? Below it is applied to two functions, $f(x) = x^2 - 4$ and $f(x) = x^2 + 4$. The former has real roots, which newton can find. However the roots of the latter are complex, and hence are not found. (Newton’s method can be made to work for complex analytic functions as well. Try invoking newton with an initial x = (1 + 3j)!)

>>> from newton2 import *
>>> def func(x):
...   return x**2 - 4.
... 
>>> newton(3,func)
loop = 0 -- x = 2.18032786885 -- f(x) = 0.753829615695
loop = 1 -- x = 2.01133262241 -- f(x) = 0.045458917965
loop = 2 -- x = 2.0003060376 -- f(x) = 0.00122424405357
loop = 3 -- x = 2.00000748606 -- f(x) = 2.99442871095e-05
loop = 4 -- x = 2.0000001826 -- f(x) = 7.30399140281e-07
>>> 
>>> def func2(x):
...   return x**2 + 4.
... 
>>> newton(3,func2)
loop = 0 -- x = 0.868852459016 -- f(x) = 4.75490459554
loop = 1 -- x = -1.7185621737 -- f(x) = 6.95345594488
loop = 2 -- x = 0.365104846463 -- f(x) = 4.13330154891
loop = 3 -- x = -4.61351872796 -- f(x) = 25.2845550532
loop = 4 -- x = -1.84322714371 -- f(x) = 7.39748630332
loop = 5 -- x = 0.21939117245 -- f(x) = 4.04813248655
loop = 6 -- x = -7.29409275083 -- f(x) = 57.2037890577
loop = 7 -- x = -3.34578679829 -- f(x) = 15.1942892996
loop = 8 -- x = -1.0406787574 -- f(x) = 5.0830122761
loop = 9 -- x = 1.52474027382 -- f(x) = 6.3248329026
no solution!
>>> 

2.5.3  Leaving loops early

Two statements change execution flow in Python loops, the break and continue statements:

>>> for i in range(5):
...   if i == 3:
...     break
...   print (i)
... 
0
1
2
>>> for i in range(5):
...   if i == 3:
...     continue
...   print (i)
... 
0
1
2
4
>>> 

Note that the break statement throws the program out of the innermost loop when it is encountered, as illustrated in the previous sub-section. The continue statement interrupts the current iteration of the loop and goes on to the next. The pass statement does nothing at all! The purpose of this last statement is solely as a placeholder.
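
Since pass is mentioned but not shown above, here is a small sketch of its two typical uses: as the body of a function that has not been written yet, and as an explicit "do nothing" branch. The names are hypothetical.

```python
def not_written_yet():
    pass                     # a clause must contain at least one statement

odd = []
for i in range(6):
    if i % 2 == 0:
        pass                 # nothing to do for even numbers
    else:
        odd.append(i)

print(odd, not_written_yet())
```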

2.6  Fun with sequences and dictionaries

Many of Python’s normal applications involve operations on text. Such text operations are typically not used much in numerical programming, so we will only touch lightly on this subject. The online documentation for Python is particularly complete and extensive for use of strings, lists, tuples, and dictionaries in text operations, so further information is readily available. Nevertheless, sometimes these language elements come in handy in scientific work, so we will spend some time on them.

First some definitions:

2.6.1  Indexing and slicing of sequences

Indexing, which as in the C language starts at zero, allows one to extract individual elements or subsets of sequences. Demonstrating this on strings is easier than explaining it:

>>> x = 'abcdefg'
>>> x[0]
'a'
>>> x[1]
'b'
>>> x[-1]
'g'
>>> x[-2]
'f'
>>> x[0:2]
'ab'
>>> x[:]
'abcdefg'
>>> x[0:]
'abcdefg'
>>> x[:-1]
'abcdef'
>>> x[0:6:2]
'ace'
>>> 

The colon-based notation returns a slice of the initial sequence, or a sequence of elements starting with that indexed by the number before the colon and ending with that indexed by the number after the colon minus one. The negative index notation is a clever way of indexing back from the end of the sequence. The optional third index is the stride – for instance, the 2 tells Python to jump 2 elements per step in marching through the sequence rather than the default 1.
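
A few further combinations of start, stop, and stride may help fix the rules. The slices in this sketch are our own additions to the examples above and apply equally to lists:

```python
x = 'abcdefg'

middle = x[2:5]          # indices 2, 3, 4
tail = x[-3:]            # the last three elements
every_other = x[1:6:2]   # indices 1, 3, 5
backwards = x[::-1]      # a negative stride walks from the end to the start

y = [0, 1, 2, 3, 4, 5]
evens = y[::2]           # slicing works the same way on lists

print(middle, tail, every_other, backwards, evens)
```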

One slight difference between indexing of lists and strings is that a simple string index (i. e., not a slice) returns a string consisting of a single character, whereas a simple index of a list returns the corresponding element of the list, not a list consisting of a single element. However, a slice referencing a single list element returns a list consisting of that single element:

>>> y = [0, 1, 2, 3]
>>> y[0:2]
[0, 1]
>>> y[0]
0
>>> y[0:1]
[0]
>>> 

Tuples work just like lists except that the notation for a tuple consisting of just one element is (for example) (5,), not (5). The latter is just an integer enclosed in parentheses.
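
Both the one-element notation and the immutability of tuples can be checked directly. This sketch uses try/except, which is not covered in this chapter, to catch the error raised by the attempted assignment; the variable names are ours.

```python
single = (5,)            # a tuple with one element
not_a_tuple = (5)        # just the integer 5 in parentheses

t = (0, 1, 2, 3)
piece = t[1:3]           # slicing a tuple gives a tuple

try:
    t[0] = 99            # tuples cannot be modified in place
    changed = True
except TypeError:
    changed = False

print(single, not_a_tuple, piece, changed)
```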

2.6.2  String methods

Python is an object oriented language, so data elements contain not only data, but also methods which perform common kinds of data manipulation (see the next section). For instance:

>>> x = 'I found a dog.'
>>> x.find('dog')
10
>>> x.find('frog')
-1
>>> x.split(' ')
['I', 'found', 'a', 'dog.']
>>> x.partition('found')
('I ', 'found', ' a dog.')
>>> x.replace('dog','cat')
'I found a cat.'
>>> x = 'I+found+a+dog'
>>> x.split('+')
['I', 'found', 'a', 'dog']
>>> x = 'dog     found'
>>> x.split()
['dog', 'found']
>>> x.split(' ')
['dog', '', '', '', '', 'found']
>>> 

The split method called with no arguments splits strings into elements separated by any amount of white space, e. g., spaces, tabs, newlines, etc.

There are many more methods for strings; these are just some of the most useful.
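
In scientific work the most common use of these methods is probably turning a line of text into numbers. A minimal sketch, assuming a made-up input line with irregular spacing:

```python
# A hypothetical line of data, as it might be read from a file.
line = "  1.5  2.5\t4.0\n"

fields = line.split()            # split on any amount of white space
values = []
for field in fields:
    values.append(float(field))  # convert each piece to a float

rejoined = ", ".join(fields)     # join() is the inverse of split()

print(fields, values, rejoined)
```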

2.6.3  List methods

Python lists have some methods specific to them. Some of the most useful are illustrated below.

>>> x = [1, 2, 3, 4, 5]
>>> x.append(6)
>>> x
[1, 2, 3, 4, 5, 6]
>>> x.insert(2,2.5)
>>> x
[1, 2, 2.5, 3, 4, 5, 6]
>>> x.remove(2.5)
>>> x
[1, 2, 3, 4, 5, 6]
>>> y = x.pop()
>>> y
6
>>> x
[1, 2, 3, 4, 5]
>>> x.reverse()
>>> x
[5, 4, 3, 2, 1]
>>> x.sort()
>>> x
[1, 2, 3, 4, 5]
>>> 

2.6.4  Sequence math

The addition + and multiplication * operators have straightforward meanings with sequences:

>>> 'I found ' + 'a dog.'
'I found a dog.'
>>> 'dog '*3
'dog dog dog '
>>> [1, 2, 3] + [4, 5]
[1, 2, 3, 4, 5]
>>> 2*[1, 2, 3]
[1, 2, 3, 1, 2, 3]
>>> 

The % operator is more complex, but quite useful for incorporating numbers and other strings into strings in a specified format:

>>> a = 'I found %d dogs.' % (3)
>>> a
'I found 3 dogs.'
>>> 'The %s weighed %f grams.' % ('mouse', 43.5)
'The mouse weighed 43.500000 grams.'
>>> 'The %s weighed %e grams.' % ('mouse', 43.5)
'The mouse weighed 4.350000e+01 grams.'
>>> 'The %s weighed %.1f grams.' % ('mouse', 43.5)
'The mouse weighed 43.5 grams.'
>>> 'The speed of light is %.2e m/s.' % (3e8)
'The speed of light is 3.00e+08 m/s.'
>>> 

The string contains C-like format statements, each preceded by a percent sign %. A percent sign % also separates this string from a tuple, the elements of which can be strings or numbers, but must match the corresponding types in the initial string. The first example above reminds us that the results of a string operation can be assigned to a variable, which can then be printed (implicitly or explicitly).
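
Format statements may also carry a field width, and strings additionally offer a format method that uses braces in place of percent signs. The sketch below is our own supplement to the examples above; the format method is available in Python 2.6 and later, including Python 3.

```python
padded = '%5d' % 42                  # right-justified in a field 5 wide

# The same sentences as above, built with the format method of strings.
mouse = 'The {0} weighed {1:.1f} grams.'.format('mouse', 43.5)
light = 'The speed of light is {0:.2e} m/s.'.format(3e8)

print(padded)
print(mouse)
print(light)
```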

Finally, it is easy to find the length of sequences with the function len():

>>> z = 'abcde'
>>> len(z)
5
>>> x = [0, 2, 4, 6]
>>> len(x)
4
>>> 

Likewise, it is easy to determine whether a particular element exists in a sequence using the in operator:

>>> x = [1,2,3,4,5,6]
>>> 1 in x
True
>>> 1.5 in x
False
>>> 'a' in 'abc'
True
>>> 'x' in 'abc'
False
>>> 

Logical expressions involving the in operator can be used in if and while statements to control the flow of the program.
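
Two small illustrations of this, with data of our own choosing: an if statement that counts the vowels in a word, and a while loop that keeps removing a value for as long as it is present in a list.

```python
vowels = 'aeiou'
count = 0
for ch in 'frogmorton':
    if ch in vowels:         # true for each 'o' in the word
        count = count + 1

x = [1, 2, 2, 3, 2]
while 2 in x:                # repeat until no 2 is left
    x.remove(2)              # remove() deletes the first matching element

print(count, x)
```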

2.6.5  Dictionaries

The dictionary is a Python type consisting of a collection of key:value pairs. Its main use is to construct a kind of data base. For instance, the dictionary called phones below contains people’s names (the keys) associated with their phone numbers (the values). Dictionaries are displayed with surrounding curly braces and the key:value pairs are separated by commas.

>>> phones = {'george': '835-4427', 'sandy': '838-1192'}
>>> phones['george']
'835-4427'
>>> phones['harry'] = '892-9553'
>>> phones['harry']
'892-9553'
>>> phones
{'george': '835-4427', 'sandy': '838-1192', 'harry': '892-9553'}
>>> phones.keys()
['george', 'sandy', 'harry']
>>> del phones['sandy']
>>> phones
{'george': '835-4427', 'harry': '892-9553'}
>>> 'harry' in phones
True
>>> 

As shown, indexing a dictionary by a key returns the value as long as the specified key exists in the dictionary. New entries in the dictionary may be made by assigning the value of the new key:value pair to the dictionary indexed by the new key. The del command removes the indexed key, as shown above. The keys method for dictionaries illustrated above returns the keys in the dictionary (as a list in Python 2; in Python 3 it returns a view object, which list() converts to a list). Finally, the in operator can be used to see if a dictionary has a particular key.

Keys in dictionaries must be immutable, and are most commonly strings. Keys cannot appear more than once. The value can be any Python data type, and different keys can have values of different types associated with them.

Dictionaries are not sequences, as the key:value pairs are not guaranteed to occur in any particular order. Iterating over a dictionary directly in a for loop yields only its keys. To loop over the entries of a dictionary, obtaining each key and value successively, use the dictionary’s iteritems method, which returns the entries one at a time as (key, value) pairs. Using the phones dictionary defined above:

>>> for k, v in phones.iteritems():
...   print k, v
... 
george 835-4427
harry 892-9553
>>> 
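
Because the order of the entries is not guaranteed, a loop that needs a predictable order can sort the keys first. The sketch below assumes the phones dictionary as it stands above and uses the built-in sorted function; the list name lines is invented for illustration, and the code runs under both Python 2 and Python 3.

```python
phones = {'george': '835-4427', 'harry': '892-9553'}

# sorted() returns the keys as a new list in alphabetical order
lines = []
for name in sorted(phones.keys()):
    lines.append('%s %s' % (name, phones[name]))

for line in lines:
    print(line)
```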

2.6.6  Assignments and aliases

Assignments involving lists and dictionaries can have some possibly unexpected results. Consider the following:

>>> a = ["a", "b", "c"]
>>> b = a
>>> c = a[:]
>>> a
['a', 'b', 'c']
>>> b
['a', 'b', 'c']
>>> c
['a', 'b', 'c']
>>> a.append("d")
>>> a
['a', 'b', 'c', 'd']
>>> b
['a', 'b', 'c', 'd']
>>> c
['a', 'b', 'c']
>>> 

The assignment b = a just makes the variable b an alias of the variable a, so that when a is changed by the append method, b changes correspondingly. However, the assignment c = a[:] constructs a new list c by extracting the contents of list a element by element. Thus, when a changes, c does not. A simple assignment of the form “b = a” only creates an alias; an assignment whose right-hand side builds a new object, such as the slice in c = a[:], creates a new object instead of an alias to the old one, even if the new object is identical in content to the old object.
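
The same behavior occurs with dictionaries. In the sketch below (an illustration, not part of the session above), the dictionary's copy method plays the role that the slice a[:] plays for lists, and the is operator, which tests whether two names refer to the same object, confirms which variable is an alias. The code runs under both Python 2 and Python 3.

```python
a = {'george': 3000}
b = a           # alias: b and a are the same object
c = a.copy()    # new dictionary with the same contents

a['frank'] = 2500

print(b is a)              # True: b is an alias of a
print(c is a)              # False: c is a separate object
print(sorted(b.keys()))    # ['frank', 'george']
print(sorted(c.keys()))    # ['george']
```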

2.7  Object oriented programming in Python

You may never write an object oriented program in Python as a scientific programmer. Nevertheless, it is important to have a basic understanding of how object oriented programming works in this language, since the construct is used so frequently.

2.7.1  Defining classes

The basic notion in object oriented programming is the class. Think of a class as a blueprint of a data structure and methods for operating on this structure. For instance, consider the following module named workstuff.py which contains the class definition worker_pay:

class worker_pay:
    version = "1.2.4"
    
    def __init__(self):
        self.data = {}

    def add_worker(self,name,pay):
        self.data[name] = pay

    def list_worker(self,name):
        return self.data[name]

    def list_all(self):
        return self.data.keys()

The defs inside the class definition define the class methods, which are really just functions.

2.7.2  Creating and working with class instances

Consider now the sequence of statements shown below:

>>> import workstuff
>>> x = workstuff.worker_pay()
>>> x.add_worker("george", 3000)
>>> x.add_worker("frank", 2500)
>>> x.list_all()
['frank', 'george']
>>> x.list_worker("george")
3000
>>> a = x.list_worker("george")
>>> a
3000
>>> x.data
{'frank': 2500, 'george': 3000}
>>> 

The first statement simply imports the module containing the class definition. The next statement x = workstuff.worker_pay() creates an instance x of the class worker_pay. This instance contains information which is manipulated by the four methods defined in the class. The first method, __init__(self), is a specially named method which is invoked automatically when an instance of the class is created. In this case it assigns an empty dictionary to the variable data contained in the instance, though in fact it could do any number of things. The other methods respectively add data (a worker's name and salary) to the dictionary, return a particular worker’s salary, and return the names of all workers. The returned values can be assigned to other variables.

Notice that the first argument in the definitions of the methods, self, is omitted in the method invocations. Self in the class definition lets Python know that the method for the object x should be applied to the class instance x itself. There is nothing magic about the name “self”; it is just convention to use this terminology and it could be any other name defined as the first argument of a method and used consistently in the method body to indicate the invoked object.
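
The following sketch makes both points concrete. It repeats a cut-down version of worker_pay in which the add_worker method deliberately names its first argument this rather than self, and it calls the method once in the usual way and once through the class with the instance passed explicitly. The name this is chosen only for illustration; the code runs under both Python 2 and Python 3.

```python
class worker_pay:
    def __init__(self):
        self.data = {}

    # "this" plays the role usually given to "self"
    def add_worker(this, name, pay):
        this.data[name] = pay

x = worker_pay()
x.add_worker("george", 3000)              # usual form: x is passed implicitly
worker_pay.add_worker(x, "frank", 2500)   # same method, instance passed explicitly

print(sorted(x.data.items()))
```

Both calls store their entry in the same instance x. In practice, stick with the name self so that other readers recognize it.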

The variable data, which is created when the instance x is created, can be accessed from the outside via the expression x.data. Normally it is considered bad form to take advantage of this in object oriented programming; all access to the data in an object should normally be indirect via methods. Python depends on the “honor system” to enforce this programming code of conduct!

One other type of information in a class may be accessed: variables defined inside the class definition but outside of any method. Note the assignment statement version = "1.2.4" in our class. The variable is called an attribute of the class. Its value may be obtained using notation similar to that of a method invocation, but without the (possibly empty) argument list, as illustrated below:

>>> x = workstuff.worker_pay()
>>> x.version
'1.2.4'
>>> workstuff.worker_pay.version
'1.2.4'
>>> 

Attributes are used to represent constant values specified in the class, in this case the version of the class. The attribute may be obtained either from the instance or from the class definition itself, as the second example above illustrates.

Multiple instances of our class can be created:

>>> y = workstuff.worker_pay()
>>> y.add_worker("mary", 5000)
>>> x.list_all()
['frank', 'george']
>>> y.list_all()
['mary']
>>> 

The instance y shares methods with x, but the data are completely independent of each other. Thus, one could use this single class to keep track of employees and salaries in different departments by creating one instance per department.

2.7.3  Class inheritance

Classes in Python can inherit methods from other classes. For instance, a class named matrix may be a two-dimensional subclass of a more general class called array. Assuming array has been previously defined, matrix would inherit all the methods defined in array. The sub-class matrix could then define additional methods specific to this class, e.g., matrix multiplication or computation of eigenvalues, in addition to taking advantage of methods defined in the super-class. Methods in the super-class could even be replaced by method definitions in the sub-class with the same name if appropriate. The way inheritance is implemented is to define the sub-class with a reference to the super-class as follows:

>>> class subclass(superclass):
...     ...

The super-class may itself be a sub-class of a super-super-class, etc.
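
To make this concrete, here is a minimal sketch built on a cut-down copy of the worker_pay class from above. The sub-class dept_pay and its total_pay method are hypothetical names invented for this illustration. The sub-class inherits __init__ and add_worker, adds one new method, and replaces list_all with a version that returns the names in sorted order. The code runs under both Python 2 and Python 3.

```python
class worker_pay:
    def __init__(self):
        self.data = {}

    def add_worker(self, name, pay):
        self.data[name] = pay

    def list_all(self):
        return self.data.keys()

# Hypothetical sub-class of worker_pay
class dept_pay(worker_pay):
    # new method, not present in the super-class
    def total_pay(self):
        return sum(self.data.values())

    # replaces the super-class method of the same name
    def list_all(self):
        return sorted(self.data.keys())

d = dept_pay()
d.add_worker("george", 3000)   # add_worker is inherited from worker_pay
d.add_worker("frank", 2500)
print(d.list_all())            # ['frank', 'george']
print(d.total_pay())           # 5500
```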

Anything that can be done with object-oriented programming can also be done with, say, functions. However, object-oriented programming does have the advantage of keeping structured data and the methods used to operate on the data together. You can choose whether to use it or not in Python.

2.8  More input and output

We learned a bit about the readline() method and the print statement earlier. We now expand on this.

2.8.1  General file handling

In order to read from or write to a file, we first need to open it. When we are finished using it, we need to close the file, especially if we have written to it. To open a file for writing, we use the open command as follows:

>>> f = open("testfile.txt", "w")
>>> f.write("This is a test string\n")
>>> f.close()
>>> 

The first argument of the open command is the name of the file to which we are writing. The second argument, the string "w", tells Python to create a new, empty file with the specified file name, ready to be written to (any existing file of that name is overwritten). The open command returns a file object which has several methods associated with it. The method write() writes the string enclosed in the parentheses to the file in question. We can invoke the write method as many times as we want before closing the file with the close() method. The file name string can actually contain a full path such as, for instance, "/home/raymond/testfile.txt".

Note that the special character represented by the \n is a “newline” character, actually an ASCII linefeed. This character is used to terminate lines in Unix and Linux systems. Windows and Macintosh systems each use different characters for this, which is one of the reasons why interchanging text between these systems and Unix/Linux can be so annoying.

Let’s see how we did; we attempt to read the file that we wrote:

>>> g = open("testfile.txt", "r")
>>> x = g.read()
>>> x
'This is a test string\n'
>>> g.close()
>>>

Success! The file we wrote was where we intended to put it (the current working directory) and had the desired content, a fact that can be verified independently with a text editor. In this case we opened the existing file for read-only access with the "r" argument in the open statement. Unlike the readline() method we discussed earlier, which reads a single line of text, the read() method with no argument reads the entire file, transferring it to the variable x in the above example. What does a numerical argument do? Try the above code with x = g.read(7) and find out!
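
The sketch below combines these pieces: it calls write() twice before closing the file, then reads the first line back with readline() and the remainder with read(). To avoid cluttering the working directory it uses a scratch directory created with the standard tempfile module, which is an assumption of this example rather than something the text above requires. The code runs under both Python 2 and Python 3.

```python
import os
import tempfile

# Scratch directory for the example file (removed again at the end).
scratch = tempfile.mkdtemp()
path = os.path.join(scratch, "testfile.txt")

f = open(path, "w")
f.write("first line\n")
f.write("second line\n")
f.close()

g = open(path, "r")
first = g.readline()   # one line, including its trailing "\n"
rest = g.read()        # everything after the first line
g.close()

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(scratch)

print(repr(first))
print(repr(rest))
```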

The above read() and write() methods can actually read and write data in arbitrary binary form, not just text strings. However, beware – on Macs and Windows machines, some extra fiddling has to be done to be sure binary data are not corrupted.
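
One common form of this fiddling, on Windows in particular, is to add the letter b to the mode string ("wb" or "rb") so that Python does not translate line-ending characters on the way in or out. The sketch below assumes Python 2.6 or later, where the b"..." literal is available, and also runs under Python 3; the file name and the byte values are arbitrary choices for illustration.

```python
import os
import tempfile

scratch = tempfile.mkdtemp()
path = os.path.join(scratch, "testfile.bin")

# Arbitrary bytes, including carriage return and linefeed characters.
payload = b"\x00\x01\r\n\xff"

f = open(path, "wb")   # "b" requests binary mode: no newline translation
f.write(payload)
f.close()

g = open(path, "rb")
data = g.read()
g.close()

os.remove(path)
os.rmdir(scratch)

print(data == payload)
```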

2.8.2  Standard input, output, and error

The sys module pre-defines three file objects which are always open and need not be closed: stdin, stdout, and stderr. These correspond to the standard Unix/Linux input-output streams of the same names; stdin reads typed input from the keyboard and both stdout and stderr write output to the terminal. By custom, stderr is reserved for error messages, which often need to be kept separate from the main stdout data stream. These data streams can be redirected to access files in the usual Unix/Linux fashion.

Here are some examples of the use of these data streams:

>>> import sys
>>> sys.stderr.write("ugh!")
ugh!>>> 
>>> sys.stdout.write("gerbils\n")
gerbils
>>> x = sys.stdin.readline()
This is a line to standard input.
>>> x
'This is a line to standard input.\n'
>>> 

Note the role of the “\n” in these examples.
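
As a sketch of why the two output streams are kept separate, the hypothetical function report below writes results to one stream and complaints to another, defaulting to sys.stdout and sys.stderr. The function name, its arguments, and the warning text are invented for illustration; the code runs under both Python 2 and Python 3.

```python
import sys

def report(values, out=sys.stdout, err=sys.stderr):
    """Write each non-negative value to out; complain about the rest on err."""
    for v in values:
        if v < 0:
            err.write("warning: skipping negative value %g\n" % v)
        else:
            out.write("%g\n" % v)

report([1.5, -2.0, 3.0])
```

If this were saved in a script and run with standard output redirected to a file, the numbers would go to the file while the warning would still appear on the terminal.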

2.9  Further information

The Python home page http://www.python.org/ is an excellent source of documentation for standard Python and Python libraries. Look particularly at the Tutorial http://docs.python.org/tutorial/ and the Library Reference http://docs.python.org/library/.

Remember that this document refers to Python 2.

