We touch first on those aspects of “standard” Python (including “standard” modules) which we need to do scientific work. Standard Python is a clean but powerful language and is relatively easy to learn.
As noted above, Python is an interpreted language, which means that you can type code directly into the interpreter and get the result immediately. This makes Python an excellent desk calculator. Start the Python interpreter at your prompt by typing python:
swallow$ python
Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
The prompt >>> tells you that Python is ready for input. Type in any numeric expression and you should get an answer. For instance:
>>> 3*(2 + 5.5)
22.5
>>>
The number 22.5 is the result of the expression. To finish with Python, type ^D or quit(). The ^D means hold down the control key and type d.
>>> quit()
swallow$
Here is one quirk of Python. Let’s try division:
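>>> 5/3
1
>>> 5%3
2
>>> 4./3.
1.3333333333333333
>>>
Notice that the ratio of integers yields an integer, not the (possibly expected) floating point result. To do floating point division you need at least one of the numbers to be floating point to begin with. (This is the behavior of the Python 2 interpreter shown here; in Python 3 the / operator always gives a floating point result, and integer division is written //.) The % operator provides the remainder in integer division.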
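As a hedged aside, the following minimal sketch collects the integer operations in one place. It uses the // and % operators and the built-in divmod function, which give the same integer results in Python 2 and Python 3; the variable names are illustrative only.

```python
# Floor division and remainder; these give the same integer results in
# Python 2 and Python 3.
quotient = 5 // 3          # integer (floor) division
remainder = 5 % 3          # remainder of the integer division
both = divmod(5, 3)        # (quotient, remainder) as a tuple

# Making one operand a float forces floating point division.
ratio = 5 / 3.

print(quotient, remainder, both, ratio)
```

Writing // whenever an integer quotient is intended keeps a script independent of which meaning / has in the interpreter being used.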
Let’s try some trig:
>>> sin(0.01)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sin' is not defined
>>>
Oops! Something went wrong. To use trigonometric or other common math functions, you need to load the math module:
>>> from math import *
>>> sin(0.01)
0.0099998333341666645
>>>
The * loads all the functions defined in the math module into the interpreter. Now your common math functions work:
>>> exp(3.)
20.085536923187668
>>> log(3.)
1.0986122886681098
>>> log10(3.)
0.47712125471966244
>>> abs(-2.)
2.0
>>> atan(1.)
0.78539816339744828
>>> atan2(-3.,-4.)
-2.4980915447965089
>>>
If the math module were loaded with the import math statement rather than from math import *, the math functions would all need the prefix math. to work:
>>> import math
>>> log(3.)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined
>>> math.log(3.)
1.0986122886681098
>>>
Note that angles are in radians, as in compiled languages such as C and Fortran. Also, by default, floating point numbers (i. e., numbers with decimal points) are “double precision” or 64 bits. Integers depend on the architecture of the machine being used, but are at least 32 bits. Floats and ints can be converted explicitly from one to the other:
>>> float(3)
3.0
>>> int(3.1)
3
>>> int(-3.1)
-3
>>>
Conversion to ints truncates toward zero for both positive and negative floats. This is the defined behavior of int() and does not depend on the computer hardware.
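When a different rounding rule is wanted, the math module and the built-in round function provide alternatives to int(). The sketch below is only illustrative (the variable names are made up) and compares the four on one negative value.

```python
import math

x = -3.1
truncated = int(x)            # drops the fractional part: -3
floored = math.floor(x)       # largest integer <= x: -4 (a float in Python 2)
ceiled = math.ceil(x)         # smallest integer >= x: -3
nearest = round(x)            # nearest integer: -3

print(truncated, floored, ceiled, nearest)
```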
Sometimes one wants to use a certain number repeatedly. In this case it is possible to assign it to a variable. So, instead of typing
>>> 2*23.5
47.0
>>> 3*23.5
70.5
>>> 4*23.5
94.0
>>>
do instead
>>> const = 23.5
>>> 2*const
47.0
>>>
etc.
By the way, the last value returned by the interpreter may be retrieved, as it is stored in the variable _:
>>> "frog"
'frog'
>>> _
'frog'
>>>
Sometimes one would like to package a set of commands to Python so that repeated typing of these commands is not needed. This can be done by creating a Python script. Using your favorite text editor, create a file called trivial1.py with the following content:
# trivial1.py -- This is a trivial script.
a = 3.
result = a**3 + 2.*a**2 - 5.
print (a, result)
Then type python trivial1.py at the command line to execute the statements in this script, just as if you typed them directly into Python:
swallow$ python trivial1.py
3.0 40.0
The print statement prints out the values of the variables a and result. The operator ** raises the variable before it to the power following it. Note that everything following a # is treated as a comment. Comments are there for humans to read; Python ignores them.
You could make the output more informative by adding some descriptive information to the print statement:
print ('for a =', a, ' a**3 +2a**2 - 5 =', result)
yields
for a = 3.0 a**3 +2a**2 - 5 = 40.0
The modified print statement introduces a new data type in Python; any text enclosed between (single or double) quotes is called a string. More on this later.
Actually, this script is pretty useless, since to get a different result, one would have to edit the script to change the value of a. However, a slight modification causes the script to input a value of a from the command line each time it is run:
# trivial2.py -- This is a trivial script.
import sys
sys.stdout.write('type a: ')   # write the prompt without a trailing newline
sys.stdout.flush()             # make sure the prompt appears before reading
text = sys.stdin.readline()
a = float(text)
result = a**3 + 2.*a**2 - 5.
print ('for a =', a, ' a**3 +2a**2 - 5 =', result)
Running this results in
swallow$ python trivial2.py
type a: 4
for a = 4.0 a**3 +2a**2 - 5 = 91.0
swallow$
where the 4 is typed in response to the prompt type a: and stored as a string in the variable text. The call to the function float() converts this string to a float variable. The readline command returns an empty string (i. e., '') when the end of the file is reached.
Information is read by the system on the standard input, which necessitates importing the module sys to enable this capability. Notice that we have used the import form import sys rather than the form we used earlier for the math module, which would have been from sys import *. Which form is used depends on whether one wants to include the name of the module as a prefix to calls to module functions or not, e. g., sys.stdin versus stdin. The advantage of the latter is less typing. The advantage of the former is that a possible pre-existing function stdin doesn’t get clobbered.
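The clobbering issue can be seen in a small sketch. The function log defined here is hypothetical, standing in for any user-defined name that happens to collide with a name in a module; the form from math import sqrt, which brings in a single name, is also an addition not used in the text above.

```python
import math                    # names stay behind the "math." prefix
from math import sqrt          # only sqrt is brought in unprefixed

def log(message):
    # A hypothetical user-defined function that happens to be named log.
    return "LOG: " + message

# Because math was imported with "import math", our own log() and
# math.log() coexist without clobbering each other.
note = log("starting")
value = math.log(math.e)
root = sqrt(16.)

print(note, value, root)
```

Had the script used from math import * after defining log, the name log would have referred to the math function from that point on.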
The stdin object in the sys module has many methods, of which readline() is just one. Each call reads an entire line of typing. The method readlines() grabs all remaining lines in the form of a list of lines:
linelist = sys.stdin.readlines()
nlines = len(linelist)
print (linelist[0])
print (linelist[1])
...
print (linelist[nlines - 1])
print (linelist[-1])
The list is a compound Python data type. The list can hold a bunch of individual elements consisting of constants or variables, including other lists. The number of elements is returned by the function len(), as shown above. The elements of a list can be accessed by the square bracket notation illustrated above. Indexing starts at zero and the last element of the list has an index equal to the number of elements minus one. The last element can also be indexed by -1, the second to the last by -2, etc. This indexing convention holds for compound objects throughout Python.
If you get tired of typing
swallow$ python trivial2.py
you can add the line
#!/usr/bin/python
to the beginning of the script and then make the script executable by typing
swallow$ chmod +x trivial2.py
at the system prompt. Then the script can be run by simply typing
swallow$ trivial2.py
Change /usr/bin/python to whatever the full path is for python on your system.
An advantage of an executable Python script is that it presents an alternate way of entering input data into the script. For example, we can rewrite our trivial script as follows:
#!/usr/bin/python
# trivial3.py -- This is a trivial script.
import sys
print (sys.argv) # Included so we can see what is going on.
a = float(sys.argv[1])
result = a**3 + 2.*a**2 - 5.
print ('for a =', a, ' a**3 +2a**2 - 5 =', result)
The variable sys.argv from the sys package is a list of the words typed on the command line, starting with the command itself. Several examples follow:
swallow$ trivial3.py
['./trivial3.py']
Traceback (most recent call last):
  File "./trivial3.py", line 5, in <module>
    a = float(sys.argv[1])
IndexError: list index out of range
In this case just the command is typed, and the returned list just contains the command as its first and only element. Indexing in Python starts from zero, so there is no sys.argv[1], a fact noted and commented upon by the Python interpreter. Let us try instead
swallow$ trivial3.py 3
['./trivial3.py', '3']
for a = 3.0 a**3 +2a**2 - 5 = 40.0
The script converts the string '3' into the corresponding floating point number and proceeds as before. If we type
swallow$ trivial3.py "frogmorton"
['./trivial3.py', 'frogmorton']
Traceback (most recent call last):
  File "./trivial3.py", line 5, in <module>
    a = float(sys.argv[1])
ValueError: invalid literal for float(): frogmorton
the float conversion becomes problematic and Python again reports its distress. However, if we type
swallow$ trivial3.py 3 "frogmorton"
['./trivial3.py', '3', 'frogmorton']
for a = 3.0 a**3 +2a**2 - 5 = 40.0
Python is happy even though there is an extra command line argument which remains unused. Note that the print statement in this script is not really needed; it is there only to make it clear how the command line arguments are transmitted to the script.
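A script can catch these two failures itself rather than letting the interpreter print a traceback. The following is a minimal sketch, not part of the trivial3.py script above: the helper name read_a is hypothetical, and it relies on the try/except statement, which this text has not introduced.

```python
def read_a(argv):
    # argv is a list like sys.argv: the command name, then its arguments.
    # Returns a float, or None if the argument is missing or not a number.
    try:
        return float(argv[1])
    except IndexError:
        print("usage: " + argv[0] + " <number>")
    except ValueError:
        print("not a number: " + argv[1])
    return None

# Explicit lists stand in for sys.argv here so the three cases shown
# above can be tried without leaving the interpreter.
print(read_a(['./trivial3.py']))
print(read_a(['./trivial3.py', '3']))
print(read_a(['./trivial3.py', 'frogmorton']))
```

In a real script the call would be read_a(sys.argv), followed by a test for None before the value is used.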
Various connections with the underlying operating system are available through the os module. The most interesting of these functions is the ability to run arbitrary commands in a sub-shell. For example:
>>> import os
>>> os.system("date")
Sat Oct 3 17:22:19 MDT 2009
0
>>>
The output of the command is printed and the exit status of the system call is returned.
In this section we begin to consider Python statements which control branching, looping and modularization of code. An important point to remember here is that Python uses indentation of code to define groups of related statements, or clauses. Less indentation ends the clause. Furthermore, the statements which introduce clauses always end with a colon. Examples will clarify this as we go along. Whether you indent with tabs or with spaces, do so consistently within a file, since mixing the two makes the indentation ambiguous; the standard Python style guide (PEP 8) recommends four spaces per level.
A function is a clause which can be called by the user or by other functions to execute a particular task. Functions can be defined interactively, but they are most useful when placed in special files called modules, which can be imported by the interpreter. Let’s take as an example a function which solves the quadratic equation
$$ax^2 + bx + c = 0. \tag{2.1}$$
We all know that this equation has two solutions
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{2.2}$$
Consider the following script, named quadratic1.py:
# quadratic1.py -- Quadratic equation solver.
def qsolns(a,b,c):
    temp = b**2 - 4.*a*c
    x1 = (-b + temp**0.5)/(2.*a)
    x2 = (-b - temp**0.5)/(2.*a)
    return [x1, x2]
This script contains the definition of a single function qsolns. The three variables a, b, and c are transmitted to the body of the function, which computes the two solutions x1 and x2. The two solutions are then combined into a list [x1, x2] and returned to the caller. A script used in this way is called a module. We use it from within Python as follows:
>>> import quadratic1
>>> quadratic1.qsolns(1.,4.,0.)
[0.0, -4.0]
>>>
Direct substitution verifies 0.0
and -4.0
as solutions
to the original equation.
There is another way to call the function qsolns:
>>> quadratic1.qsolns(a = 1., b = 4., c = 0.)
[0.0, -4.0]
>>>
This makes the relationship between the function arguments and the values assigned to them more obvious. However, it is also more verbose. The first instance uses positional parameters; the second uses keyword parameters. The two methods can be mixed as long as keyword parameters follow positional parameters:
>>> quadratic1.qsolns(1., b = 4., c = 0.)
[0.0, -4.0]
>>> quadratic1.qsolns(a = 1., b = 4., 0.)
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
>>> quadratic1.qsolns(c = 0., a = 1., b = 4.)
[0.0, -4.0]
>>>
As the last case shows, keyword parameters can appear in any order as long as all positional parameters appear first.
So far, so good. However, suppose we try the following:
>>> quadratic1.qsolns(1,4,6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "quadratic1.py", line 4, in qsolns
    x1 = (-b + temp**0.5)/(2.*a)
ValueError: negative number cannot be raised to a fractional power
>>>
Clearly the problem is that for the choice of (a, b, c) the solutions to the quadratic equation are complex rather than real. (The error shown is that of Python 2; Python 3 returns a complex result from this power instead of raising an error.) One solution to this problem is to make the variable temp complex:
# quadratic3.py -- Quadratic equation solver.
def qsolns(a,b,c):
    temp = complex(b**2 - 4.*a*c, 0.)
    x1 = (-b + temp**0.5)/(2.*a)
    x2 = (-b - temp**0.5)/(2.*a)
    return [x1, x2]
Running this yields
>>> import quadratic3
>>> solns = quadratic3.qsolns(1.,4.,6.)
>>> print (solns)
[(-2+1.4142135623730951j), (-2-1.4142135623730951j)]
>>>
A complex number with a negative real part can be raised to a fractional power, the result of which is also complex. The occurrence of one complex variable in an expression makes all calculations complex, including the answers. Note the use of the complex(real,imag) function to construct a complex number or variable. If c is complex, the real and imaginary parts are given by c.real and c.imag. The complex conjugate of c is given by c.conjugate(). Finally, the absolute value is computed by abs(c). For complex trig, log, and exponential functions, load the cmath module, which provides complex versions of most of the functions of the math module.
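The attributes and functions just listed can be tried together in a short sketch. The values and variable names are arbitrary illustrations.

```python
import cmath

c = complex(3., 4.)
parts = (c.real, c.imag)      # (3.0, 4.0)
conj = c.conjugate()          # (3-4j)
size = abs(c)                 # 5.0

# cmath.sqrt accepts a negative real argument and returns a complex result.
root = cmath.sqrt(-8.)

print(parts, conj, size, root)
```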
Several comments about functions are worth making. Variables defined inside a function are generally invisible outside the function. However, variables outside the function (but in the same script or module) can be seen inside the function, but not assigned to, with one exception; if a variable is defined as global, it can be read and assigned to in both locations. Globals are generally discouraged, so we do not discuss them further here.
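The visibility rules can be illustrated with a minimal sketch; the names scale, scaled, and shadow are invented for this example.

```python
scale = 2.0                    # defined at module level

def scaled(x):
    # scale is not assigned here, so the module-level value is read.
    return scale*x

def shadow(x):
    # Assigning to scale creates a new local variable; the module-level
    # scale is left untouched.
    scale = 10.0
    return scale*x

a = scaled(3.)
b = shadow(3.)
print(a, b, scale)
```

After both calls the module-level scale still holds 2.0, since the assignment inside shadow affected only a local variable of the same name.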
The values of the variables passed from the calling environment are available inside the function via the list in parentheses in the function definition line. Thus, the function qsolns has access to the values of a, b, and c.
The results of a function calculation can be accessed by the calling environment via the return statement in the function. In the case of qsolns, the list containing the roots of the quadratic equation is returned to the calling environment, as illustrated above.
This highly channeled flow of information to and from a function may
sound restrictive, but it is desirable from the point of view of
keeping track of what is going on in a program.
Scripts can call functions defined within the script, and functions themselves can call other functions. However, for this to work, the definition of the function must occur before the call to the function – otherwise, Python doesn’t know that the function exists.
An alternate approach to handling the quadratic equation problem is to test beforehand for various possibilities that might cause difficulties with the calculation using the if statement. Let us rewrite our function as follows:
# quadratic2.py -- Quadratic equation solver.
def qsolns(a,b,c):
    temp = b**2 - 4.*a*c
    if a == 0: #a = 0 is a special case
        if b != 0:
            return [-float(c) / b]
        return []
    elif temp < 0: #solutions are complex
        return []
    elif temp == 0: #only one solution
        return [(-b + temp**0.5)/(2.*a)]
    else:
        x1 = (-b + temp**0.5)/(2.*a)
        x2 = (-b - temp**0.5)/(2.*a)
        return [x1, x2]
If a == 0 is true, i. e., if the variable a is equal to zero, then the clause of the if statement is executed. Alternatively, if this is false, but the elif condition is true, the clause of this statement is executed. If neither is true, the clause of the else statement is executed. This function illustrates the full range of possibilities of the if statement. For each if there are zero or more elif clauses. The else clause considers all other possibilities. Try this function to see what happens!
Comparison operators between numbers are >, >=, ==, !=, <=, and <, the meanings of which are obvious except possibly == and !=; the first yields true if the two numbers are equal and false otherwise, whereas the second yields the reverse. Don’t confuse == with =, which tries to assign the right value to the left value, and is an error in this context! Note that if a single number or numerical expression is used in place of the boolean expression, it is treated as false if it evaluates to zero and true if it is nonzero. Logical combinations of boolean expressions are produced with the and and or operators. Thus, (4 > 3) or (3 < 2) is true, while (4 > 3) and (3 < 2) is false. The parentheses are not needed in this context but are helpful to avoid ambiguity.
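These rules are easy to check in a few lines. The function describe is a made-up example that uses a bare number as the condition of an if statement.

```python
def describe(n):
    # A bare number is used as the condition: zero counts as false,
    # anything else as true.
    if n:
        return "nonzero"
    else:
        return "zero"

first = (4 > 3) or (3 < 2)     # True: at least one side is true
second = (4 > 3) and (3 < 2)   # False: both sides must be true

print(describe(0), describe(-2.5), first, second)
```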
Suppose we want to find the roots of some function f(x). Newton’s method is perhaps the simplest (and least reliable!) means of doing this. If x0 is a first guess at the root, then a Taylor expansion of f(x) about this point can be written to first order as
$$f(x) \approx f(x_0) + \left.\frac{df}{dx}\right|_{x_0} (x - x_0). \tag{2.3}$$
Setting this to zero to find the root of the linear approximation to f(x) results in the equation
$$x = x_0 - \frac{f(x_0)}{\left.df/dx\right|_{x_0}}. \tag{2.4}$$
However, this is only an approximation, so this equation has to be recalculated a number of times, replacing x0 by the new x each time until convergence occurs. Here is a simple Python function which does this:
# newton1.py -- A newton's method rootfinder.
def newton(xstart, fn):
    x = xstart
    eps = 0.0001
    xold = x + 10*eps
    delta = 0.1
    loops = 0
    maxloops = 100
    while (abs(x - xold) > eps) and (loops < maxloops):
        loops = loops + 1
        fval = fn(x)
        dfdx = (fn(x + delta) - fn(x))/delta
        xold = x
        x = xold - fval/dfdx
        print ("loop =", loops, "-- x =", x, "-- f(x) =", fn(x))
This function uses the while statement: the clause of the statement is executed while the boolean expression following while is true. In this expression x is the current value of the variable x whereas xold is the previous value, x0. As long as the absolute value of the difference between these two is greater than some small number eps, this part of the expression is true. Convergence is reached when x stops changing, which means that abs(x - xold) becomes really small. The other part simply limits the number of iterations of the loop to 100 in order to guard against a runaway loop, which is all too common with Newton’s method! (Recall that and indicates a logical and.)
The function definition has two arguments, the initial value or first guess for x and the name of the function for which we wish to find the roots. The latter must be defined separately. Notice that the initial value of xold is set far enough away from x so that the while statement executes at least once.
Let’s try out our new root finder! We define a cubic polynomial function f(x) directly in the interpreter (notice the indentation needed after the def statement and the blank line at the end of the function) and then call newton with a variety of starting values. (Also notice that the name of the function in its definition need not be the same as the name of the function within newton. This is a general property of function arguments.)
>>> from newton1 import *
>>> def f(x):
...     return x**3 - 2.*x
...
>>> newton(0.5,f)
loop = 1 -- x = -0.302752293578 -- f(x) = 0.577754629433
loop = 2 -- x = 0.0171829183112 -- f(x) = -0.0343607633197
loop = 3 -- x = -0.000136369523294 -- f(x) = 0.000272739044053
loop = 4 -- x = 6.82459009549e-07 -- f(x) = -1.3649180191e-06
loop = 5 -- x = -3.42951282581e-09 -- f(x) = 6.85902565163e-09
>>> newton(4,f)
loop = 1 -- x = 2.81381063334 -- f(x) = 16.6508096258
loop = 2 -- x = 2.07726860993 -- f(x) = 4.80897005523
loop = 3 -- x = 1.66192579318 -- f(x) = 1.26638303704
loop = 4 -- x = 1.47554415592 -- f(x) = 0.261511511278
loop = 5 -- x = 1.42307768519 -- f(x) = 0.0357905433661
loop = 6 -- x = 1.41514604066 -- f(x) = 0.0037336030139
loop = 7 -- x = 1.4143056045 -- f(x) = 0.000368204435166
loop = 8 -- x = 1.41422258344 -- f(x) = 3.60845992158e-05
>>> newton(-2,f)
loop = 1 -- x = -1.57492029756 -- f(x) = -0.756550674276
loop = 2 -- x = -1.4229611678 -- f(x) = -0.0353157404884
loop = 3 -- x = -1.4133056401 -- f(x) = 0.00362819253248
loop = 4 -- x = -1.41431958126 -- f(x) = -0.000424123216164
loop = 5 -- x = -1.41420132921 -- f(x) = 4.89320029273e-05
loop = 6 -- x = -1.41421497589 -- f(x) = -5.65406673259e-06
>>>
By a suitable choice of starting values, we have found all three roots of the polynomial, as indicated by the near-zero values of the function in each case at the end of the iterations.
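The root finder above estimates the derivative with a finite difference of width delta. When the derivative is known analytically, it can be passed in directly, which follows equation (2.4) more closely. The sketch below is a variant under that assumption and is not the author's newton function: the name newton_exact and its third argument dfn are hypothetical, and the result is returned rather than printed at each step.

```python
def newton_exact(xstart, fn, dfn):
    # fn is the function and dfn its derivative; both are supplied by
    # the caller. Returns the final estimate of the root.
    x = xstart
    eps = 0.0001
    xold = x + 10*eps
    loops = 0
    maxloops = 100
    while (abs(x - xold) > eps) and (loops < maxloops):
        loops = loops + 1
        xold = x
        x = xold - fn(xold)/dfn(xold)   # equation (2.4)
    return x

def f(x):
    return x**3 - 2.*x

def dfdx(x):
    return 3.*x**2 - 2.

root = newton_exact(4., f, dfdx)
print(root, f(root))
```

Like the original, this sketch makes no attempt to handle a starting value at which the derivative vanishes.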
Here is an alternate Newton’s method solver which uses Python’s for loop rather than a while loop:
# newton2.py -- A newton's method rootfinder.
def newton(xstart, fn):
    x = xstart
    eps = 0.0001
    xold = x + 10*eps
    delta = 0.1
    maxloops = 10
    for loops in range(maxloops):
        fval = fn(x)
        dfdx = (fn(x + delta) - fn(x))/delta
        xold = x
        x = xold - fval/dfdx
        print ("loop =", loops, "-- x =", x, "-- f(x) =", fn(x))
        if (abs(x - xold) < eps) and (abs(fval) < eps):
            break
    else:
        print ("no solution!")
This solver illustrates several new Python features. First, the for statement executes the associated clause for each value that the loop variable (loops in this case) takes on. The variable after in is formally known as a sequence. A sequence is nothing more than an ordered collection of things; loops takes on the value of each element of this sequence in turn as the looping proceeds.
The easiest way to write a sequence is via a list; an example would be [0, 1, 2, 3]. The elements of the list don’t need to be integers; they can be any legal Python variables or constants, including other lists. They don’t even have to all be of the same type. Alternate sequences are strings: 'abcdefg'; or a tuple: (0, 1, 2, 3). A tuple is just like a list, but it cannot be changed in the way that a list can, i. e., it is immutable; think of it as a list constant. A string is just an immutable sequence of characters.
The sequence in the above example is the function range, which generates a list of integers. For example,
range(3) -> [0, 1, 2],
while
range(1,3) -> [1, 2],
and
range(1,6,2) -> [1, 3, 5].
Thus, our for statement sets loops successively to elements of the list
range(maxloops) -> [0, 1, 2, ..., maxloops - 1]
In Python 3, unlike in Python 2, the range function does not print out a list as indicated above. Instead, it is much more like the xrange function in Python 2, which generates sequencing on the fly. Xrange itself doesn’t exist in Python 3.
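To see the values that range produces in Python 3, wrap it in the built-in list function. The following sketch also runs under Python 2, where the extra list() call is harmless.

```python
# list() forces range to produce all of its values at once; in Python 2
# range already returns a list, so list() changes nothing there.
print(list(range(3)))
print(list(range(1, 3)))
print(list(range(1, 6, 2)))

# A for loop needs no conversion; it consumes the values one at a time.
total = 0
for i in range(1, 6, 2):
    total = total + i
print(total)
```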
The test for convergence of the iteration in the above example is more sophisticated than in the previous case; not only does x - xold have to be sufficiently small, the function evaluated at x must be close to zero as well. When this occurs, the loop terminates early courtesy of the break statement. The else clause is only executed if the for loop terminates without executing a break statement. Here we use it to alert the user that convergence was not attained. The break and else statements can be used analogously with the while loop also. The else is optional in both cases.
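The interplay of break and else is easiest to see in a small search loop. The function first_negative below is a hypothetical example, unrelated to the root finder.

```python
def first_negative(values):
    # Returns the index of the first negative entry, or -1 if there is none.
    for i in range(len(values)):
        if values[i] < 0:
            index = i
            break              # leave the loop early; the else is skipped
    else:
        index = -1             # reached only if the loop was never broken
    return index

print(first_negative([3., 1.5, -2., 4.]))
print(first_negative([3., 1.5, 2.]))
```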
So, how does our new Newton’s method solver work? Below it is applied to two functions, $f(x) = x^2 - 4$ and $f(x) = x^2 + 4$. The former has real roots, which newton can find. However the roots of the latter are complex, and hence are not found. (Newton’s method can be made to work for complex analytic functions as well. Try invoking newton with an initial x = (1 + 3j)!)
>>> from newton2 import *
>>> def func(x):
...     return x**2 - 4.
...
>>> newton(3,func)
loop = 0 -- x = 2.18032786885 -- f(x) = 0.753829615695
loop = 1 -- x = 2.01133262241 -- f(x) = 0.045458917965
loop = 2 -- x = 2.0003060376 -- f(x) = 0.00122424405357
loop = 3 -- x = 2.00000748606 -- f(x) = 2.99442871095e-05
loop = 4 -- x = 2.0000001826 -- f(x) = 7.30399140281e-07
>>>
>>> def func2(x):
...     return x**2 + 4.
...
>>> newton(3,func2)
loop = 0 -- x = 0.868852459016 -- f(x) = 4.75490459554
loop = 1 -- x = -1.7185621737 -- f(x) = 6.95345594488
loop = 2 -- x = 0.365104846463 -- f(x) = 4.13330154891
loop = 3 -- x = -4.61351872796 -- f(x) = 25.2845550532
loop = 4 -- x = -1.84322714371 -- f(x) = 7.39748630332
loop = 5 -- x = 0.21939117245 -- f(x) = 4.04813248655
loop = 6 -- x = -7.29409275083 -- f(x) = 57.2037890577
loop = 7 -- x = -3.34578679829 -- f(x) = 15.1942892996
loop = 8 -- x = -1.0406787574 -- f(x) = 5.0830122761
loop = 9 -- x = 1.52474027382 -- f(x) = 6.3248329026
no solution!
>>>
Two statements change execution flow in Python loops, the break and continue statements:
>>> for i in range(5):
...     if i == 3:
...         break
...     print (i)
...
0
1
2
>>> for i in range(5):
...     if i == 3:
...         continue
...     print (i)
...
0
1
2
4
>>>
Note that the break statement throws the program out of the innermost loop when it is encountered, as illustrated in the previous sub-section. The continue statement interrupts the current iteration of the loop and goes on to the next. The pass statement does nothing at all! The purpose of this last statement is solely as a placeholder.
Many of Python’s normal applications involve operations on text. Such text operations are typically not used much in numerical programming, so we will only touch lightly on this subject. The online documentation for Python is particularly complete and extensive for use of strings, lists, tuples, and dictionaries in text operations, so further information is readily available. Nevertheless, sometimes these language elements come in handy in scientific work, so we will spend some time on them.
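First some definitions:
A string is a sequence of characters enclosed in quotes, e. g., x = 'abcde'. Strings cannot be changed; they are immutable.
A list is a sequence of elements enclosed in square brackets, e. g., x = [1, 2.0, 'three']. The elements of lists can be changed; lists are mutable.
A dictionary is a collection of key:value pairs enclosed in curly braces, e. g., x = {'dog': 'animal', 'cat' : 2.1, 'rose' : (1+2j)}.
Indexing, which as in the C language starts at zero, allows one to extract individual elements or subsets of sequences. Demonstrating this on strings is easier than explaining it: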
>>> x = 'abcdefg'
>>> x[0]
'a'
>>> x[1]
'b'
>>> x[-1]
'g'
>>> x[-2]
'f'
>>> x[0:2]
'ab'
>>> x[:]
'abcdefg'
>>> x[0:]
'abcdefg'
>>> x[:-1]
'abcdef'
>>> x[0:6:2]
'ace'
>>>
The colon-based notation returns a slice of the initial sequence, i. e., a sequence of elements starting with that indexed by the number before the colon and ending with that indexed by the number after the colon minus one. The negative index notation is a clever way of indexing back from the end of the sequence. The optional third index is the stride; for instance, the 2 tells Python to jump 2 elements per step in marching through the sequence rather than the default 1.
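A few further slices may help fix the rules. The negative stride in the last line goes slightly beyond what is described above and is included as an extension; the variable names are illustrative.

```python
x = 'abcdefg'

middle = x[2:5]        # elements 2, 3, 4: 'cde'
tail = x[-3:]          # the last three elements: 'efg'
every_other = x[::2]   # stride 2 over the whole string: 'aceg'
backwards = x[::-1]    # a negative stride runs from the end: 'gfedcba'

print(middle, tail, every_other, backwards)
```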
One slight difference between indexing of lists and strings is that a simple string index (i. e., not a slice) returns a string consisting of a single character, whereas a simple index of a list returns the corresponding element of the list, not a list consisting of a single element. However, a slice referencing a single list element returns a list consisting of that single element:
>>> y = [0, 1, 2, 3]
>>> y[0:2]
[0, 1]
>>> y[0]
0
>>> y[0:1]
[0]
>>>
Tuples work just like lists except that the notation for a tuple consisting of just one element is (for example) (5,), not (5). The latter is just an integer enclosed in parentheses.
Python is an object oriented language, so data elements contain not only data, but also useful methods which perform common kinds of data manipulation (see the next section). For instance:
>>> x = 'I found a dog.'
>>> x.find('dog')
10
>>> x.find('frog')
-1
>>> x.split(' ')
['I', 'found', 'a', 'dog.']
>>> x.partition('found')
('I ', 'found', ' a dog.')
>>> x.replace('dog','cat')
'I found a cat.'
>>> x = 'I+found+a+dog'
>>> x.split('+')
['I', 'found', 'a', 'dog']
>>> x = 'dog     found'
>>> x.split()
['dog', 'found']
>>> x.split(' ')
['dog', '', '', '', '', 'found']
>>>
The split method called with no arguments splits strings into elements separated by any amount of white space, e. g., spaces, tabs, newlines, etc.
There are many more methods for strings; these are just some of the most useful.
Python lists have some methods specific to them. Some of the most useful are illustrated below.
>>> x = [1, 2, 3, 4, 5]
>>> x.append(6)
>>> x
[1, 2, 3, 4, 5, 6]
>>> x.insert(2,2.5)
>>> x
[1, 2, 2.5, 3, 4, 5, 6]
>>> x.remove(2.5)
>>> x
[1, 2, 3, 4, 5, 6]
>>> y = x.pop()
>>> y
6
>>> x
[1, 2, 3, 4, 5]
>>> x.reverse()
>>> x
[5, 4, 3, 2, 1]
>>> x.sort()
>>> x
[1, 2, 3, 4, 5]
>>>
The addition + and multiplication * operators have straightforward meanings with sequences:
>>> 'I found ' + 'a dog.'
'I found a dog.'
>>> 'dog '*3
'dog dog dog '
>>> [1, 2, 3] + [4, 5]
[1, 2, 3, 4, 5]
>>> 2*[1, 2, 3]
[1, 2, 3, 1, 2, 3]
>>>
The % operator is more complex, but quite useful for incorporating numbers and other strings into strings in a specified format:
>>> a = 'I found %d dogs.' % (3)
>>> a
'I found 3 dogs.'
>>> 'The %s weighed %f grams.' % ('mouse', 43.5)
'The mouse weighed 43.500000 grams.'
>>> 'The %s weighed %e grams.' % ('mouse', 43.5)
'The mouse weighed 4.350000e+01 grams.'
>>> 'The %s weighed %.1f grams.' % ('mouse', 43.5)
'The mouse weighed 43.5 grams.'
>>> 'The speed of light is %.2e m/s.' % (3e8)
'The speed of light is 3.00e+08 m/s.'
>>>
The string contains C-like format statements, each preceded by a percent sign %. A percent sign % also separates this string from a tuple, the elements of which can be strings or numbers, but must match the corresponding types in the initial string. The first example above reminds us that the results of a string operation can be assigned to a variable, which can then be printed (implicitly or explicitly).
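The C-like format statements also accept a field width, which is handy for lining up columns of numbers. The width and the - flag (left justification) are not used in the examples above; the sketch below is an illustrative extension with made-up values.

```python
# A number before the format letter sets the minimum field width; a
# leading minus sign left-justifies the entry within its field.
row = '%-8s%8.3f%10.2e' % ('mass', 3.14159, 299792458.)
width_int = '%5d' % 42

print(row)
print(width_int)
```

Each field is padded with spaces to its stated width, so successive rows printed with the same format line up in columns.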
Finally, it is easy to find the length of sequences with the function len():
>>> z = 'abcde'
>>> len(z)
5
>>> x = [0, 2, 4, 6]
>>> len(x)
4
>>>
Likewise, it is easy to determine whether a particular element exists in a sequence using the in operator:
>>> x = [1,2,3,4,5,6]
>>> 1 in x
True
>>> 1.5 in x
False
>>> 'a' in 'abc'
True
>>> 'x' in 'abc'
False
>>>
Logical expressions involving the in operator can be used in if and while statements to control the flow of the program.
The dictionary is a Python type consisting of a collection of key:value pairs. Its main use is to construct a kind of data base. For instance, the dictionary called phones below contains peoples’ names (the keys) associated with their phone numbers (the values). Dictionaries are displayed with surrounding curly braces and the key:value pairs are separated by commas.
>>> phones = {'george': '835-4427', 'sandy': '838-1192'}
>>> phones['george']
'835-4427'
>>> phones['harry'] = '892-9553'
>>> phones['harry']
'892-9553'
>>> phones
{'george': '835-4427', 'sandy': '838-1192', 'harry': '892-9553'}
>>> phones.keys()
['george', 'sandy', 'harry']
>>> del phones['sandy']
>>> phones
{'george': '835-4427', 'harry': '892-9553'}
>>> 'harry' in phones
True
>>>
As shown, indexing a dictionary by a key returns the value as long as the specified key exists in the dictionary. New entries in the dictionary may be made by assigning the value of the new key:value pair to the dictionary indexed by the new key. The del command removes the indexed key, as shown above. The keys method for dictionaries illustrated above returns a list of the keys in the dictionary. Finally, the in operator can be used to see if a dictionary has a particular key.
Keys in dictionaries must be immutable, and are most commonly strings. Keys cannot appear more than once. The value can be any python data type, and different keys can have values of different types associated with them.
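The restrictions on keys can be checked directly. In this sketch the dictionary grid is a made-up example with tuple keys, and the try/except statement (not introduced in this text) is used only to show that a list is rejected as a key.

```python
# Tuples are immutable, so they are acceptable keys.
grid = {(0, 0): 1.5, (0, 1): 2.5}
grid[(1, 0)] = 'empty'         # values of different types can be mixed

# A list is mutable, so using one as a key raises TypeError.
try:
    grid[[1, 1]] = 4.0
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# Assigning to an existing key replaces its value rather than adding a
# second entry.
grid[(0, 0)] = 9.0

print(len(grid), grid[(0, 0)], list_key_allowed)
```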
Dictionaries are not sequences; entries are looked up by key rather than by position, and older versions of Python do not guarantee any particular order for the key:value pairs. Looping directly over a dictionary in a for loop visits only its keys. However, it is possible to loop over the entries of a dictionary, returning each key and value successively, using the dictionary’s items method (called iteritems in Python 2). Using the phones dictionary defined above:
>>> for k, v in phones.items():
...     print (k, v)
...
george 835-4427
harry 892-9553
>>>
Assignments involving lists and dictionaries can have some possibly unexpected results. Consider the following:
>>> a = ["a", "b", "c"]
>>> b = a
>>> c = a[:]
>>> a
['a', 'b', 'c']
>>> b
['a', 'b', 'c']
>>> c
['a', 'b', 'c']
>>> a.append("d")
>>> a
['a', 'b', 'c', 'd']
>>> b
['a', 'b', 'c', 'd']
>>> c
['a', 'b', 'c']
>>>
The assignment b = a just makes the variable b an alias of the
variable a: both names refer to the same list object, so that when a
is changed by the append method, b changes correspondingly. However,
the assignment c = a[:] constructs a new list c by copying the
contents of list a element by element. Thus, when a changes, c does
not. In general, assigning one name to another never copies the
object; a new object is created only when the right-hand side is an
expression that builds one, such as the slice a[:], even if the net
effect is to make the new object identical to the old object.
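The distinction between an alias and a copy can be checked with the is operator, which tests whether two names refer to the same object. The sketch below also illustrates one caveat not covered above: a slice makes a shallow copy, so nested lists are still shared unless copy.deepcopy from the standard copy module is used. The variable names are illustrative only.

```python
import copy

a = ["a", "b", "c"]
b = a          # alias: b and a name the same list object
c = a[:]       # slice: a new list holding the same elements
d = list(a)    # another way to build a new list from a

a.append("d")  # visible through b, but not through c or d

alias_is_same = b is a    # the alias is the very same object
copy_is_same = c is a     # the slice is a different object

# A slice is a shallow copy: the inner lists are still shared.
nested = [[1, 2], [3, 4]]
shallow = nested[:]
deep = copy.deepcopy(nested)
nested[0].append(99)      # shows up in shallow[0], not in deep[0]
```

The same aliasing behavior applies to dictionaries, for which the copy() method plays the role of the slice.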
You may never write an object oriented program in Python as a scientific programmer. Nevertheless, it is important to have a basic understanding of how object oriented programming works in this language, since the construct is used so frequently.
The basic notion in object oriented programming is the class.
Think of a class as a blueprint of a data structure and methods
for operating on this structure. For instance, consider the following
module named workstuff.py
which contains the class definition
worker_pay
:
class worker_pay:
    version = "1.2.4"

    def __init__(self):
        self.data = {}

    def add_worker(self, name, pay):
        self.data[name] = pay

    def list_worker(self, name):
        return self.data[name]

    def list_all(self):
        return self.data.keys()
The def
s inside the class definition define the class methods,
which are really just functions.
Consider now the sequence of statements shown below:
>>> import workstuff
>>> x = workstuff.worker_pay()
>>> x.add_worker("george", 3000)
>>> x.add_worker("frank", 2500)
>>> x.list_all()
['frank', 'george']
>>> x.list_worker("george")
3000
>>> a = x.list_worker("george")
>>> a
3000
>>> x.data
{'frank': 2500, 'george': 3000}
>>>
The first statement simply imports the module containing the class
definition. The next statement x = workstuff.worker_pay()
creates an instance x
of the class worker_pay
.
This instance contains information which is manipulated by the four
methods defined in the class. The first method, __init__(self)
is a specially named method which is invoked automatically when an
instance of the class is created. In this case it assigns an empty
dictionary to the variable data
contained in the instance,
though in fact it could do any number of things. The other methods
respectively add data (worker names and their salaries) to the
dictionary, return a particular worker's salary, and return the names
of all workers. The returned values can be assigned to other
variables.
Notice that the first argument in the definitions of the methods,
self, is omitted in the method invocations. When a method is invoked
as x.add_worker(...), Python automatically passes the instance x as
the first argument, so self inside the method body refers to x itself.
There is nothing magic about the name "self"; it is just convention to
use this terminology, and any other name could be used as the first
argument of a method, provided it is used consistently in the method
body to indicate the invoked object.
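To make the role of self concrete, the sketch below calls the same method in two ways: the usual form, in which the instance is supplied implicitly, and an explicit form through the class, in which the instance is passed by hand as the first argument. The class is a trimmed copy of worker_pay from above, defined inline so the example is self-contained.

```python
class worker_pay:
    def __init__(self):
        self.data = {}

    def add_worker(self, name, pay):
        self.data[name] = pay

    def list_worker(self, name):
        return self.data[name]

x = worker_pay()

# Usual form: Python passes x as the first argument (self).
x.add_worker("george", 3000)

# Equivalent explicit form: the instance is given by hand.
worker_pay.add_worker(x, "frank", 2500)
```

Both calls store their entry in the same dictionary x.data; the explicit form is rarely written in practice, but it shows what the shorter form does.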
The variable data, which is created by __init__ when the instance x is
created, can be accessed from the outside via the expression x.data.
Normally it is considered bad form to take advantage of this in object
oriented programming; all access to the data in an object should
normally be indirect via methods. Python depends on the "honor system"
to enforce this programming code of conduct!
One other type of information in a class may be accessed: variables
defined inside the class definition but outside of any method. Note
the assignment statement version = "1.2.4" in our class. The variable
is called an attribute of the class. Its value may be obtained using
notation similar to that of a method invocation, but without the
(possibly empty) argument list, as illustrated below:
>>> x = workstuff.worker_pay()
>>> x.version
'1.2.4'
>>> workstuff.worker_pay.version
'1.2.4'
>>>
Attributes are used to represent constant values specified in the class, in this case the version of the class. The attribute may be obtained either from the instance or from the class definition itself, as the second example above illustrates.
Multiple instances of our class can be created:
>>> y = workstuff.worker_pay()
>>> y.add_worker("mary", 5000)
>>> x.list_all()
['frank', 'george']
>>> y.list_all()
['mary']
>>>
The instance y shares methods with x, but the data are completely
independent of each other. Thus, one could use this single class to
keep track of employees and salaries in different departments by
creating multiple instances.
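A minimal sketch of the departmental use just described is shown below. The department names and salaries are invented for illustration, and the class is a trimmed copy of worker_pay (with list_all sorted so that the output order is repeatable). It checks that each instance holds its own data dictionary while the class attribute version is common to all of them.

```python
class worker_pay:
    version = "1.2.4"

    def __init__(self):
        self.data = {}

    def add_worker(self, name, pay):
        self.data[name] = pay

    def list_all(self):
        return sorted(self.data.keys())

# One instance per (hypothetical) department.
physics = worker_pay()
chemistry = worker_pay()

physics.add_worker("george", 3000)
chemistry.add_worker("mary", 5000)

# Each instance has its own dictionary ...
separate_data = physics.data is not chemistry.data
# ... but the class attribute is shared by all instances.
shared_version = physics.version == chemistry.version == worker_pay.version
```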
Classes in Python can inherit methods from other classes. For
instance, a class named matrix may be a two-dimensional subclass of a
more general class called array. Assuming array has been previously
defined, matrix would inherit all the methods defined in array. The
sub-class matrix could then define additional methods specific to this
class, e.g., matrix multiplication or computation of eigenvalues, in
addition to taking advantage of methods defined in the super-class.
Methods in the super-class could even be replaced by method
definitions in the sub-class with the same name if appropriate. The
way inheritance is implemented is to define the sub-class with a
reference to the super-class as follows:
class subclass(superclass):
    ...
The super-class may itself be a sub-class of a super-super-class, etc.
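A minimal sketch of inheritance, using toy array and matrix classes, is given below. These are hypothetical stand-ins invented for illustration, not the classes of any real numerical library. The sub-class inherits size() unchanged, adds a new method shape(), and replaces describe() with its own version.

```python
class array:
    """Toy super-class: a flat collection of values (illustrative only)."""

    def __init__(self, values):
        self.values = list(values)

    def size(self):
        return len(self.values)

    def describe(self):
        return "array of %d elements" % self.size()


class matrix(array):
    """Toy sub-class: a two-dimensional array built from a list of rows."""

    def __init__(self, rows):
        # Flatten the rows and let the super-class store them.
        array.__init__(self, [v for row in rows for v in row])
        self.nrows = len(rows)

    def shape(self):
        # New method; relies on size() inherited from array.
        return (self.nrows, self.size() // self.nrows)

    def describe(self):
        # Replaces the super-class method of the same name.
        return "matrix with shape %d x %d" % self.shape()


m = matrix([[1, 2, 3], [4, 5, 6]])
```

Here m.size() is answered by the method defined in array, while m.describe() uses the replacement defined in matrix.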
Anything that can be done with object-oriented programming can also be done with, say, plain functions. However, object-oriented programming does have the advantage of keeping structured data and the methods used to operate on the data together. You can choose whether to use it or not in Python.
We learned a bit about the readline() method and the print statement
earlier. We now expand a bit on input and output.
In order to read from or write to a file, we first need to open it.
When we are finished using it, we need to close the file, especially
if we have written to it. To open a file for writing, we use the open
command as follows:
>>> f = open("testfile.txt", "w")
>>> f.write("This is a test string\n")
>>> f.close()
>>>
The first argument of the open command is the name of the file to
which we are writing, and the second argument, "w", tells Python to
create a new, empty file with the specified file name (or to empty an
existing file of that name), ready to be written to. The open command
returns a file object which has several methods associated with it.
The method write() writes the string enclosed in the parentheses to
the file in question. We can invoke the write() method as many times
as we want before closing the file with the close() method. The file
name string can actually contain a full path such as, for instance,
"/home/raymond/testfile.txt".
Note that the special character represented by the \n
is a
“newline” character, actually an ASCII linefeed. This character is
used to terminate lines in Unix and Linux systems. Windows and
Macintosh systems each use different characters for this, which is one
of the reasons why interchanging text between these systems and
Unix/Linux can be so annoying.
Let’s see how we did; we attempt to read the file that we wrote:
>>> g = open("testfile.txt", "r")
>>> x = g.read()
>>> x
'This is a test string\n'
>>> g.close()
>>>
Success! The file we wrote was where we intended to put it (the
current working directory) and had the desired content, a fact that
can be verified independently with a text editor. In this case we
opened the existing file for read-only access with the "r" argument in
the open statement. Unlike the readline() method we discussed earlier,
which reads a single line of text, the read() method with no argument
reads the entire file, transferring it to the variable x in the above
script. What does a numerical argument do? Try the above code with
x = g.read(7) and find out!
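The sketch below puts the write and read steps together in one script and mixes the three ways of reading mentioned so far. To avoid cluttering the working directory it writes into a temporary directory created with the standard tempfile module; that choice, the second line of text, and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

# Put the test file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

# Write two lines, then close the file so the data reach the disk.
f = open(path, "w")
f.write("This is a test string\n")
f.write("Second line\n")
f.close()

# Read the file back in three steps.
g = open(path, "r")
first7 = g.read(7)           # at most 7 characters
rest_of_line = g.readline()  # from the current position to the end of the line
remainder = g.read()         # everything that is left
g.close()
```

Each read continues from where the previous one stopped, so the three pieces joined together reproduce the whole file.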
The above read() and write() methods can actually read and write data
in arbitrary binary form, not just text strings. However, beware: on
Macs and Windows machines, some extra fiddling has to be done to be
sure binary data are not corrupted, typically opening the file in
binary mode by appending b to the mode string ("rb" or "wb").
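As a hedged illustration of binary input and output, the sketch below opens the file with a "b" in the mode string, which is assumed here to be the kind of extra care the text has in mind, and uses the standard struct module to convert three floating point numbers to and from raw bytes. The file name, the values, and the little-endian format string "<3d" are choices made for this example only.

```python
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
values = (1.5, -2.25, 3.0)

# "wb" opens the file for writing in binary mode (no newline translation).
f = open(path, "wb")
f.write(struct.pack("<3d", *values))  # three little-endian 8-byte floats
f.close()

# "rb" opens it for reading in binary mode.
g = open(path, "rb")
raw = g.read()
g.close()

restored = struct.unpack("<3d", raw)
```

On Unix and Linux the "b" makes no difference to the bytes written under Python 2, but including it keeps the script portable.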
The sys
module pre-defines three file objects which are always
open and need not be closed, stdin
, stdout
, and
stderr
. These correspond to the standard Unix/Linux
input-output streams of the same name; stdin
reads typed input
from the keyboard and both stdout
and stderr
write
output to the terminal. The latter is by custom reserved for error
messages which often need to be kept separate from the main
stdout
data stream. These data streams can be redirected to
access files in the usual Unix/Linux fashion.
Here are some examples of the use of these data streams:
>>> import sys
>>> sys.stderr.write("ugh!")
ugh!>>>
>>> sys.stdout.write("gerbils\n")
gerbils
>>> x = sys.stdin.readline()
This is a line to standard input.
>>> x
'This is a line to standard input.\n'
>>>
Note the role of the “\n
” in these examples.
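In a script, the separation between the two output streams might be used as sketched below. The function name report and its arguments are hypothetical; the streams are passed in as parameters (defaulting to sys.stdout and sys.stderr) only so that the behavior is easy to try out with other file objects.

```python
import sys

def report(result, warning, out=sys.stdout, err=sys.stderr):
    """Send a result to one stream and a diagnostic to another."""
    # Unlike print, write() does not add a newline, so we supply it.
    out.write(result + "\n")
    err.write("warning: " + warning + "\n")

report("gerbils", "ugh!")
```

If such a script is run with its standard output redirected to a file, the result line goes to the file while the warning still appears on the terminal.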
The Python home page http://www.python.org/ is an excellent source of documentation for standard Python and Python libraries. Look particularly at the Tutorial http://docs.python.org/tutorial/ and the Library Reference http://docs.python.org/library/.
Remember that this document refers to Python 2 (the examples were run with Python 2.5.2), so consult the documentation for the matching version.